4. Classification

4.1 An Overview of Classification 1

  • Classification is useful when the prediction target is categorical (qualitative), while regression is useful when the target is numerical (continuous).
  • Classification estimates the probability that an observation belongs to each available category, then predicts the category with the highest probability.
  • Widely used classifiers:
    • Logistic regression.
    • Linear discriminant analysis.
    • K-nearest neighbors.
  • Other classifiers (more computer-intensive):
    • Generalized additive models.
    • Trees.
    • Random forests.
    • Boosting.
    • Support vector machines.
  • For binary qualitative response:
    • We can code the binary response as a dummy variable (0/1) and fit a linear regression to it.
    • This works because flipping the coding to (1/0) simply flips the fitted values around 0.5, so the final classifications are unchanged (see the sketch after this list).
    • The classifications obtained from linear regression on a binary response are the same as those from linear discriminant analysis (LDA).
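
A minimal sketch of the dummy-variable point above, using simulated data (not from the text): flipping the 0/1 coding only flips the linear-regression fitted values around 0.5, so the classifications do not change.

```r
set.seed(1)
x <- rnorm(100)
y <- as.integer(x + rnorm(100) > 0)    # binary response coded 0/1

fit01 <- lm(y ~ x)          # original (0/1) coding
fit10 <- lm(I(1 - y) ~ x)   # flipped (1/0) coding

pred01 <- ifelse(fitted(fit01) > 0.5, 1, 0)
pred10 <- ifelse(fitted(fit10) > 0.5, 0, 1)  # fitted values are 1 - fitted(fit01)

all(pred01 == pred10)  # TRUE: same classifications under either coding
```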

4.3 Logistic Regression

  • Logistic regression models the probability that the response Y belongs to a particular category, instead of modeling the response Y directly.
  • The logistic function:
    • It predicts the probability that Y belongs to a particular category.
    • It always outputs a number between 0 and 1.
    • It always produces an S-shaped curve.
\[ p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \]
  • The odds:
    • The odds are the ratio of the probability that an event occurs to the probability that it does not occur.
    • They can take on any value between 0 and infinity.
\[ \text{odds}(X) = \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X} \]
  • The logit function:
    • It is the log of the odds.
\[ \text{logit}(p(X)) = \log(\text{odds}(X)) = \log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X \]
  • Maximum likelihood:
    • It is the method used to estimate the coefficients in logistic regression.
    • \(\beta_0\) and \(\beta_1\) are chosen to maximize the likelihood function below (see the sketch after this list).
\[ \ell(\beta_0, \beta_1) = \prod_{i:y_i=1} p(x_i) \prod_{i':y_{i'}=0} \left(1 - p(x_{i'})\right) \]
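
As a minimal sketch of maximum-likelihood fitting, the R call below uses glm() with family = binomial on simulated data (the data and variable names are illustrative, not from the text).

```r
set.seed(1)
x <- rnorm(200)
p <- exp(-1 + 2 * x) / (1 + exp(-1 + 2 * x))  # true logistic probabilities
y <- rbinom(200, size = 1, prob = p)

fit <- glm(y ~ x, family = binomial)   # coefficients chosen by maximum likelihood
coef(fit)                              # estimates of beta_0 and beta_1
head(predict(fit, type = "response"))  # p(X), the predicted probabilities
```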

4.4 Linear Discriminant Analysis

  • LDA serves the same purpose as logistic regression for a qualitative response, but it avoids some of logistic regression's problems:
    • Logistic regression parameter estimates are unstable when the classes are well separated.
    • When there are more than two response classes, LDA handles them more naturally.
    • When n is small and the predictors are approximately normally distributed, logistic regression is again unstable, while LDA is more stable.
  • LDA uses Bayes' theorem to estimate the probability of each response class (see the sketch below).
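
A minimal sketch of LDA in R, assuming the MASS package is available and using the built-in iris data (neither appears in the notes above):

```r
library(MASS)

fit  <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
pred <- predict(fit, iris)

head(pred$posterior)               # per-class probabilities from Bayes' theorem
head(pred$class)                   # predicted classes (highest posterior)
mean(pred$class == iris$Species)   # training accuracy
```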

4.4.4 Quadratic Discriminant Analysis

  • Quadratic discriminant analysis (QDA) provides an alternative approach to LDA.
  • The QDA (like LDA) classifier results from assuming that the observations from each class are drawn from a Gaussian distribution, and plugging estimates for the parameters into Bayes’ theorem in order to perform prediction.
  • However, unlike LDA, QDA assumes that each class has its own covariance matrix (see the sketch below).
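
A minimal sketch of QDA, again assuming the MASS package and the iris data; the interface mirrors lda(), but a separate covariance matrix is estimated for each class:

```r
library(MASS)

fit_qda  <- qda(Species ~ Sepal.Length + Sepal.Width, data = iris)
pred_qda <- predict(fit_qda, iris)
mean(pred_qda$class == iris$Species)  # training accuracy with quadratic boundaries
```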

4.5 A Comparison of Classification Methods

  • We compare Logistic Regression, LDA, QDA, and KNN.
  • In LDA, \(\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = \log\left(\frac{p_1(x)}{p_2(x)}\right) = c_0 + c_1 x\), where \(c_0\) and \(c_1\) are functions of \(\mu_1\), \(\mu_2\), and \(\sigma^2\).
  • In logistic regression, \(\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = \beta_0 + \beta_1 x\), where \(\beta_0\) and \(\beta_1\) are constants.
  • The difference lies in the way that \(c_0\), \(c_1\) and \(\beta_0\), \(\beta_1\) are estimated.
  • \(\beta_0\) and \(\beta_1\) are estimated using maximum likelihood.
  • \(c_0\) and \(c_1\) are estimated using the estimated means and variance of the normal distributions (see the sketch after this list).
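
A minimal sketch of the comparison on simulated two-class Gaussian data (illustrative only): logistic regression and LDA both give a linear log-odds in x, and because they differ only in how the coefficients are estimated, their classifications usually agree closely.

```r
library(MASS)
set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 1.5))  # common variance, shifted means
y <- factor(rep(c(0, 1), each = 100))
dat <- data.frame(x = x, y = y)

glm_fit <- glm(y ~ x, data = dat, family = binomial)  # maximum likelihood
lda_fit <- lda(y ~ x, data = dat)                     # plug-in normal estimates

glm_class <- ifelse(predict(glm_fit, type = "response") > 0.5, "1", "0")
lda_class <- as.character(predict(lda_fit)$class)
mean(glm_class == lda_class)  # proportion classified identically by the two methods
```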

kNN (k-Nearest Neighbour) Algorithm in R 2

  • k nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k neighbors.
  • This algorithm assigns unlabeled data points to well-defined groups (see the sketch below).
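
A minimal sketch of kNN classification, assuming the class package and using the built-in iris data with an arbitrary k = 3 and train/test split:

```r
library(class)
set.seed(1)
idx   <- sample(nrow(iris), 100)                   # 100 training rows
train <- iris[idx, 1:4]; test <- iris[-idx, 1:4]   # the four numeric predictors
cl    <- iris$Species[idx]                         # training labels

pred <- knn(train = train, test = test, cl = cl, k = 3)  # majority vote of 3 neighbours
mean(pred == iris$Species[-idx])                         # test-set accuracy
```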

Logistic Regression Tutorial 3

  • \(\text{odds} = \frac{p}{1-p} = \frac{p(\text{success})}{p(\text{failure})} = \frac{p}{q}\), where \(q = 1 - p\).
\[ \text{odds}(\text{success}) \begin{cases} < 1 & \text{if } p < q \\ = 1 & \text{if } p = q \\ > 1 & \text{if } p > q \end{cases} \]
  • The logit function is the logarithm of the odds: \(\text{logit}(p) = \log_e(\text{odds}) = \log_e\left(\frac{p}{q}\right)\) (see the sketch below).
\[ \text{logit}(p) \begin{cases} < 0 & \text{if } p < q, \text{ thus odds} < 1 \\ = 0 & \text{if } p = q, \text{ thus odds} = 1 \\ > 0 & \text{if } p > q, \text{ thus odds} > 1 \end{cases} \]
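
A minimal sketch of the odds/logit relationships in base R; qlogis() is the logit (log-odds) and plogis() is its inverse, the logistic function:

```r
p <- c(0.25, 0.50, 0.75)     # p < q, p = q, p > q

odds  <- p / (1 - p)         # 1/3, 1, 3  ->  odds < 1, = 1, > 1
logit <- log(odds)           # negative, zero, positive
all.equal(logit, qlogis(p))  # TRUE: qlogis() computes the same log-odds
```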

References


  1. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. New York, NY: Springer. Chapter 4. Retrieved from https://www.stat.berkeley.edu/users/rabbee/s154/ISLR_First_Printing.pdf

  2. Skand, K. (2017, October 8). kNN(k-Nearest Neighbor) algorithm in R. Retrieved from https://rstudio-pubs-static.s3.amazonaws.com/316172_a857ca788d1441f8be1bcd1e31f0e875.html 

  3. King, W. B. Logistic Regression Tutorial. This tutorial demonstrates the use of logistic regression as a classifier. Retrieved from http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html