4. Classification¶
4.1 An Overview of Classification 1¶
- Classification is useful when the prediction target is categorical (qualitative), while regression is useful when the prediction target is numerical (continuous).
- Classification computes the probability that an observation belongs to each available category, and then predicts the category with the highest probability.
- Widely used classifiers:
- Logistic regression.
- Linear discriminant analysis.
- K-nearest neighbors.
- Other classifiers (more computer-intensive):
- Generalized additive models.
- Trees.
- Random forests.
- Boosting.
- Support vector machines.
- For binary qualitative response:
- We can encode the response as a 0/1 dummy variable and fit an ordinary linear regression to it.
- This works because flipping the coding to (1/0) simply flips the fitted values, so the class predictions at the 0.5 cutoff are unchanged (a minimal sketch follows this list).
- The class predictions obtained this way are the same as those from linear discriminant analysis (LDA).
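A minimal R sketch of the coding-flip point, using simulated data invented here purely for illustration: a linear regression on the 0/1 dummy and on the flipped 1/0 dummy give the same classifications at the 0.5 cutoff.

```r
# Simulated two-class data (illustrative only).
set.seed(1)
x <- c(rnorm(50, mean = 0), rnorm(50, mean = 2))
y <- rep(c(0, 1), each = 50)            # original 0/1 coding

fit_01 <- lm(y ~ x)                     # regress the dummy on x
fit_10 <- lm(I(1 - y) ~ x)              # flipped 1/0 coding

pred_01 <- ifelse(predict(fit_01) > 0.5, 1, 0)
pred_10 <- ifelse(predict(fit_10) > 0.5, 0, 1)   # flip back to 0/1 labels

all(pred_01 == pred_10)                 # TRUE: identical class predictions
```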
4.3 Logistic Regression¶
- Logistic regression models the probability that the response Y belongs to a particular category, instead of modeling the response Y directly.
- The logistic function:
- It predicts the probability that Y belongs to a particular category.
- It always outputs a number between 0 and 1.
- Its graph is always an S-shaped curve (a short sketch follows the formula).
\[
p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}}
\]
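A minimal R sketch of the logistic function, with coefficients \(\beta_0 = -4\) and \(\beta_1 = 2\) made up purely for illustration; `plogis` is base R's logistic distribution function.

```r
# Logistic function p(X) = exp(b0 + b1*X) / (1 + exp(b0 + b1*X)).
# The coefficients below are invented for illustration only.
b0 <- -4
b1 <- 2
x  <- seq(-4, 8, by = 0.1)

p_manual <- exp(b0 + b1 * x) / (1 + exp(b0 + b1 * x))
p_plogis <- plogis(b0 + b1 * x)        # same quantity via base R's logistic CDF

range(p_manual)                        # always strictly between 0 and 1
all.equal(p_manual, p_plogis)          # TRUE
plot(x, p_manual, type = "l", ylab = "p(X)")   # S-shaped curve
```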
- The odds:
- The odds are the ratio of the probability of an event occurring to the probability of it not occurring (a worked example follows the logit formula below).
- Can take on any value between 0 and infinity.
\[
\mathrm{odds}(X) = \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X}
\]
- The logit function:
- It is the log of the odds.
\[
\mathrm{logit}(p(X)) = \log(\mathrm{odds}(X)) = \log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X
\]
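A short worked example tying the three quantities together, using the same illustrative coefficients \(\beta_0 = -4\), \(\beta_1 = 2\) as above and \(X = 1\):

\[
\begin{aligned}
\beta_0 + \beta_1 X &= -4 + 2 \cdot 1 = -2 \\
\mathrm{odds}(X) &= e^{-2} \approx 0.135 \\
p(X) &= \frac{0.135}{1 + 0.135} \approx 0.119 \\
\mathrm{logit}(p(X)) &= \log(0.135) = -2
\end{aligned}
\]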
- Maximum likelihood:
- It is the method used to estimate the coefficients in logistic regression.
- The \(\beta_0\) and \(\beta_1\) are chosen to maximize the likelihood function below (a numerical sketch follows it).
\[
\ell(\beta_0, \beta_1) = \prod_{i:y_i=1} p(x_i) \prod_{i':y_{i'}=0} \left(1 - p(x_{i'})\right)
\]
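A minimal R sketch, on simulated data invented here for illustration, showing that maximizing this likelihood numerically with `optim` recovers essentially the same coefficients that `glm(..., family = binomial)` reports.

```r
# Maximum likelihood for logistic regression written out by hand,
# then compared with glm()'s built-in fit.
set.seed(42)
x <- rnorm(200)
p <- plogis(-1 + 2 * x)                # true model: beta0 = -1, beta1 = 2
y <- rbinom(200, size = 1, prob = p)

# Negative log-likelihood: -[ sum over y=1 of log p(x) + sum over y=0 of log(1 - p(x)) ]
neg_loglik <- function(beta) {
  eta <- beta[1] + beta[2] * x
  -sum(y * log(plogis(eta)) + (1 - y) * log(1 - plogis(eta)))
}

ml  <- optim(c(0, 0), neg_loglik)$par  # maximize the likelihood numerically
fit <- glm(y ~ x, family = binomial)   # glm estimates the same coefficients

round(ml, 2)
round(coef(fit), 2)                    # essentially the same estimates
```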
4.4 Linear Discriminant Analysis¶
- LDA plays the role of linear regression for a qualitative response, but it addresses several problems with logistic regression:
- Logistic regression coefficient estimates are surprisingly unstable when the classes are well separated.
- When there are more than two response classes, logistic regression does not extend as naturally.
- When n is small and the predictors are approximately normal within each class, logistic regression is again less stable.
- LDA uses Bayes' theorem to estimate the probability of each response class (see the sketch below).
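A minimal R sketch using `MASS::lda` on the built-in `iris` data (chosen here only for illustration), showing the estimated priors and class means and the posterior class probabilities LDA derives via Bayes' theorem.

```r
# LDA on iris with MASS::lda.
library(MASS)

fit <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
fit$prior                          # estimated class priors
fit$means                          # estimated class means (a shared covariance is assumed)

pred <- predict(fit, iris)
head(pred$posterior)               # per-class posterior probabilities via Bayes' theorem
mean(pred$class == iris$Species)   # training accuracy
```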
4.4.4 Quadratic Discriminant Analysis¶
- Quadratic discriminant analysis (QDA) provides an alternative to LDA with a quadratic, rather than linear, decision boundary (a brief sketch follows this list).
- The QDA (like LDA) classifier results from assuming that the observations from each class are drawn from a Gaussian distribution, and plugging estimates for the parameters into Bayes’ theorem in order to perform prediction.
- However, unlike LDA, QDA assumes that each class has its own covariance matrix.
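A brief R sketch using `MASS::qda` on the same `iris` predictors (again chosen only for illustration); unlike `lda()`, `qda()` estimates a separate covariance matrix for each class.

```r
# QDA on iris with MASS::qda: one covariance matrix per class,
# giving quadratic (curved) decision boundaries.
library(MASS)

fit_qda  <- qda(Species ~ Sepal.Length + Sepal.Width, data = iris)
pred_qda <- predict(fit_qda, iris)
table(predicted = pred_qda$class, actual = iris$Species)
```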
4.5 A Comparison of Classification Methods¶
- We compare Logistic Regression, LDA, QDA, and KNN.
- In LDA, \(\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = \log\left(\frac{p_1(x)}{p_2(x)}\right) = c_0 + c_1x\), where \(c_0\) and \(c_1\) are functions of \(\mu_1\), \(\mu_2\), and \(\sigma^2\).
- In logistic regression, \(log(\frac{p_1(x)}{1- p_1(x)}) = \beta_0 + \beta_1x\) where \(\beta_0\) and \(\beta_1\) are constants.
- The difference lies in how \(c_0\), \(c_1\) and \(\beta_0\), \(\beta_1\) are estimated.
- \(\beta_0\) and \(\beta_1\) are estimated using maximum likelihood.
- \(c_0\) and \(c_1\) are estimated using the estimated means and (shared) variance of the fitted normal distributions (see the sketch below).
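A minimal R sketch on simulated one-predictor data (invented here for illustration): the maximum-likelihood coefficients from `glm` are compared with the intercept and slope implied by the LDA formula, computed by hand from the estimated class means, pooled variance, and priors (equal by construction).

```r
# Logistic regression vs. LDA on the same simulated two-class data:
# both imply a log-odds linear in x, but the coefficients are estimated differently.
set.seed(7)
cls <- rep(c(0, 1), each = 100)                          # class labels
x   <- c(rnorm(100, mean = 0), rnorm(100, mean = 1.5))   # shared variance within classes
d   <- data.frame(x = x, y = factor(cls))

glm_fit <- glm(y ~ x, data = d, family = binomial)       # beta0, beta1 via maximum likelihood
coef(glm_fit)

# LDA's implied intercept and slope from estimated means, pooled variance, and priors
mu0 <- mean(x[cls == 0]); mu1 <- mean(x[cls == 1])
s2  <- mean(c(var(x[cls == 0]), var(x[cls == 1])))       # pooled variance (equal group sizes)
c1  <- (mu1 - mu0) / s2
c0  <- log(0.5 / 0.5) + (mu0^2 - mu1^2) / (2 * s2)       # equal priors here
c(c0, c1)                                                # close to, but not equal to, glm's estimates
```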
kNN (k-Nearest Neighbour) Algorithm in R 2¶
- k nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k neighbors.
- This algorithm segregates unlabeled data points into well-defined groups (see the sketch below).
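A minimal R sketch using `class::knn` on the built-in `iris` data (dataset and split chosen only for illustration): each test point is assigned the majority class among its k nearest training neighbours.

```r
# kNN classification with the class package.
library(class)
set.seed(1)

train_idx <- sample(nrow(iris), 100)
train_x <- iris[train_idx, 1:4];    test_x <- iris[-train_idx, 1:4]
train_y <- iris$Species[train_idx]; test_y <- iris$Species[-train_idx]

pred <- knn(train = train_x, test = test_x, cl = train_y, k = 5)
mean(pred == test_y)               # test-set accuracy
```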
Logistic Regression Tutorial 3¶
- \(Odds = \frac{p}{1-p} = \frac{p(success)}{p(failure)} = \frac{p}{q}\) where \(q = 1 - p\).
\[
\begin{aligned}
odds (success) =
\begin{cases}
< 1 & \text{if } p < q \\
1 & \text{if } p = q \\
> 1 & \text{if } p > q
\end{cases}
\end{aligned}
\]
- The logit function is the natural logarithm of the odds: \(\mathrm{logit}(p) = \log_e(\mathrm{odds}) = \log_e\left(\frac{p}{q}\right)\) (a short numerical sketch follows the cases below).
\[
\begin{aligned}
logit(p) =
\begin{cases}
< 0 & \text{if } p < q \text{ thus } odds < 1 \\
0 & \text{if } p = q \text{ thus } odds = 1 \\
> 0 & \text{if } p > q \text{ thus } odds > 1
\end{cases}
\end{aligned}
\]
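A short R sketch with probability values chosen only for illustration, confirming the sign pattern of the odds and logit in the cases above; `qlogis` is base R's logit function.

```r
# Odds and logit for a few illustrative probabilities:
# p < 0.5 gives odds < 1 and logit < 0; p = 0.5 gives odds = 1 and logit = 0;
# p > 0.5 gives odds > 1 and logit > 0.
p     <- c(0.1, 0.5, 0.9)
odds  <- p / (1 - p)
logit <- log(odds)                 # same as qlogis(p)

data.frame(p, odds, logit)
#     p      odds     logit
# 1 0.1 0.1111111 -2.197225
# 2 0.5 1.0000000  0.000000
# 3 0.9 9.0000000  2.197225
```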
References¶
- James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. New York, NY: Springer. Chapter 4, available at https://www.stat.berkeley.edu/users/rabbee/s154/ISLR_First_Printing.pdf
- Skand, K. (2017, October 8). kNN (k-Nearest Neighbor) algorithm in R. Retrieved from https://rstudio-pubs-static.s3.amazonaws.com/316172_a857ca788d1441f8be1bcd1e31f0e875.html
- King, W. B. Logistic Regression Tutorial. This tutorial demonstrates the use of Logistic Regression as a classifier algorithm. http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html