4. Classification

4.1 An Overview of Classification 1

  • Classification is useful when the prediction target is categorical (qualitative), while regression is useful when the target is numerical (continuous).
  • Classification estimates the probability that an observation belongs to each available category, then predicts the category with the highest probability.
  • Widely used classifiers:
    • Logistic regression.
    • Linear discriminant analysis.
    • K-nearest neighbors.
  • Other classifiers (more computer-intensive):
    • Generalized additive models.
    • Trees.
    • Random forests.
    • Boosting.
    • Support vector machines.
  • For binary qualitative response:
    • We can code the binary response as a dummy variable (0/1) and fit a linear regression to it.
    • This works because flipping the coding to (1/0) simply flips the fitted values around 0.5, so the final classifications are unchanged (see the sketch after this list).
    • The classifications obtained from linear regression on a binary response are the same as those from linear discriminant analysis (LDA).
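
A minimal sketch of the dummy-variable point above, using simulated data (not from the text): flipping the 0/1 coding only flips the linear-regression fitted values around 0.5, so the classifications do not change.

```r
set.seed(1)
x <- rnorm(100)
y <- as.integer(x + rnorm(100) > 0)    # binary response coded 0/1

fit01 <- lm(y ~ x)          # original (0/1) coding
fit10 <- lm(I(1 - y) ~ x)   # flipped (1/0) coding

pred01 <- ifelse(fitted(fit01) > 0.5, 1, 0)
pred10 <- ifelse(fitted(fit10) > 0.5, 0, 1)  # fitted values are 1 - fitted(fit01)

all(pred01 == pred10)  # TRUE: same classifications under either coding
```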

4.3 Logistic Regression

  • Logistic regression models the probability that the response Y belongs to a particular category, instead of modeling the response Y directly.
  • The logistic function:
    • It predicts the probability that Y belongs to a particular category.
    • It always outputs a number between 0 and 1.
    • It always produces an S-shaped curve.
\[ p(X) = \frac{e^{\beta_0 + \beta_1 X}}{1 + e^{\beta_0 + \beta_1 X}} \]
  • The odds:
    • The odds are the ratio of the probability that an event occurs to the probability that it does not occur.
    • They can take on any value between 0 and infinity.
\[ \text{odds}(X) = \frac{p(X)}{1 - p(X)} = e^{\beta_0 + \beta_1 X} \]
  • The logit function:
    • It is the log of the odds.
\[ \text{logit}(p(X)) = \log(\text{odds}(X)) = \log\left(\frac{p(X)}{1 - p(X)}\right) = \beta_0 + \beta_1 X \]
  • Maximum likelihood:
    • It is the method used to estimate the coefficients in logistic regression.
    • \(\beta_0\) and \(\beta_1\) are chosen to maximize the likelihood function below (see the sketch after this list).
\[ \ell(\beta_0, \beta_1) = \prod_{i:y_i=1} p(x_i) \prod_{i':y_{i'}=0} \left(1 - p(x_{i'})\right) \]
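
As a minimal sketch of maximum-likelihood fitting, the R call below uses glm() with family = binomial on simulated data (the data and variable names are illustrative, not from the text).

```r
set.seed(1)
x <- rnorm(200)
p <- exp(-1 + 2 * x) / (1 + exp(-1 + 2 * x))  # true logistic probabilities
y <- rbinom(200, size = 1, prob = p)

fit <- glm(y ~ x, family = binomial)   # coefficients chosen by maximum likelihood
coef(fit)                              # estimates of beta_0 and beta_1
head(predict(fit, type = "response"))  # p(X), the predicted probabilities
```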

4.4 Linear Discriminant Analysis

  • LDA serves the same purpose as logistic regression for a qualitative response, but it avoids some of logistic regression's problems:
    • Logistic regression parameter estimates are unstable when the classes are well separated.
    • When there are more than two response classes, LDA handles them more naturally.
    • When n is small and the predictors are approximately normally distributed, logistic regression is again unstable, while LDA is more stable.
  • LDA uses Bayes' theorem to estimate the probability of each response class (see the sketch below).
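
A minimal sketch of LDA in R, assuming the MASS package is available and using the built-in iris data (neither appears in the notes above):

```r
library(MASS)

fit  <- lda(Species ~ Sepal.Length + Sepal.Width, data = iris)
pred <- predict(fit, iris)

head(pred$posterior)               # per-class probabilities from Bayes' theorem
head(pred$class)                   # predicted classes (highest posterior)
mean(pred$class == iris$Species)   # training accuracy
```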

4.4.4 Quadratic Discriminant Analysis

  • Quadratic discriminant analysis (QDA) provides an alternative approach to LDA.
  • The QDA (like LDA) classifier results from assuming that the observations from each class are drawn from a Gaussian distribution, and plugging estimates for the parameters into Bayes’ theorem in order to perform prediction.
  • However, unlike LDA, QDA assumes that each class has its own covariance matrix (see the sketch below).
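
A minimal sketch of QDA, again assuming the MASS package and the iris data; the interface mirrors lda(), but a separate covariance matrix is estimated for each class:

```r
library(MASS)

fit_qda  <- qda(Species ~ Sepal.Length + Sepal.Width, data = iris)
pred_qda <- predict(fit_qda, iris)
mean(pred_qda$class == iris$Species)  # training accuracy with quadratic boundaries
```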

4.5 A Comparison of Classification Methods

  • We compare Logistic Regression, LDA, QDA, and KNN.
  • In LDA, \(\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = \log\left(\frac{p_1(x)}{p_2(x)}\right) = c_0 + c_1 x\), where \(c_0\) and \(c_1\) are functions of \(\mu_1\), \(\mu_2\), and \(\sigma^2\).
  • In logistic regression, \(\log\left(\frac{p_1(x)}{1 - p_1(x)}\right) = \beta_0 + \beta_1 x\), where \(\beta_0\) and \(\beta_1\) are constants.
  • The difference lies in the way that \(c_0\), \(c_1\) and \(\beta_0\), \(\beta_1\) are estimated.
  • \(\beta_0\) and \(\beta_1\) are estimated using maximum likelihood.
  • \(c_0\) and \(c_1\) are estimated using the estimated means and variance of the normal distributions (see the sketch after this list).
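
A minimal sketch of the comparison on simulated two-class Gaussian data (illustrative only): logistic regression and LDA both give a linear log-odds in x, and because they differ only in how the coefficients are estimated, their classifications usually agree closely.

```r
library(MASS)
set.seed(1)
x <- c(rnorm(100, mean = 0), rnorm(100, mean = 1.5))  # common variance, shifted means
y <- factor(rep(c(0, 1), each = 100))
dat <- data.frame(x = x, y = y)

glm_fit <- glm(y ~ x, data = dat, family = binomial)  # maximum likelihood
lda_fit <- lda(y ~ x, data = dat)                     # plug-in normal estimates

glm_class <- ifelse(predict(glm_fit, type = "response") > 0.5, "1", "0")
lda_class <- as.character(predict(lda_fit)$class)
mean(glm_class == lda_class)  # proportion classified identically by the two methods
```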

kNN (k-Nearest Neighbour) Algorithm in R 2

  • k nearest neighbors is a simple algorithm that stores all available cases and classifies new cases by a majority vote of its k neighbors.
  • This algorithm assigns unlabeled data points to well-defined groups (see the sketch below).
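
A minimal sketch of kNN classification, assuming the class package and using the built-in iris data with an arbitrary k = 3 and train/test split:

```r
library(class)
set.seed(1)
idx   <- sample(nrow(iris), 100)                   # 100 training rows
train <- iris[idx, 1:4]; test <- iris[-idx, 1:4]   # the four numeric predictors
cl    <- iris$Species[idx]                         # training labels

pred <- knn(train = train, test = test, cl = cl, k = 3)  # majority vote of 3 neighbours
mean(pred == iris$Species[-idx])                         # test-set accuracy
```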

Logistic Regression Tutorial 3

  • \(\text{odds} = \frac{p}{1-p} = \frac{p(\text{success})}{p(\text{failure})} = \frac{p}{q}\), where \(q = 1 - p\).
\[ \text{odds}(\text{success}) \begin{cases} < 1 & \text{if } p < q \\ = 1 & \text{if } p = q \\ > 1 & \text{if } p > q \end{cases} \]
  • The logit function is the logarithm of the odds: \(\text{logit}(p) = \log_e(\text{odds}) = \log_e\left(\frac{p}{q}\right)\) (see the sketch below).
\[ \text{logit}(p) \begin{cases} < 0 & \text{if } p < q, \text{ thus odds} < 1 \\ = 0 & \text{if } p = q, \text{ thus odds} = 1 \\ > 0 & \text{if } p > q, \text{ thus odds} > 1 \end{cases} \]
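
A minimal sketch of the odds/logit relationships in base R; qlogis() is the logit (log-odds) and plogis() is its inverse, the logistic function:

```r
p <- c(0.25, 0.50, 0.75)     # p < q, p = q, p > q

odds  <- p / (1 - p)         # 1/3, 1, 3  ->  odds < 1, = 1, > 1
logit <- log(odds)           # negative, zero, positive
all.equal(logit, qlogis(p))  # TRUE: qlogis() computes the same log-odds
```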

References


  1. James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning with Applications in R. New York, NY: Springer. Chapter 4. Retrieved from https://www.stat.berkeley.edu/users/rabbee/s154/ISLR_First_Printing.pdf

  2. Skand, K. (2017, October 8). kNN(k-Nearest Neighbor) algorithm in R. Retrieved from https://rstudio-pubs-static.s3.amazonaws.com/316172_a857ca788d1441f8be1bcd1e31f0e875.html 

  3. King, W. B. Logistic Regression Tutorial. This tutorial demonstrates the use of logistic regression as a classifier. Retrieved from http://ww2.coastal.edu/kingw/statistics/R-tutorials/logistic.html