
3. Regression


  • Linear Regression:
    • A simple approach to supervised learning.
    • Useful for predicting a quantitative response.
    • Many more advanced statistical learning methods are generalizations or extensions of linear regression.
  • Simple Linear Regression:
    • Assumes an approximately linear relationship between the response Y and a single predictor X.
    • Written as: Y ≈ β0 + β1 X
    • Y = response or dependent variable (the variable that we want to predict).
    • X = predictor or independent variable (we have data on this variable).
    • β0 and β1 = intercept and slope (regression model coefficients or parameters).
    • ŷ = β̂0 + β̂1 x is the prediction of Y based on the value X = x.
    • The hat symbol (ˆ) denotes an estimated value of an unknown parameter or a predicted value of the response.
  • Residual: the difference between an observed value and its predicted value (e = y - ŷ).
  • Residual Sum of Squares (RSS):
    • The sum of the squared residuals (RSS = e1^2 + e2^2 + … + en^2).
    • In terms of the coefficient estimates: RSS = (y1 - β̂0 - β̂1 x1)^2 + (y2 - β̂0 - β̂1 x2)^2 + … + (yn - β̂0 - β̂1 xn)^2.
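The residual and RSS definitions above can be sketched in Python as follows. This is a minimal illustration assuming the data are plain Python lists; the function names (predict, residuals, rss), the toy data, and the candidate coefficients are hypothetical and not from the source.

```python
def predict(b0, b1, x):
    """Predictions y_hat = b0 + b1 * x for each value in x."""
    return [b0 + b1 * xi for xi in x]

def residuals(b0, b1, x, y):
    """Residuals e_i = y_i - y_hat_i."""
    return [yi - yhat for yi, yhat in zip(y, predict(b0, b1, x))]

def rss(b0, b1, x, y):
    """Residual sum of squares: e_1^2 + e_2^2 + ... + e_n^2."""
    return sum(e ** 2 for e in residuals(b0, b1, x, y))

# Hypothetical toy data and candidate coefficients b0 = 1, b1 = 2.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 6.0, 9.0]
print(residuals(1.0, 2.0, x, y))  # [0.0, 0.0, -1.0, 0.0]
print(rss(1.0, 2.0, x, y))        # 1.0
```

Different candidate coefficients give different RSS values; least squares picks the pair with the smallest one.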
  • Least Squares:
    • It is a method for estimating the unknown parameters (β0, β1) in a linear regression model.
    • It chooses the estimates β̂0 and β̂1 that minimize the RSS.
    • β̂1 = Σ (xi - x̄)(yi - ȳ) / Σ (xi - x̄)^2, where both sums run over i = 1, …, n.
    • β̂0 = ȳ - β̂1 x̄ (where x̄ and ȳ are the sample means).
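A minimal sketch of the closed-form least squares estimates, assuming plain Python lists; the name least_squares and the toy data are hypothetical choices for illustration. The slope is computed first because the intercept formula depends on it.

```python
def least_squares(x, y):
    """Return (b0_hat, b1_hat), the coefficients that minimize the RSS."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Slope: sum of cross-deviations divided by sum of squared x-deviations.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)  # zero (slope undefined) if all x are equal
    b1_hat = sxy / sxx
    # Intercept: computed from the slope and the two sample means.
    b0_hat = y_bar - b1_hat * x_bar
    return b0_hat, b1_hat

# Hypothetical toy data.
x = [1.0, 2.0, 3.0, 4.0]
y = [3.0, 5.0, 6.0, 9.0]
b0_hat, b1_hat = least_squares(x, y)
print(b0_hat, b1_hat)  # close to 1.0 and 1.9
```

Because β̂0 = ȳ - β̂1 x̄, the fitted line always passes through the point (x̄, ȳ).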
  • Population regression line:
    • Assumes the true relationship between X and Y is linear: Y = β0 + β1 X + ε (where ε is a mean-zero random error term).
    • β0 is the intercept: the expected value of Y when X = 0.
    • β1 is the slope: the average increase in Y associated with a one-unit increase in X.
    • ε is a catch-all for what we miss with this simple model, such as the effect of other variables on Y; we assume this error is independent of X.
  • The least squares line does not include the error term ε, but the population regression line does.
  • In practice, we do not know the population regression line, so we use the least squares line as an estimate.
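To illustrate the difference between the two lines, the sketch below simulates one data set from an assumed population line Y = 2 + 3X + ε and fits the least squares line to it. The values 2 and 3, the uniform distribution of X, and the standard normal ε are hypothetical choices, as are the helper names. The estimates land close to, but not exactly on, the true β0 and β1; a different random sample would give a slightly different least squares line.

```python
import random

def least_squares(x, y):
    """Closed-form least squares estimates (b0_hat, b1_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sxy / sxx
    return y_bar - b1_hat * x_bar, b1_hat

# Assumed population line (hypothetical values for illustration).
BETA0, BETA1 = 2.0, 3.0

def simulate(n, rng):
    """Draw n observations from Y = BETA0 + BETA1 * X + eps with eps ~ N(0, 1)."""
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [BETA0 + BETA1 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y

rng = random.Random(0)  # fixed seed so the run is reproducible
x, y = simulate(100, rng)
b0_hat, b1_hat = least_squares(x, y)
print(f"population line:    Y = {BETA0} + {BETA1} X + eps")
print(f"least squares line: y_hat = {b0_hat:.2f} + {b1_hat:.2f} x")
```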
  • Bias: the difference between the expected value of an estimator (for example, the sample mean) and the true population parameter it estimates (for example, the population mean).
  • Unbiased estimator: an estimator whose expected value equals the true parameter; it does not systematically over- or under-estimate the true parameter.
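A simulation sketch of unbiasedness under a hypothetical setup (true line Y = 2 + 3X + ε, X uniform on [-2, 2], ε standard normal; all names and values are assumptions). Any single data set gives estimates that miss the truth, but averaging the sample mean ȳ and the slope estimate β̂1 over many simulated data sets gives values very close to the population mean of Y (here 2, since E[X] = 0) and to β1 = 3.

```python
import random

def least_squares(x, y):
    """Closed-form least squares estimates (b0_hat, b1_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sxy / sxx
    return y_bar - b1_hat * x_bar, b1_hat

# Assumed population line; with E[X] = 0 the population mean of Y is BETA0.
BETA0, BETA1 = 2.0, 3.0

def simulate(n, rng):
    """Draw n observations from Y = BETA0 + BETA1 * X + eps with eps ~ N(0, 1)."""
    x = [rng.uniform(-2.0, 2.0) for _ in range(n)]
    y = [BETA0 + BETA1 * xi + rng.gauss(0.0, 1.0) for xi in x]
    return x, y

rng = random.Random(1)
n_datasets, n = 2000, 20
y_bars, slopes = [], []
for _ in range(n_datasets):
    x, y = simulate(n, rng)
    y_bars.append(sum(y) / n)              # sample mean of Y for this data set
    slopes.append(least_squares(x, y)[1])  # slope estimate for this data set

avg_y_bar = sum(y_bars) / n_datasets
avg_slope = sum(slopes) / n_datasets
print(f"single-data-set slopes range from {min(slopes):.2f} to {max(slopes):.2f}")
print(f"average of y_bar:  {avg_y_bar:.3f} (population mean of Y = {BETA0})")
print(f"average of b1_hat: {avg_slope:.3f} (true beta1 = {BETA1})")
```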
  • Multiple Linear Regression:
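    • Linear regression with more than one predictor: Y = β0 + β1 X1 + β2 X2 + … + βp Xp + ε.
    • The model has a single intercept β0 and a separate slope βj for each predictor Xj; all coefficients are estimated jointly by least squares (minimizing the RSS), which is not the same as fitting a separate simple regression for each predictor and adding the lines.
    • Each βj represents the average effect on Y of a one-unit increase in Xj, holding all other predictors fixed.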
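Fitting a multiple linear regression means estimating all coefficients jointly. The sketch below, a hypothetical helper named fit_multiple in plain Python, does this by solving the normal equations (AᵀA) b = Aᵀy, where A is the predictor matrix with a leading column of ones for β0. It uses Gauss-Jordan elimination only to stay self-contained; in practice a numerical library's least squares routine is usually preferred for numerical stability.

```python
def fit_multiple(X, y):
    """Least squares coefficients [b0, b1, ..., bp] for predictor rows X.

    Solves the normal equations (A^T A) b = A^T y, where A is X with a
    leading column of ones, by Gauss-Jordan elimination with partial pivoting.
    """
    A = [[1.0] + list(row) for row in X]
    m = len(A[0])  # number of coefficients, p + 1
    # Normal equations: M = A^T A and v = A^T y.
    M = [[sum(r[i] * r[j] for r in A) for j in range(m)] for i in range(m)]
    v = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(m)]
    aug = [M[i] + [v[i]] for i in range(m)]  # augmented matrix [M | v]
    for col in range(m):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:  # arbitrary threshold (assumption)
            raise ValueError("singular normal equations, e.g. collinear predictors")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        piv = aug[col][col]
        aug[col] = [a / piv for a in aug[col]]
        # Eliminate this column from every other row.
        for r in range(m):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][m] for i in range(m)]

# Hypothetical noise-free data generated from y = 1 + 2*x1 - 3*x2.
X = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (2.0, 1.0)]
y = [1.0 + 2.0 * x1 - 3.0 * x2 for x1, x2 in X]
coef = fit_multiple(X, y)
print([round(c, 6) for c in coef])  # values very close to [1.0, 2.0, -3.0]
```

Perfectly collinear predictors make AᵀA singular, so the coefficients are not uniquely determined; the sketch raises an error in that case.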
  • Potential problems with regression:
    • Non-linearity of the response-predictor relationships: Residual plots can help identify non-linear relationships.
    • Correlation of error terms: if the error terms are correlated, we may have an unwarranted sense of confidence in our model.
    • Non-constant variance of error terms.
    • Outliers.
    • High-leverage points.
    • Collinearity.
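Two of these diagnostics can be sketched numerically for simple regression; the function names and data are hypothetical. Fitting a straight line to data generated from a curve leaves residuals with a systematic pattern (here positive at both ends and negative in the middle), which is what a residual plot would reveal. The leverage statistic hi = 1/n + (xi - x̄)^2 / Σ (xj - x̄)^2, the simple-regression formula given in the cited chapter, flags observations whose x value is far from x̄.

```python
def least_squares(x, y):
    """Closed-form least squares estimates (b0_hat, b1_hat)."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1_hat = sxy / sxx
    return y_bar - b1_hat * x_bar, b1_hat

def leverage(x):
    """Leverage h_i = 1/n + (x_i - x_bar)^2 / sum_j (x_j - x_bar)^2."""
    n = len(x)
    x_bar = sum(x) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return [1.0 / n + (xi - x_bar) ** 2 / sxx for xi in x]

# Hypothetical data with a curved (quadratic) true relationship.
x = [-3.0, -2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
y = [xi ** 2 for xi in x]
b0, b1 = least_squares(x, y)
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(resid)        # [5.0, 0.0, -3.0, -4.0, -3.0, 0.0, 5.0]: a U-shaped pattern
print(leverage(x))  # largest at the extreme x values -3 and 3
```

Each leverage value lies between 1/n and 1, and for simple regression the values sum to 2, so their average is (p + 1)/n with p = 1.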
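  • K-nearest neighbors (KNN) regression:
    • A non-parametric alternative to linear regression: it does not assume a linear (or any other) functional form for the relationship between X and Y.
    • To predict at a point x0, it averages the responses of the K training observations closest to x0.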
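A minimal one-dimensional KNN regression sketch, assuming plain Python lists; knn_predict and its data are hypothetical. It sorts the training points by distance to x0 and averages the responses of the k nearest, breaking distance ties by input order (an arbitrary choice).

```python
def knn_predict(x_train, y_train, x0, k):
    """Average the responses of the k training points nearest to x0 (1-D)."""
    if not 1 <= k <= len(x_train):
        raise ValueError("k must be between 1 and the number of training points")
    # sorted() is stable, so equally distant points keep their input order.
    by_distance = sorted(zip(x_train, y_train), key=lambda p: abs(p[0] - x0))
    nearest = by_distance[:k]
    return sum(yi for _, yi in nearest) / k

# Hypothetical training data.
x_train = [1.0, 2.0, 3.0, 4.0, 5.0]
y_train = [2.0, 4.0, 6.0, 8.0, 10.0]
print(knn_predict(x_train, y_train, 3.0, k=3))   # 6.0 (average of 4, 6, 8)
print(knn_predict(x_train, y_train, 10.0, k=2))  # 9.0 (average of 10, 8)
```

Small values of k give a flexible fit that follows the training data closely; larger values average over more points and give a smoother fit.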


  • James, G., Witten, D., Hastie, T., & Tibshirani, R. (2013). An Introduction to Statistical Learning: with Applications in R. New York, NY: Springer. Chapter 3: Linear Regression.