
7. Linear Regression Inference and Multiple Regression

Ch. 8: Introduction to linear regression 1

Multiple Regression 2

  • Predictor variables are usually correlated to some extent, so the order in which the predictors are entered can make a difference.
  • Enter: the default method, in which all the predictors are forced into the model simultaneously, in the order they appear in the Covariates box. This is generally considered the best method.
  • R² indicates how much of the variance in the outcome is explained by the model's predictors.
  • The F-statistic indicates how good the model is overall, i.e., whether it explains significantly more variance than a model with no predictors.
  • The unstandardized b-value is a coefficient reflecting the strength of the relationship between a predictor and the outcome: the expected change in the outcome per one-unit change in that predictor, holding the other predictors constant.
  • Violations of assumptions can be checked using the Durbin-Watson statistic, tolerance/VIF values, residuals-vs-predicted plots, and Q-Q plots (see the code sketch after this list).
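For readers who want to reproduce these quantities outside JASP, here is a minimal Python sketch using statsmodels. The data and variable names (x1, x2, y) are synthetic stand-ins, not from the JASP guide.

```python
# A minimal sketch, assuming statsmodels is installed; the data and variable
# names (x1, x2, y) are synthetic stand-ins, not from the JASP guide.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(0)
n = 100
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(size=n)              # predictors correlated by design
y = 2.0 + 1.5 * x1 - 0.7 * x2 + rng.normal(size=n)

X = sm.add_constant(np.column_stack([x1, x2]))  # "Enter": all predictors at once
fit = sm.OLS(y, X).fit()

print("R^2:", fit.rsquared)                     # variance explained by the model
print("F:", fit.fvalue, "p:", fit.f_pvalue)     # overall model fit
print("b-values:", fit.params)                  # b0 (intercept), b1, b2
print("Durbin-Watson:", durbin_watson(fit.resid))  # ~2 suggests independent residuals
print("VIF:", [variance_inflation_factor(X, i) for i in range(1, X.shape[1])])
```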

Inference for linear regression 3

  • The null hypothesis for the slope coefficient is that the true slope is zero; the alternative hypothesis is that the true slope is not zero.
  • The test statistic for the slope, \(t = b_1 / SE(b_1)\), follows a t-distribution (with \(n - 2\) degrees of freedom in simple linear regression), from which the p-value is calculated.
  • If it is a one-tailed test, we divide the two-tailed p-value by 2 (see the sketch after this list).
  • If the conditions for the fit (linearity, nearly normal residuals, constant variability, independent observations) are not met, we cannot trust the p-value.
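The slope test can be carried out by hand, as the following Python sketch shows. The data are synthetic; in practice \(b_1\) and its standard error come from your fitted model.

```python
# A minimal sketch of the slope test by hand: t = b1 / SE(b1) with n - 2 df.
# Data are synthetic; in practice b1 and SE(b1) come from your fitted model.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 0.8 * x + rng.normal(size=n)

b1, b0 = np.polyfit(x, y, 1)                    # least-squares slope and intercept
resid = y - (b0 + b1 * x)
s = np.sqrt(resid @ resid / (n - 2))            # residual standard error
se_b1 = s / np.sqrt(((x - x.mean()) ** 2).sum())

t_stat = b1 / se_b1                             # H0: true slope = 0
p_two_sided = 2 * stats.t.sf(abs(t_stat), df=n - 2)
p_one_sided = p_two_sided / 2                   # one-tailed: divide by 2
print(t_stat, p_two_sided, p_one_sided)
```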

Introduction to multiple regression 4

  • One response variable and multiple explanatory variables (predictors).
\[ \hat{y} = b_0 + b_1x_1 + b_2x_2 + ... + b_kx_k \]
  • Where \(b_0\) is the intercept and \(b_1, b_2, ..., b_k\) are the coefficients of the explanatory variables.
  • The slopes \(b_1, b_2, ..., b_k\) may differ from the slope obtained when only one explanatory variable is considered, because each coefficient is estimated while holding the other predictors constant.
  • The regular R² is the proportion of the variability in the response variable that is explained by the explanatory variables.
  • R² never decreases when predictors are added, so it is biased upward in multiple regression; it is therefore better to use the adjusted R².
  • The adjusted R² is calculated as:
\[ \begin{aligned} R^2_{adj} &= 1 - \left(\frac{Var(e_{i})}{Var(y_{i})} \times \frac{n-1}{n-k-1}\right) \\ &= 1 - \left(\frac{\sum_{i=1}^n{(y_i - \hat{y}_i)^2}}{\sum_{i=1}^n{(y_i - \bar{y})^2}} \times \frac{n-1}{n-k-1}\right) \\ &= 1 - \left((1 - R^2) \times \frac{n-1}{n-k-1}\right) \end{aligned} \]
  • Where:
    • \(Var(e_{i})\) is the variance of the residuals.
    • \(Var(y_{i})\) is the variance of the response variable.
    • \(n\) is the number of observations.
    • \(k\) is the number of explanatory variables.
    • \(y_i\) is the observed value.
    • \(\hat{y}_i\) is the predicted value.
    • \(\bar{y}\) is the mean of the observed values.
    • \(R^2\) is the regular R2.
    • \(R^2_{adj}\) is the adjusted R2.
  • Adjusted R² accounts for model complexity and gives a more accurate measure of model fit (a numeric check of the identity above follows this list).
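As a sanity check, the identity \(R^2_{adj} = 1 - (1 - R^2) \times \frac{n-1}{n-k-1}\) can be verified numerically. A minimal Python sketch on synthetic data (all names illustrative):

```python
# A quick numeric check that 1 - (SSE/SST) * (n-1)/(n-k-1) equals
# 1 - (1 - R^2) * (n-1)/(n-k-1); data are synthetic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n, k = 50, 3
X = rng.normal(size=(n, k))
y = X @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(X)).fit()
sse = ((y - fit.fittedvalues) ** 2).sum()       # residual sum of squares
sst = ((y - y.mean()) ** 2).sum()               # total sum of squares

adj_from_sums = 1 - (sse / sst) * (n - 1) / (n - k - 1)
adj_from_r2 = 1 - (1 - fit.rsquared) * (n - 1) / (n - k - 1)
print(adj_from_sums, adj_from_r2, fit.rsquared_adj)  # all three agree
```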

Model selection in multiple regression 5

  • Full model: A model with all the explanatory variables.
  • We want to assess whether the full model is the best one, or whether a reduced model (with some explanatory variables removed) does as well or better.
  • Two strategies are used to include or exclude variables from the model:
    • Forward selection:
      • Start with the null model and add variables one by one.
      • At each step, add the variable that improves the model the most; stop when no remaining variable improves it.
    • Backward selection:
      • Start with the full model and remove variables one by one.
      • Drop the variable with the highest p-value, refit the model, and reassess the p-values.
      • Repeat until only variables with significant p-values are left.
  • If forward and backward selection give different models, we choose the model with the higher adjusted R² (a backward-elimination sketch follows this list).
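Below is a minimal sketch of p-value-based backward elimination in Python. The 5% cutoff, the column names, and the synthetic data are illustrative assumptions, not prescribed by the source.

```python
# A minimal sketch of backward elimination by p-value. The 5% cutoff and the
# column names are illustrative assumptions; data are synthetic (x2, x4 noise).
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(3)
n = 200
X = pd.DataFrame(rng.normal(size=(n, 4)), columns=["x1", "x2", "x3", "x4"])
y = 1.2 * X["x1"] - 0.8 * X["x3"] + rng.normal(size=n)

cols = list(X.columns)
while True:
    fit = sm.OLS(y, sm.add_constant(X[cols])).fit()
    pvals = fit.pvalues.drop("const")           # intercept is not a candidate
    worst = pvals.idxmax()                      # least significant predictor
    if pvals[worst] <= 0.05:                    # all remaining are significant
        break
    cols.remove(worst)                          # drop it, refit, reassess
print("kept:", cols, "adjusted R^2:", fit.rsquared_adj)
```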

Multiple regression in JASP 6

  • Go to Regression -> Linear regression.
  • Put the response (dependent) variable in the Dependent Variable box.
  • Put the explanatory (independent) variables in the Covariates box.

References


  1. Diez, D., Cetinkaya-Rundel, M., & Barr, C. D. (2019). OpenIntro statistics (4th ed.). Open Textbook Library. https://www.biostat.jhsph.edu/~iruczins/teaching/books/2019.openintro.statistics.pdf Read Chapter 8 (Introduction to linear regression), Section 8.4 (Inference for linear regression), pages 331–340. Read Chapter 9 (Multiple and logistic regression), Section 9.1 (Introduction to multiple regression), pages 343–352, and Section 9.2 (Model selection), pages 353–357. Solve the practice exercises in the attached Practice Exercises Unit 7.pdf as homework.

  2. Goss-Sampson, M. A. (2022). Statistical analysis in JASP: A guide for students (5th ed., JASP v0.16.1 2022). https://jasp-stats.org/wp-content/uploads/2022/04/Statistical-Analysis-in-JASP-A-Students-Guide-v16.pdf Licensed under CC BY 4.0. Read Multiple Regression, pages 79–85.

  3. OpenIntroOrg. (2014, January 27). Inference for linear regression [Video]. YouTube. https://youtu.be/depiT-hTaGA 

  4. OpenIntroOrg. (2013, November 14). Introduction to multiple regression [Video]. YouTube. https://youtu.be/sQpAuyfEYZg 

  5. OpenIntroOrg. (2013, November 17). Model selection in multiple regression [Video]. YouTube. https://youtu.be/VB1qSwoF-l0 

  6. Social and Behavioral Sciences at Bethel Univ. (2019, August 20). Multiple regression in JASP [Video]. YouTube. https://youtu.be/7ZQG9DzxLZ8