
5. Power calculations and ANOVA

7.4 Power calculations for a difference of means 1

7.5 Comparing many means with ANOVA 1

Descriptive Statistics & ANOVA 2

Power calculations for a difference of two means 3

  • In clinical trials, researchers often want to know how many participants they need to recruit to detect a difference between two groups.
  • The sample size determines how sensitive the test is, so it must be chosen carefully.
  • Collecting data is expensive and time-consuming, and there is some risk of harm to participants.
  • A common standard is to design the study so that there is at least an 80% chance (power of 0.80) of detecting a difference if one truly exists.
  • Power is the probability of correctly rejecting the null hypothesis when the alternative hypothesis is true. It is 1-β, where β is the probability of a Type II error.
  • If the probability of a Type II error is low, power is high.
  • Truth vs. decision:
    • If \(H_0\) is true: rejecting it is a Type I error (probability \(\alpha\)); failing to reject it is the correct decision.
    • If \(H_A\) is true: failing to reject \(H_0\) is a Type II error (probability \(\beta\)); rejecting \(H_0\) is the correct decision, with probability equal to the power, \(1 - \beta\).
  • We want to keep both \(\alpha\) and \(\beta\) low, but at a fixed sample size reducing one increases the other, so the solution is to increase the sample size.
  • We can find the standard error of the difference between two means using \(SE = \sqrt{s_1^2/n_1 + s_2^2/n_2}\).
  • Then we use a normal distribution with mean 0 and standard deviation equal to the SE to find the critical value of the difference beyond which \(H_0\) would be rejected.
  • To find the per-group sample size for 80% power, we use \(n = \frac{(z_{\alpha/2} + z_{\beta})^2 \,(s_1^2 + s_2^2)}{(\mu_1 - \mu_2)^2}\), where \(\mu_1 - \mu_2\) is the smallest difference we want to be able to detect (see the sketch after this list).
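
A minimal sketch of these calculations in Python (mine, not from the readings; the standard deviations and effect size below are hypothetical example values, and numpy/scipy are assumed available):

```python
# Sketch: power and sample size for a difference of two means.
import numpy as np
from scipy.stats import norm

def power_two_means(n, s1, s2, delta, alpha=0.05):
    """Approximate power to detect a true difference `delta` with n per group."""
    se = np.sqrt(s1**2 / n + s2**2 / n)
    z_crit = norm.ppf(1 - alpha / 2)        # two-sided rejection cutoff in SE units
    # P(observed difference lands past the cutoff | true difference = delta),
    # ignoring the negligible rejection region on the opposite side.
    return norm.sf(z_crit - delta / se)

def n_per_group(s1, s2, delta, power=0.80, alpha=0.05):
    """Per-group n from n = (z_{alpha/2} + z_beta)^2 (s1^2 + s2^2) / delta^2."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return int(np.ceil(z**2 * (s1**2 + s2**2) / delta**2))

n = n_per_group(s1=12, s2=12, delta=3)      # hypothetical SDs and effect size
print(n, power_two_means(n, 12, 12, 3))     # 252 per group, power ~ 0.80
```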

ANOVA introduction 4

  • ANOVA stands for Analysis of Variance.
  • Total variability in the data can be broken down into two parts: variability between groups and variability within groups.
    • Between-group variability: Variability in the data that is due to the different groups (variability between different groups).
    • Within-group variability: Variability in the data that is due to the differences within each group (variability between different observations within the same group).
  • ANOVA output table (as printed by R or JASP):

                                                      Df    Sum Sq   Mean Sq   F value   Pr(>F)
        Group (between-group variable, e.g. Class)    DFG   SSG      MSG       F         p-value
        Error (within-group variability, Residuals)   DFE   SSE      MSE
        Total                                         DFT   SST
  • SST:
    • Sum of squares total.
    • Captures the total variability in the data.
    • Similar to computing the variance of the data, except it is not divided by the sample size.
    • \(SST = \sum_{i=1}^{n} (y_{i} - \bar{y})^2\).
    • \(y_i\) is the value of the response variable for observation \(i\).
    • \(\bar{y}\) is the overall mean of all observations.
  • SSG:
    • Sum of squares groups.
    • Captures the variability between groups, that is, the variability in the data that is due to the variable in question (the grouping variable).
    • The explained variability in the data, the variability that is explained by the variable in question.
    • Calculated as the squared deviation of group means from the overall mean, weighted by the number of observations in each group.
    • It is not interesting on its own, but it is useful when compared to SST.
    • The ratio \(SSG/SST\) is the proportion of the total variability attributed to the variable in question, while \(1 - SSG/SST\) is the proportion attributed to all other sources.
    • \(SSG = \sum_{j=1}^{k} n_{j} (\bar{y}_{j} - \bar{y})^2\).
    • \(n_{j}\) is the number of observations in group \(j\).
    • \(\bar{y}_{j}\) is the mean of group \(j\).
    • \(\bar{y}\) is the overall mean.
  • SSE:
    • Sum of squares error.
    • Captures the variability within groups, that is, the variability in the data that is due to all other variables but the variable in question.
    • The unexplained variability in the data, the variability that is not explained by the variable in question.
    • Calculated as the squared deviation of each observation from its group mean.
    • \(SSE = SST - SSG\).
  • DFT:
    • Degrees of freedom total.
    • The number of observations minus one.
    • \(DFT = n - 1\).
    • \(n\) is the sample size.
  • DFG:
    • Degrees of freedom groups.
    • The number of groups minus one.
    • \(DFG = k - 1\).
    • \(k\) is the number of groups within the variable in question.
  • DFE:
    • Degrees of freedom error.
    • The number of observations minus the number of groups.
    • \(DFE = DFT - DFG = n - k\).
  • MSG:
    • Mean squares groups.
    • The average variability between groups.
    • \(MSG = SSG / DFG\).
  • MSE:
    • Mean squares error.
    • The average variability within groups.
    • \(MSE = SSE / DFE\).
  • F:
    • F statistic.
    • The ratio of the variability between groups to the variability within groups.
    • \(F = MSG / MSE\).
  • Pr(>F):
    • P-value.
    • The probability of observing data at least as extreme as ours if the null hypothesis is true.
    • If the p-value is less than the significance level, we reject the null hypothesis.
    • If the p-value is greater than the significance level, we fail to reject the null hypothesis.
    • It is the probability of seeing a ratio of between-group to within-group variability at least as large as the observed one if, in fact, the group means are all equal (the null hypothesis is true).
    • It is the area to the right of the observed F value under the F distribution with DFG and DFE degrees of freedom.
    • The F distribution is right skewed and takes only positive values (the whole table can be computed from these formulas, as in the sketch after this list).
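
To make the table concrete, here is a sketch (mine, not from the readings) that computes every quantity above by hand on made-up data, then cross-checks against scipy's built-in one-way ANOVA:

```python
# Sketch: build the one-way ANOVA table from the formulas above.
import numpy as np
from scipy import stats

groups = [np.array([4.0, 5.5, 6.1, 5.2]),   # hypothetical group samples
          np.array([6.8, 7.2, 6.5, 7.9]),
          np.array([5.0, 4.2, 5.8, 4.9])]

y = np.concatenate(groups)
n, k = len(y), len(groups)

sst = np.sum((y - y.mean())**2)                               # total variability
ssg = sum(len(g) * (g.mean() - y.mean())**2 for g in groups)  # between groups
sse = sst - ssg                                               # within groups

dfg, dfe = k - 1, n - k
msg, mse = ssg / dfg, sse / dfe
f_stat = msg / mse
p_value = stats.f.sf(f_stat, dfg, dfe)    # area to the right under F(DFG, DFE)

print(f"SSG={ssg:.2f} SSE={sse:.2f} F={f_stat:.2f} p={p_value:.4f}")
print(stats.f_oneway(*groups))            # cross-check with scipy's ANOVA
```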

Conditions for ANOVA 5

  • Independence:
    • Observations within each group are independent.
      • Random sample / random assignment.
      • Each group size is less than 10% of the population size.
      • It is important, but it is difficult to check.
    • Observations between groups are independent:
      • Groups are independent of each other (not paired).
      • If the data are paired, use repeated-measures ANOVA instead.
  • Approximate normality:
    • The distribution of the response variable is approximately normal within each group.
    • It is especially important when the sample size is small.
  • Equal variance:
    • The variability of the response variable is approximately the same within each group.
    • Variability should be consistent across groups, that is, homoscedastic groups.
    • It is especially important when the sample size differs between groups (quick checks for the last two conditions are sketched below).
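
The normality and equal-variance conditions can be checked graphically or with formal tests; a quick sketch (mine, with hypothetical data; Shapiro-Wilk and Levene's tests are common choices, not the only ones):

```python
# Sketch: quick diagnostic checks for the ANOVA conditions.
# Note: small p-values here argue *against* the condition holding.
import numpy as np
from scipy import stats

groups = [np.random.default_rng(1).normal(5.0, 1, 30),   # made-up samples
          np.random.default_rng(2).normal(6.0, 1, 30),
          np.random.default_rng(3).normal(5.5, 1, 30)]

for i, g in enumerate(groups, start=1):
    w, p = stats.shapiro(g)               # approximate-normality check per group
    print(f"group {i}: sd={g.std(ddof=1):.2f}, Shapiro-Wilk p={p:.3f}")

stat, p = stats.levene(*groups)           # equal-variance (homoscedasticity) check
print(f"Levene's test p={p:.3f}")
```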

JASP ANOVA 6

  • Load the data from the file.
  • Make sure the variable in question is of the Scale (continuous) measurement level.
  • Choose Descriptive Statistics from the Analysis menu.
  • Select the variable in question.
  • Select the following:
    • Display frequency tables.
    • Display the median.
  • Select ANOVA from the Analysis menu.
    • Under additional options, select:
      • Descriptive statistics.
      • Effect size (partial eta squared, \(\eta_p^2\)).
  • In Fixed Factors, select one or more variables of nominal scale; they will be used as predictor variables.
  • Select the variable in question as the dependent variable (a code equivalent of these steps is sketched below).
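
JASP is point-and-click, but the same analysis can be reproduced in code. A sketch (mine; the column names "class" and "score" and the data are hypothetical, pandas/scipy assumed available):

```python
# Sketch: descriptives + one-way ANOVA + eta squared, mirroring the JASP steps.
import pandas as pd
from scipy import stats

df = pd.DataFrame({"class": ["A"]*4 + ["B"]*4 + ["C"]*4,
                   "score": [4.0, 5.5, 6.1, 5.2,
                             6.8, 7.2, 6.5, 7.9,
                             5.0, 4.2, 5.8, 4.9]})

print(df.groupby("class")["score"].describe())      # descriptive statistics

groups = [g["score"].to_numpy() for _, g in df.groupby("class")]
f_stat, p = stats.f_oneway(*groups)

# With a single factor, partial eta squared equals eta squared = SSG / SST.
grand = df["score"].mean()
ssg = sum(len(g) * (g.mean() - grand)**2 for g in groups)
sst = ((df["score"] - grand)**2).sum()
print(f"F={f_stat:.2f}, p={p:.4f}, eta^2={ssg/sst:.3f}")
```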

References


  1. Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2019). OpenIntro Statistics (4th ed.). Open Textbook Library. https://www.biostat.jhsph.edu/~iruczins/teaching/books/2019.openintro.statistics.pdf. Read Chapter 7, Inference for numerical data: Section 7.4, Power calculations for a difference of means (pp. 278–284), and Section 7.5, Comparing many means with ANOVA (pp. 285–302).

  2. Goss-Sampson, M. A. (2022). Statistical analysis in JASP: A guide for students (5th ed., JASP v0.16.1). Licensed under CC BY 4.0. https://jasp-stats.org/wp-content/uploads/2022/04/Statistical-Analysis-in-JASP-A-Students-Guide-v16.pdf. Read Descriptive Statistics (pp. 14–27) and ANOVA (pp. 91–98).

  3. Çetinkaya-Rundel, M. (2018a, February 20). 5 4 Power calculations for difference of two means [Video]. YouTube. https://youtu.be/vnjjhQDedvs 

  4. Çetinkaya-Rundel, M. (2018b, February 20). 5 5A ANOVA introduction [Video]. YouTube. https://youtu.be/W36DMVJ4Ibo 

  5. Çetinkaya-Rundel, M. (2018c, February 20). 5 5B Conditions for ANOVA [Video]. YouTube. https://youtu.be/HGFiWMA5OC8 

  6. HeadlessProfessor. (2016, November 12). JASP ANOVA [Video]. YouTube. https://youtu.be/nlAhWQmG5Iw