4. Inference for numerical data
7. Inference for numerical data [1]
T distribution [2]
- The t-distribution is used to model the standardized sample mean when the population standard deviation is unknown and is estimated by the sample standard deviation.
- It is a bell-shaped distribution that is symmetric around 0.
- It is similar to the normal distribution but has heavier tails and a lower peak.
- Observations are more likely to fall in the tails of the t-distribution than the normal distribution.
- Observations are more likely to fall beyond 2 standard deviations from the mean in the t-distribution than the normal distribution.
- Confidence intervals are wider, that is, more conservative, when using the t-distribution than when using the normal distribution.
- The thicker tails account for the extra uncertainty that comes from estimating the standard error with the sample standard deviation; the less reliable that estimate, the thicker the tails need to be.
- T distribution has a parameter called degrees of freedom which determines the thickness of the tails.
- As the degrees of freedom increase, the t-distribution approaches the normal distribution.
- Calculate the T statistic using the formula: \(T = \frac{\text{obs} - \text{null}}{SE}\).
- The p-value is the probability of observing a test statistic at least as extreme as the one observed, assuming the null hypothesis is true.
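The claim that the t-distribution puts more probability beyond 2 standard errors than the normal distribution, and that the gap closes as the degrees of freedom grow, can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names `t_pdf` and `t_two_sided_tail` are made up for this illustration, and the tail area is obtained by Simpson's rule on the density rather than by a statistics package.

```python
import math
from statistics import NormalDist


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_two_sided_tail(x, df, steps=2000):
    """P(|T| > |x|), using Simpson's rule on the density over [0, |x|]."""
    x = abs(x)
    h = x / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(x, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_half = total * h / 3  # P(0 < T < x)
    return 1.0 - 2.0 * central_half


# Probability of falling more than 2 units from the center.
normal_tail = 2 * (1 - NormalDist().cdf(2))
for df in (2, 5, 30, 1000):
    print(f"df = {df:4d}: P(|T| > 2) = {t_two_sided_tail(2, df):.4f}")
print(f"normal   : P(|Z| > 2) = {normal_tail:.4f}")
```

The printed tail areas shrink toward the normal value as `df` increases, which matches the statement that the degrees of freedom control the thickness of the tails.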
Inference for a mean [3]
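- A confidence interval for the population mean is \(\bar{x} \pm t^*_{df} \times SE\), with \(SE = \frac{s}{\sqrt{n}}\).
- Where:
    - \(s\) is the sample standard deviation.
    - \(n\) is the sample size.
    - \(t^*\) is the critical t-score for the chosen confidence level.
    - \(df\) is the degrees of freedom.
    - \(SE\) is the standard error.
    - \(\bar{x}\) is the sample mean.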
- To find the t* score:
    - Calculate the degrees of freedom: \(df = n - 1\).
    - Use the t-distribution table to find the t* score:
        - Find the row that corresponds to the degrees of freedom.
        - Find the column that corresponds to the confidence level.
        - The value at the intersection is the t* score.
    - Or use the `qt()` function in R: `qt((1 - confidence)/2, df = n - 1)` returns the lower-tail cutoff, which is negative, so t* is its absolute value (equivalently, `qt(1 - (1 - confidence)/2, df = n - 1)`).
        - For a 95% confidence level, `abs(qt(0.025, df = n - 1))`.
- To compute the p-value:
    - Calculate the degrees of freedom: \(df = n - 1\).
    - Calculate the t statistic using the formula: \(t = \frac{\text{obs} - \text{null}}{SE}\).
    - Use the `pt()` function in R to find the p-value: `pt(abs(t), df, lower.tail = FALSE) * 2`
        - `pt(abs(t), df, lower.tail = FALSE)` is the area in one tail; for a two-tailed test, multiply by 2.
    - Using the t-distribution table:
        - Find the row that corresponds to the degrees of freedom.
        - Find the two values in that row that the absolute value of the t statistic falls between.
        - The tail areas in the headers of those two columns bound the p-value.
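The two procedures in this section (finding t* for a confidence interval, and turning a t statistic into a two-sided p-value) can be sketched in plain Python. This is an illustration under stated assumptions, not a replacement for R's `qt()` and `pt()`: the function names are hypothetical, the t CDF is approximated by Simpson's rule, t* is found by bisection, and the summary statistics in the usage lines are made-up numbers.

```python
import math


def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))


def t_cdf(x, df, steps=2000):
    """P(T <= x): Simpson's rule on [0, |x|] plus symmetry about 0."""
    a = abs(x)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    half = total * h / 3  # P(0 < T < |x|)
    return 0.5 + half if x >= 0 else 0.5 - half


def t_star(confidence, df):
    """Positive critical value whose central area equals `confidence`."""
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    target = 1 - (1 - confidence) / 2  # e.g. 0.975 for 95% confidence
    lo, hi = 0.0, 1.0
    while t_cdf(hi, df) < target:  # widen the bracket until it contains t*
        hi *= 2
    for _ in range(60):  # bisection on the CDF
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def mean_confidence_interval(x_bar, s, n, confidence=0.95):
    """x_bar +/- t* * SE, with SE = s / sqrt(n) and df = n - 1."""
    se = s / math.sqrt(n)
    margin = t_star(confidence, n - 1) * se
    return x_bar - margin, x_bar + margin


def two_sided_p_value(x_bar, null_value, s, n):
    """Return (t, p) for H0: mu = null_value against a two-sided alternative."""
    se = s / math.sqrt(n)
    t = (x_bar - null_value) / se
    p = 2 * (1 - t_cdf(abs(t), n - 1))  # abs() keeps p valid for negative t
    return t, p


# Made-up summary statistics, for illustration only.
low, high = mean_confidence_interval(x_bar=4.4, s=2.3, n=19)
t_stat, p_val = two_sided_p_value(x_bar=4.4, null_value=3.0, s=2.3, n=19)
print(f"95% CI: ({low:.2f}, {high:.2f})")
print(f"t = {t_stat:.3f}, two-sided p-value = {p_val:.4f}")
```

Taking the absolute value of the t statistic before computing the upper-tail area is what keeps the doubled tail area a valid p-value when the observed mean falls below the null value.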
Inference for paired data [4]
- Paired data arise when each observation in one set is linked to exactly one observation in the other set, so the two sets are not independent.
- The difference between the two observations in each pair is calculated, and it is used to perform inference.
- The differences form the new data set that is used to calculate the mean and standard deviation.
- The null hypothesis states that the average difference is 0, that is, there is no difference between the two sets of observations.
- H0: \(\mu_{diff} = 0\). Ha: \(\mu_{diff} \neq 0\).
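The paired procedure reduces to a one-sample analysis of the differences. A minimal sketch, assuming two equal-length lists of linked measurements; the function name `paired_t_statistic` and the scores below are hypothetical and serve only to show the arithmetic.

```python
import math
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """Return (mean difference, SE, t, df) for H0: mu_diff = 0."""
    if len(first) != len(second):
        raise ValueError("paired samples must have the same length")
    # One difference per pair: this is the new data set used for inference.
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_bar = mean(diffs)
    se = stdev(diffs) / math.sqrt(n)  # stdev() is the sample standard deviation
    t = (d_bar - 0) / se              # (obs - null) / SE with null = 0
    return d_bar, se, t, n - 1


# Hypothetical before/after scores for six subjects.
before = [72, 65, 80, 58, 90, 77]
after = [75, 70, 78, 64, 93, 80]
d_bar, se, t, df = paired_t_statistic(before, after)
print(f"mean difference = {d_bar}, SE = {se:.3f}, t = {t:.3f}, df = {df}")
```

The resulting t statistic and degrees of freedom are then used exactly as in the one-sample case for a mean.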
Difference of two independent means [5]
Conditions for inference on the difference of two independent means:
- Independence:
    - Within groups: sampled observations are independent.
        - Random sample/assignment.
        - 10% condition: each sample size is less than 10% of its population.
    - Between groups: the two groups are independent.
        - Groups are not paired.
- Sample size/skew:
    - The more skewed the data, the larger the sample size required.
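Once these conditions hold, the test statistic for two independent groups follows the same (obs - null) / SE pattern. The notes above list only the conditions, so the formulas in this sketch are assumptions taken from the standard treatment in the cited textbook's Section 7.3: \(SE = \sqrt{s_1^2/n_1 + s_2^2/n_2}\) and the conservative \(df = \min(n_1 - 1, n_2 - 1)\). The function name and the data are hypothetical.

```python
import math
from statistics import mean, variance


def two_sample_t(group_1, group_2, null_diff=0.0):
    """Return (point estimate, SE, t, df) for H0: mu_1 - mu_2 = null_diff."""
    n1, n2 = len(group_1), len(group_2)
    point_estimate = mean(group_1) - mean(group_2)
    # Each group contributes its own sample variance; the groups are not pooled.
    se = math.sqrt(variance(group_1) / n1 + variance(group_2) / n2)
    t = (point_estimate - null_diff) / se
    # Conservative degrees of freedom (assumption: textbook rule, not Welch's formula).
    df = min(n1 - 1, n2 - 1)
    return point_estimate, se, t, df


# Hypothetical measurements from two unpaired groups.
group_a = [10, 12, 14, 16]
group_b = [9, 10, 11]
estimate, se, t, df = two_sample_t(group_a, group_b)
print(f"difference = {estimate}, SE = {se:.3f}, t = {t:.3f}, df = {df}")
```

Statistical software usually computes a non-integer df with the Welch-Satterthwaite approximation instead; the minimum rule used here gives a somewhat wider interval and a larger p-value.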
References
1. Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2019). OpenIntro statistics - Fourth edition. Open Textbook Library. https://www.biostat.jhsph.edu/~iruczins/teaching/books/2019.openintro.statistics.pdf Chapter 7 - Inference for numerical data. Section 7.1 - One-sample means with the t-distribution, pp. 251-261. Section 7.2 - Paired data, pp. 262-266. Section 7.3 - Difference of two means, pp. 267-277.
2. Çetinkaya-Rundel, M. (2018a, February 20). 5 1A t distribution [Video]. YouTube. https://youtu.be/uVEj2uBJfq0
3. Çetinkaya-Rundel, M. (2018b, February 20). 5 1B Inference for a mean [Video]. YouTube. https://youtu.be/RYVIGj1l4xs
4. Çetinkaya-Rundel, M. (2018c, February 20). 5 2 Inference for paired data [Video]. YouTube. https://youtu.be/K0QZ9_4w0HU
5. Çetinkaya-Rundel, M. (2018d, February 20). 5 3 Difference of two independent means [Video]. YouTube. https://youtu.be/emZ24asR2F4