4. Inference for numerical data

7. Inference for numerical data 1

T distribution 2

  • The t-distribution is used to model the sampling distribution of the sample mean when the population standard deviation is unknown and must be estimated from the sample.
  • It is a bell-shaped distribution that is symmetric around 0.
  • It is similar to the normal distribution but has heavier tails and a lower peak.
  • Observations are more likely to fall in the tails of the t-distribution than the normal distribution.
  • Observations are more likely to fall beyond 2 standard deviations from the mean in the t-distribution than the normal distribution.
  • Confidence intervals are wider, i.e. more conservative, when using the t-distribution than the normal distribution.
  • The thicker tails compensate for the extra uncertainty introduced by estimating the standard error from the sample: the less reliable that estimate, the thicker the tails.
  • T distribution has a parameter called degrees of freedom which determines the thickness of the tails.
  • As the degrees of freedom increase, the t-distribution approaches the normal distribution.
  • Calculate the T statistic using the formula: $T = \frac{\text{obs} - \text{null}}{SE}$.
  • The p-value is the probability of observing a test statistic at least as extreme as the one observed, assuming the null hypothesis is true.
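
The tail behaviour described above can be checked numerically in R. This is a minimal sketch; the degrees of freedom used (2, 5, 30, 100) are arbitrary values picked for illustration and do not come from the source.

```r
# Two-sided probability of landing more than 2 units away from 0.
# The df values are arbitrary, chosen only for illustration.
dfs <- c(2, 5, 30, 100)
tail_t <- 2 * pt(-2, df = dfs)   # t-distribution, one value per df
tail_z <- 2 * pnorm(-2)          # standard normal

print(data.frame(df = dfs, tail_t = tail_t))
print(tail_z)

# Critical values for a 95% interval shrink toward the normal
# critical value as the degrees of freedom grow.
t_star <- qt(0.975, df = dfs)
z_star <- qnorm(0.975)
print(round(t_star, 3))
print(round(z_star, 3))
```

Every tail probability for the t-distribution should come out larger than the normal one, and the gap should narrow as df increases.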

Inference for a mean 3

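  • A confidence interval gives a range of plausible values for the population mean. It has the form point estimate ± margin of error:

$$
\begin{aligned}
&\text{point estimate} \pm \text{margin of error} \\
&\bar{x} \pm t^*_{df} \, SE_{\bar{x}} \\
&\bar{x} \pm t^*_{df} \, \frac{s}{\sqrt{n}} \\
&\bar{x} \pm t^*_{n-1} \, \frac{s}{\sqrt{n}}
\end{aligned}
$$
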
  • Where:
    • s is the sample standard deviation.
    • n is the sample size.
    • t* is the t-score.
    • df is the degrees of freedom.
    • SE is the standard error.
    • $\bar{x}$ is the sample mean.
  • To find the t* score:
    • Calculate the degrees of freedom: $df = n - 1$.
    • Use the t-distribution table to find the t* score:
      • Find the row that corresponds to the degrees of freedom.
      • Find the column that corresponds to the confidence level.
      • The value at the intersection is the t* score.
    • Or use the qt() function in R to find the t* score:
      • qt(1 - (1 - confidence)/2, df = n - 1)
      • for 95% confidence level, qt(0.975, df = n - 1)
      • qt((1 - confidence)/2, df = n - 1) returns the same value with a negative sign.
  • To compute the p-value:
    • Calculate the degrees of freedom: $df = n - 1$.
    • Calculate the t statistic using the formula: $t = \frac{\text{obs} - \text{null}}{SE}$.
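    • Use the pt() function in R to find the p-value:
      • pt(abs(t), df, lower.tail = FALSE) * 2
      • pt(abs(t), df, lower.tail = FALSE) is the one-tail area; multiply by 2 for a two-tailed test.
    • Using the t-distribution table:
      • Find the row that corresponds to the degrees of freedom.
      • Find the two values in that row that the absolute value of the t statistic falls between.
      • The tail areas in the corresponding column headers bound the p-value, so the table gives a range rather than an exact value.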
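
The steps above for the confidence interval and the p-value can be sketched in R as follows. The sample values and the null value are invented for illustration only; the built-in t.test() is used at the end as a cross-check.

```r
# Hypothetical sample, invented purely for illustration.
x <- c(5.1, 4.8, 5.6, 5.0, 4.7, 5.3, 5.2, 4.9, 5.4, 5.0)
null_value <- 4.8        # hypothesized population mean (assumed)
confidence <- 0.95

n     <- length(x)
x_bar <- mean(x)
s     <- sd(x)           # sample standard deviation
se    <- s / sqrt(n)     # standard error of the sample mean
df    <- n - 1

# Confidence interval: x_bar +/- t* * SE
t_star <- qt(1 - (1 - confidence) / 2, df = df)
ci <- c(x_bar - t_star * se, x_bar + t_star * se)

# Two-sided test: t = (observed - null) / SE
t_stat  <- (x_bar - null_value) / se
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

print(ci)
print(c(t = t_stat, p = p_value))

# Cross-check against the built-in one-sample t-test.
check <- t.test(x, mu = null_value, conf.level = confidence)
print(check)
```

The manual interval, t statistic, and p-value should agree with the values reported by t.test().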

Inference for paired data 4

  • Data are paired when each observation in one set is linked to exactly one observation in the other set, i.e. the two sets are not independent.
  • The difference between the two observations is calculated, and it is used to perform inference.
  • The difference is the new data set that is used to calculate the mean and standard deviation.
  • The null hypothesis states that the population average difference is 0, i.e. there is no difference between the two sets of observations.
  • $H_0: \mu_{diff} = 0$. $H_a: \mu_{diff} \neq 0$.
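
A paired analysis reduces to a one-sample analysis of the differences. The sketch below uses made-up before/after measurements (the names before and after and all values are hypothetical) and compares the manual result with R's built-in paired test.

```r
# Hypothetical paired measurements on the same 8 units.
before <- c(12.1, 11.4, 13.0, 12.7, 10.9, 11.8, 12.5, 13.2)
after  <- c(11.6, 11.5, 12.2, 12.0, 10.7, 11.1, 12.6, 12.4)

d  <- before - after     # one difference per pair: the new data set
n  <- length(d)
se <- sd(d) / sqrt(n)    # standard error of the mean difference
df <- n - 1

# H0: mu_diff = 0, so the null value is 0.
t_stat  <- (mean(d) - 0) / se
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)
print(c(t = t_stat, p = p_value))

# Same analysis via the built-in paired t-test.
check <- t.test(before, after, paired = TRUE)
print(check)
```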

Difference of two independent means 5

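$$
\begin{aligned}
&\text{point estimate} \pm \text{margin of error} \\
&(\bar{x}_1 - \bar{x}_2) \pm t^*_{df} \, SE_{(\bar{x}_1 - \bar{x}_2)} \\
&(\bar{x}_1 - \bar{x}_2) \pm t^*_{df} \, \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \\
&(\bar{x}_1 - \bar{x}_2) \pm t^*_{\min(n_1 - 1,\, n_2 - 1)} \, \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\end{aligned}
$$

So:

$$
SE_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}
\qquad
df = \min(n_1 - 1,\, n_2 - 1)
$$
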
  • Conditions:
    • Independence:
      • Within groups: sampled observations are independent.
        • Random sample/assignment.
        • 10% condition: when sampling without replacement, each sample size is less than 10% of its population.
      • Between groups: the two groups are independent.
        • Groups are not paired.
    • Sample size/skew:
      • The more skewed the data, the larger the sample size required.
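
The interval and test for a difference of two independent means can be sketched in R as follows. The two samples are invented for illustration, and the degrees of freedom use the conservative min(n1 - 1, n2 - 1) rule from the formulas in this section.

```r
# Hypothetical samples from two independent groups.
g1 <- c(23.1, 25.4, 22.8, 26.0, 24.3, 25.1, 23.7)
g2 <- c(21.0, 22.5, 20.8, 23.1, 21.9, 22.2, 20.5, 21.7, 22.9)

n1 <- length(g1)
n2 <- length(g2)
diff_means <- mean(g1) - mean(g2)             # point estimate
se <- sqrt(var(g1) / n1 + var(g2) / n2)       # SE of the difference
df <- min(n1 - 1, n2 - 1)                     # conservative df

# 95% confidence interval for mu_1 - mu_2
t_star <- qt(0.975, df = df)
ci <- diff_means + c(-1, 1) * t_star * se

# Two-sided test of H0: mu_1 - mu_2 = 0
t_stat  <- (diff_means - 0) / se
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)

print(ci)
print(c(t = t_stat, p = p_value))

# Built-in Welch test, shown for comparison only.
check <- t.test(g1, g2)
print(check)
```

R's t.test(g1, g2) reports the same t statistic but uses the Welch approximation for the degrees of freedom, which is never smaller than min(n1 - 1, n2 - 1). Its interval is therefore somewhat narrower and its p-value somewhat smaller than the conservative values computed by hand.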

References


  1. Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2019). OpenIntro Statistics (4th ed.). Open Textbook Library. https://www.biostat.jhsph.edu/~iruczins/teaching/books/2019.openintro.statistics.pdf Chapter 7: Inference for numerical data. Section 7.1, One-sample means with the t-distribution (pp. 251-261); Section 7.2, Paired data (pp. 262-266); Section 7.3, Difference of two means (pp. 267-277).

  2. Çetinkaya-Rundel, M. (2018a, February 20). 5 1A t distribution [Video]. YouTube. https://youtu.be/uVEj2uBJfq0 

  3. Çetinkaya-Rundel, M. (2018b, February 20). 5 1B Inference for a mean [Video]. YouTube. https://youtu.be/RYVIGj1l4xs 

  4. Çetinkaya-Rundel, M. (2018c, February 20). 5 2 Inference for paired data [Video]. YouTube. https://youtu.be/K0QZ9_4w0HU 

  5. Çetinkaya-Rundel, M. (2018d, February 20). 5 3 Difference of two independent means [Video]. YouTube. https://youtu.be/emZ24asR2F4