
JA2. Hypothesis testing Exercises

Part 1

A group of 441 adults who did not have a college degree and were not currently enrolled in school were randomly selected. 38% of them said they did not attend college because they could not afford it.

Question (1.a)

a. Conduct a hypothesis test to determine whether there is strong evidence that less than 50% of adults who decide not to attend college do so because they cannot afford it. State the hypotheses and check the independence and success-failure conditions. Compute the test statistic and p-value, interpret the results, and conclude whether the null hypothesis should be rejected.

Let’s state the hypotheses:

  • H0: null hypothesis: The proportion of adults who did not attend college because they could not afford it is equal to 50%. (p = 0.50)
  • H1: alternative hypothesis: The proportion of adults who did not attend college because they could not afford it is less than 50%. (p < 0.50).

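Let’s validate the independence and success-failure conditions:

  • Independence: The sample is a simple random sample, and 441 people is far less than 10% of all adults without a college degree who are not enrolled in school, so the observations can be treated as independent.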
  • Success-failure condition:
    • \(n \cdot p_0 = 441 \cdot 0.50 = 220.5 \geq 10\)
    • \(n \cdot (1 - p_0) = 441 \cdot 0.50 = 220.5 \geq 10\)
    • Thus, the success-failure condition is satisfied.

Both conditions are satisfied, and we can proceed with the hypothesis test as the sampling distribution is approximately normal.
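
The success-failure check is plain arithmetic; a minimal Python sketch of it follows. The helper name `success_failure_ok` is hypothetical, and the threshold of 10 is the convention used above.

```python
def success_failure_ok(n, p, threshold=10):
    """True when both expected successes n*p and failures n*(1 - p) reach the threshold."""
    return n * p >= threshold and n * (1 - p) >= threshold

# Part 1: for a hypothesis test, the check uses the null value p0 = 0.50.
n, p0 = 441, 0.50
print(n * p0, n * (1 - p0))       # 220.5 220.5
print(success_failure_ok(n, p0))  # True
```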

Let’s compute the standard error:

\[ SE = \sqrt{\frac{p_0 \cdot (1 - p_0)}{n}} = \sqrt{\frac{0.50 \cdot (1 - 0.50)}{441}} \approx 0.0238 \]

Let’s find the z-value:

\[ z = \frac{\hat{p} - p_0}{SE} = \frac{0.38 - 0.50}{0.0238} \approx -5.04 \]
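
To double-check the arithmetic, here is a short Python sketch (standard library only; the variable names are illustrative) that computes the null standard error and the z statistic without rounding SE first. Carrying full precision gives SE ≈ 0.0238 and z ≈ -5.04.

```python
import math

n = 441        # sample size
p_hat = 0.38   # observed sample proportion
p0 = 0.50      # value of p under the null hypothesis

# Under H0 the standard error is built from p0, not from p_hat.
se = math.sqrt(p0 * (1 - p0) / n)
z = (p_hat - p0) / se

print(round(se, 4))  # 0.0238
print(round(z, 2))   # -5.04
```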

Let’s find the p-value:

  • Since the alternative hypothesis is one-sided (p < 0.50), the p-value is the area in the left tail of the standard normal distribution below z.
  • Using the z-table, the p-value is less than 0.0001.
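
Instead of a z-table, the left-tail area can be computed with `statistics.NormalDist` from Python's standard library. This minimal sketch assumes z ≈ -5.04, the test statistic computed without intermediate rounding.

```python
from statistics import NormalDist

z = -5.04  # assumed: test statistic at full precision

# H1: p < 0.50 is left-tailed, so the p-value is P(Z <= z) for standard normal Z.
p_value = NormalDist().cdf(z)

print(p_value < 0.0001)  # True
print(f"{p_value:.1e}")
```

The exact value is on the order of 10^-7, so "less than 0.0001" is a safe statement.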

Let’s interpret the data:

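  • The p-value is less than 0.0001, which is well below the conventional significance level of α = 0.05.
  • When the p-value is below the significance level, we reject the null hypothesis: if the true proportion were 0.50, a sample proportion as low as 0.38 would be extremely unlikely, so the data provide strong evidence against H0.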

And the final conclusion: there is enough evidence to support the claim that less than 50% of adults who decide not to attend college did so because they cannot afford it.

Question (1.b)

b. Suppose we wanted the margin of error for the 90% confidence level to be about 1.5%. How large of a survey would you recommend?

We know that the margin of error is given by:

\[ ME = z^* \cdot SE = z^* \cdot \sqrt{\frac{p \cdot (1 - p)}{n}} \]

Solving this equation for n gives the required sample size:

\[ n = \frac{z^{*2} \times (p (1 - p))}{ME^2} \]
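
For reference, this formula follows from squaring both sides of the margin-of-error equation and solving for n:

\[ \begin{aligned} ME &= z^* \cdot \sqrt{\frac{p \cdot (1 - p)}{n}} \\ ME^2 &= z^{*2} \cdot \frac{p \cdot (1 - p)}{n} \\ n &= \frac{z^{*2} \cdot p (1 - p)}{ME^2} \end{aligned} \]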

We use the sample proportion from part (a), 0.38, as our estimate of p, and we want the margin of error to be 1.5%, or 0.015. The critical value for a 90% confidence level is \(z^* = 1.645\).

Let’s compute the sample size:

\[ n = \frac{1.645^2 \times (0.38 (1 - 0.38))}{0.015^2} \approx 2833.51 \]

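Since the sample size must be a whole number, and rounding down would give a margin of error slightly above 1.5%, we round up and recommend a sample size of at least 2834.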
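
The sample-size formula translates directly into code. In this hypothetical helper, `math.ceil` performs the round-up step; the second call shows the conservative planning value p = 0.5, which is commonly used when no earlier estimate of p is available.

```python
import math

def required_sample_size(z_star, p, me):
    """Smallest whole n with z_star * sqrt(p * (1 - p) / n) <= me."""
    return math.ceil(z_star**2 * p * (1 - p) / me**2)

print(required_sample_size(1.645, 0.38, 0.015))  # 2834
print(required_sample_size(1.645, 0.50, 0.015))  # 3007 (conservative p = 0.5)
```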


Part 2

A study used random samples of 13,270 Texas residents and 4,681 Dallas residents. The proportion of residents who reported insufficient rest or sleep during each of the preceding 31 days was 7.0% in Texas and 6.8% in Dallas.

Question (2.a)

a. Calculate a 95% confidence interval for the difference between the proportions of sleep-deprived individuals among Texas residents and Dallas residents. Check the independence and success-failure conditions, construct the interval, and interpret it in the context of this study.

Let’s prepare the data:

\[ \begin{aligned} \text{Texas: } & n_{1} = 13270, \quad \hat{p}_1 = 0.07 \\ \text{Dallas: } & n_{2} = 4681, \quad \hat{p}_2 = 0.068 \end{aligned} \]

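Let’s validate the independence and success-failure conditions:

  • Independence: Each sample is random and far smaller than 10% of its population, so observations within each group are independent. Because Dallas is a city in Texas, we also have to assume that the two samples were drawn independently of each other, with no respondent appearing in both.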
  • Success-failure condition:
    • For Texas:
      • \(n_{1} \cdot \hat{p}_1 = 13270 \cdot 0.07 = 928.9 \geq 10\)
      • \(n_{1} \cdot (1 - \hat{p}_1) = 13270 \cdot 0.93 = 12341.1 \geq 10\)
    • For Dallas:
      • \(n_{2} \cdot \hat{p}_2 = 4681 \cdot 0.068 = 318.308 \geq 10\)
      • \(n_{2} \cdot (1 - \hat{p}_2) = 4681 \cdot 0.932 = 4362.692 \geq 10\)
    • Thus, the success-failure condition is satisfied.

Both conditions are satisfied, and we can construct the confidence interval, as the sampling distribution of \(\hat{p}_1 - \hat{p}_2\) is approximately normal.
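
The same success-failure arithmetic for the two samples, as a small Python sketch using the observed proportions (the `groups` dictionary is just an illustrative layout):

```python
# Observed sample size and sample proportion for each group
groups = {"Texas": (13270, 0.07), "Dallas": (4681, 0.068)}

all_ok = True
for name, (n, p_hat) in groups.items():
    successes = n * p_hat        # observed "yes" count
    failures = n * (1 - p_hat)   # observed "no" count
    ok = successes >= 10 and failures >= 10
    all_ok = all_ok and ok
    print(f"{name}: {successes:.1f} successes, {failures:.1f} failures, ok={ok}")

print(all_ok)  # True
```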

Let’s compute the standard error:

\[ \begin{aligned} SE &= \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_1} +\frac{\hat{p}_2(1-\hat{p}_2)}{n_2}} \\ &= \sqrt{\frac{0.07 \cdot (1 - 0.07)}{13270} + \frac{0.068 \cdot (1 - 0.068)}{4681}} \\ &= 0.00429 \end{aligned} \]

Let’s compute the confidence interval, using the z-value for a 95% confidence level, which is 1.96:

\[ \begin{aligned} ME &= z^* \cdot SE = 1.96 \cdot 0.00429 = 0.00842 \\ CI_{min} &= (\hat{p}_1 - \hat{p}_2) - ME = (0.07 - 0.068) - 0.00842 = -0.00642 \\ CI_{max} &= (\hat{p}_1 - \hat{p}_2) + ME = (0.07 - 0.068) + 0.00842 = 0.01042 \end{aligned} \]

Thus, the confidence interval is (-0.00642, 0.01042).
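
A minimal Python sketch of the interval calculation, using the unpooled standard error (each group contributes its own \(\hat{p}(1-\hat{p})/n\) term); the variable names are illustrative:

```python
import math

n1, p1 = 13270, 0.07   # Texas
n2, p2 = 4681, 0.068   # Dallas
z_star = 1.96          # critical value for 95% confidence

# Unpooled standard error of p1_hat - p2_hat
se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
diff = p1 - p2
me = z_star * se
ci = (diff - me, diff + me)

print(round(se, 5))                      # 0.00429
print(round(ci[0], 5), round(ci[1], 5))  # -0.00642 0.01042
```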

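And we are 95% confident that the difference between the proportions of sleep-deprived individuals among Texas residents and Dallas residents (Texas minus Dallas) is between -0.642 and 1.042 percentage points. Because the interval contains 0, the data do not show a clear difference between the two groups.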

Question (2.b)

b. Conduct a hypothesis test at α = 0.05 to determine whether the data provide strong evidence that the rate of sleep deprivation differs between Texas and Dallas residents. Calculate the test statistic and p-value, and state a conclusion that supports your observation.

Let’s state the hypotheses:

  • H0: null hypothesis: The proportion of sleep-deprived individuals is the same for Texas and Dallas residents. (p1 = p2).
  • H1: alternative hypothesis: The proportion of sleep-deprived individuals is different for Texas and Dallas residents. (p1 ≠ p2).

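Let’s validate the independence and success-failure conditions:

  • We already checked both conditions in part (a). With counts this large, the success-failure condition also holds when the expected counts are based on the pooled proportion used in a hypothesis test.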

Both conditions are satisfied, and we can proceed with the hypothesis test as the sampling distribution is approximately normal.

Let’s compute the standard error:

To compute the standard error under H0, we first find the pooled proportion, where \(X_1\) and \(X_2\) are the numbers of successes in each sample (each sample size multiplied by its sample proportion):

\[ \begin{aligned} \hat{p} &= \frac{X_1 + X_2}{n_1 + n_2} \\ &= \frac{13270 \cdot 0.07 + 4681 \cdot 0.068}{13270 + 4681} \\ &\approx 0.0695 \end{aligned} \]

Then we can compute the standard error:

\[ \begin{aligned} SE &= \sqrt{\frac{\hat{p} \cdot (1 - \hat{p})}{n_1} + \frac{\hat{p} \cdot (1 - \hat{p})}{n_2} } \\ &= \sqrt{\frac{0.0695 \cdot (1 - 0.0695)}{13270} + \frac{0.0695 \cdot (1 - 0.0695)}{4681}} \\ &\approx 0.00432 \end{aligned} \]

Let’s find the z-value:

\[ \begin{aligned} z &= \frac{(\hat{p}_1 - \hat{p}_2)}{SE} \\ &= \frac{0.07 - 0.068}{0.00432} \\ &= 0.46296 \end{aligned} \]
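
The pooled calculation can be reproduced the same way. This sketch keeps full precision throughout, so the pooled proportion comes out as about 0.0695 and z as about 0.463, in line with the hand calculation.

```python
import math

n1, p1 = 13270, 0.07   # Texas
n2, p2 = 4681, 0.068   # Dallas

# Pooled proportion under H0 (p1 = p2): total successes / total sample size.
x1, x2 = n1 * p1, n2 * p2
p_pool = (x1 + x2) / (n1 + n2)

# Pooled standard error, then the z statistic for the observed difference.
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se

print(round(p_pool, 4))  # 0.0695
print(round(se, 5))      # 0.00432
print(round(z, 3))       # 0.463
```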

Let’s find the p-value: using the z-table, the upper-tail area beyond z = 0.463 is about 0.322.

Because the test is two-sided, the p-value is 2 × 0.322 = 0.644.
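
For the two-sided p-value, Python's standard-library `statistics.NormalDist` can stand in for the z-table. This sketch assumes z = 0.463 (the statistic rounded to three decimals). Doubling the unrounded tail area gives about 0.643, which differs from 0.644 only through rounding and is still far above 0.05.

```python
from statistics import NormalDist

z = 0.463  # assumed: test statistic from above, rounded to three decimals

# Two-sided test (H1: p1 != p2): double the tail area beyond |z|.
upper_tail = 1 - NormalDist().cdf(abs(z))
p_value = 2 * upper_tail

print(round(upper_tail, 3))  # 0.322
print(round(p_value, 3))     # 0.643
```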

Let’s interpret the data:

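  • The p-value of 0.644 is much greater than the significance level of 0.05.
  • When the p-value is greater than the significance level, we fail to reject the null hypothesis: an observed difference of 0.2 percentage points is well within the range that sampling variability alone would produce if the two rates were equal.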

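And the final conclusion: at the 0.05 significance level, there is not enough evidence to support the claim that the rate of sleep deprivation differs between Texas and Dallas residents. This does not show that the two rates are equal; it only means that any difference is too small to detect with these samples, consistent with the 95% confidence interval from part (a) containing 0.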

References

  • Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2019). OpenIntro statistics (4th ed.). Open Textbook Library. <https://www.biostat.jhsph.edu/~iruczins/teaching/books/2019.openintro.statistics.pdf>