JA1. Exercises

Problem 1

A random sample of 675 families in the Dominican Republic was surveyed; 232 responded that they could not afford a $300 unexpected expense without tapping into loans.

1. Define the population in this survey

The population in this survey is all families in the Dominican Republic. This is the finite population that the sample was drawn from.

2. What is the population parameter estimated in this survey?

The population parameter estimated in this survey is the proportion of families in the Dominican Republic who cannot afford $300 unexpected expenses without tapping into loans.

Because surveying the entire population is impractical, we estimate the parameter from the sample and then generalize that estimate to the population.

3. What is the point estimate for the parameter?

The point estimate is the number of families in the sample who cannot afford $300 unexpected expenses without tapping into loans divided by the total number of families in the sample.

\[ \hat{p} = \frac{232}{675} \approx 0.344 \]

So, about 34.4% of families in the sample cannot afford $300 unexpected expenses without tapping into loans.

4. What is the statistic used to measure the uncertainty of the point estimate? Compute the statistic.

The statistic used to measure the uncertainty of the point estimate is the standard error (SE) of the proportion.

Before computing the standard error, we must verify that the conditions for the Central Limit Theorem (CLT) are met; then we can use the formula:

\[ SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \]

where \(\hat{p}\) is the point estimate, \(n\) is the sample size, and \(SE\) is the standard error.

To verify that the CLT applies, we check the following conditions:

Independence: The sample is random, so the observations are independent.
Success-failure condition:

\[ n\hat{p} = 675 \times \frac{232}{675} = 232 \geq 10 \\ n(1-\hat{p}) = 675 - 232 = 443 \geq 10 \]
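The success-failure check above can be sketched in Python. This is a minimal illustration that works directly from the counts; the function name `success_failure_ok` and the `threshold` parameter are hypothetical names chosen for this sketch, not part of the exercise.

```python
def success_failure_ok(successes: int, n: int, threshold: int = 10) -> bool:
    """Return True when both the success and failure counts reach the threshold.

    With p_hat = successes / n, the quantities n * p_hat and n * (1 - p_hat)
    are simply the observed numbers of successes and failures.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


# Problem 1: 232 of 675 families could not afford the expense.
print(success_failure_ok(232, 675))
```

Working from the raw counts avoids the small discrepancies that appear when a rounded proportion is multiplied back by the sample size.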

Both conditions are met, so the CLT applies and we can use the formula to compute the standard error.

\[ \begin{aligned} SE &= \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\ &= \sqrt{\frac{0.3437(1-0.3437)}{675}} \\ &\approx 0.018 \end{aligned} \]
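As a cross-check on the arithmetic, the standard error can be computed at full floating-point precision. The helper name `standard_error` is an assumption made for this sketch; only the formula itself comes from the text.

```python
import math


def standard_error(p: float, n: int) -> float:
    """Standard error of a sample proportion: sqrt(p * (1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)


n = 675
p_hat = 232 / n  # point estimate from Problem 1
se = standard_error(p_hat, n)
print(f"p_hat = {p_hat:.4f}, SE = {se:.4f}")  # p_hat = 0.3437, SE = 0.0183
```

Because no intermediate value is rounded here, the last digit can differ slightly from a hand calculation that uses rounded inputs.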

5. Suppose the true population value is found to be 40%. Would the resulting value change much if we were to use this proportion to recompute the value of the statistic in part 4 using p = 0.4?

Let’s recompute the standard error using the population proportion (p = 0.4) in place of the sample proportion.

\[ \begin{aligned} SE &= \sqrt{\frac{p(1-p)}{n}} \\ &= \sqrt{\frac{0.4(1-0.4)}{675}} \\ &\approx 0.019 \end{aligned} \]

The standard error is slightly higher with the population proportion p = 0.4 (about 0.0189) than with the sample proportion \(\hat{p} = 232/675 \approx 0.344\) (about 0.0183), because \(p(1-p)\) grows as p moves toward 0.5. The gap reflects the difference between what the sample tells us about the population and the true population value.

The difference between the two standard errors is roughly 0.0006, so the resulting value would not change much if we were to use the true proportion to recompute the statistic.
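The comparison can be reproduced with a short script. This is only a sketch: the dictionary `standard_errors` and its labels are illustrative names, and the sample size and proportions are the ones given in the problem.

```python
import math

n = 675
proportions = {
    "sample proportion": 232 / n,      # p_hat from the survey
    "population proportion": 0.4,      # value supposed in this part
}

# Standard error sqrt(p * (1 - p) / n) for each candidate proportion.
standard_errors = {
    label: math.sqrt(p * (1 - p) / n) for label, p in proportions.items()
}

for label, se in standard_errors.items():
    print(f"{label}: SE = {se:.4f}")

difference = (
    standard_errors["population proportion"]
    - standard_errors["sample proportion"]
)
print(f"difference = {difference:.4f}")
```

The difference is well under 0.001, which supports the conclusion that swapping in p = 0.4 barely changes the standard error at this sample size.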

Problem 2

A cinema surveyed a random sample of 504 viewers over the course of a year and found that 124 of them made their visit because of a coupon they had received in the mail.

Construct a 95% confidence interval for the fraction of all those viewers who made a visit because of a coupon they’d received in the mail.

First, let’s compute the point estimate for the parameter:

\[ \hat{p} = \frac{124}{504} \approx 0.246 \]

Now, let’s verify that the conditions for the CLT hold for the sample:

Independence: The sample is random, so the observations are independent.
Success-failure condition:

\[ n\hat{p} = 504 \times \frac{124}{504} = 124 \geq 10 \\ n(1-\hat{p}) = 504 - 124 = 380 \geq 10 \]

Both conditions are met, so the CLT applies and we can use the formula to compute the standard error:

\[ \begin{aligned} SE &= \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} \\ &= \sqrt{\frac{0.246(1-0.246)}{504}} \\ &\approx 0.019 \end{aligned} \]

Now, let’s construct the 95% confidence interval for the fraction of all those viewers who made a visit because of a coupon they’d received in the mail. We carry one more decimal place (\(\hat{p} \approx 0.2460\), \(SE \approx 0.0192\)) so that rounding does not distort the bounds:

\[ \begin{aligned} cl_{lower} &= \hat{p} - 1.96 \times SE \\ &= 0.2460 - 1.96 \times 0.0192 \\ &\approx 0.208 \\ \end{aligned} \]

\[ \begin{aligned} cl_{upper} &= \hat{p} + 1.96 \times SE \\ &= 0.2460 + 1.96 \times 0.0192 \\ &\approx 0.284 \\ \end{aligned} \]

Thus the confidence interval is (0.208, 0.284).

We are 95% confident that the true proportion of all those viewers who made a visit because of a coupon they’d received in the mail is between 20.8% and 28.4%.
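The interval can also be computed end to end in Python. This is a minimal sketch of the normal-approximation (Wald) interval used in this solution; the function name `wald_interval` is an assumption of this example, and the critical value 1.96 is passed in as a default rather than derived.

```python
import math


def wald_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Normal-approximation confidence interval for a proportion.

    z = 1.96 corresponds to a 95% confidence level.
    """
    p_hat = successes / n
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    margin = z * se
    return p_hat - margin, p_hat + margin


# Problem 2: 124 of 504 viewers came because of a mailed coupon.
lower, upper = wald_interval(124, 504)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")  # 95% CI: (0.208, 0.284)
```

The code keeps full precision throughout, so its bounds may differ in the third decimal place from a hand calculation that rounds \(\hat{p}\) and the standard error before combining them.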