WA3. Testing for Independence Exercises

Problem 1

Microhabitat factors associated with forage and bed sites of barking deer in Hainan Island, China were examined. In this region woods make up 4.8% of the land, cultivated grassplot makes up 14.7%, and deciduous forests make up 39.6%. Of the 530 sites where the deer forage, 6 were categorized as woods, 18 as cultivated grassplot, and 71 as deciduous forests. The table below summarizes these data.

| Table 1 | Woods | Cultivated Grassplot | Deciduous Forests | Other | Total |
|---|---|---|---|---|---|
| Forage sites | 6 | 18 | 71 | 435 | 530 |

a. Write the hypotheses for testing if barking deer prefer to forage in certain habitats over others.

  • H0: Null hypothesis: The barking deer do not prefer to forage in certain habitats over others, that is, the deer forage in each habitat in proportion to the habitat’s availability.
  • H1: Alternative hypothesis: The barking deer prefer to forage in certain habitats over others, that is, the deer forage in some habitats more or less than those habitats’ availability in the region would suggest.

b. What type of test can we use to answer this research question?

The chi-squared goodness of fit test can be used to answer this research question. It compares the observed count in each category (habitat) with the count expected under the null hypothesis. If the observed and expected counts do not deviate significantly, the data are consistent with the deer foraging in each habitat in proportion to the habitat’s availability.

c. Check if the assumptions and conditions required for this test are satisfied.

According to Çetinkaya-Rundel (2019), the conditions for the chi-square test are:

  • (1) Independence: Sampled observations must be independent.
    • (1.A) Random sample/assignment: The data should be collected using a random method.
    • (1.B) If sampling without replacement, the sample size should be less than 10% of the population size.
    • (1.C) Each case only contributes to one cell in the table: The data should be mutually exclusive, and each observation should belong to only one category.
  • (2) Sample size: Each cell should have at least 5 expected counts.

(1.A) is met if we assume the researchers selected the sites randomly. (1.B) is met if we assume the 530 sampled sites are fewer than 10% of all forage sites in the region. (1.C) is met because each site belongs to only one habitat category. (2) is met because every expected count is at least 5; the smallest, for woods, is 530 × 4.8% = 25.44.

All conditions are met, so the test statistic approximately follows a chi-square distribution, and we can use the chi-square test for goodness of fit.
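The sample-size condition can be checked numerically. The following is a minimal sketch, assuming the habitat shares from the problem statement and treating "Other" as the remainder; the variable names are illustrative only.

```python
# Habitat shares under H0 (from the problem statement); "Other" is the remainder.
n_sites = 530
share = {"Woods": 0.048, "Cultivated Grassplot": 0.147, "Deciduous Forests": 0.396}
share["Other"] = 1.0 - sum(share.values())

# Expected count per habitat if deer forage in proportion to availability.
expected = {habitat: n_sites * p for habitat, p in share.items()}

# Sample-size condition: every expected count should be at least 5.
condition_met = all(e >= 5 for e in expected.values())

for habitat, e in expected.items():
    print(f"{habitat}: expected = {e:.2f}")
print("All expected counts >= 5:", condition_met)
```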

d. Do these data provide convincing evidence that barking deer prefer to forage in certain habitats over others? Conduct an appropriate hypothesis test to answer this research question.

d-1. Calculate the chi-squared statistic.

First, we calculate the expected count for each cell by multiplying each habitat’s share of the region by the total number of sites where the deer forage (530).

| Table 2 | Woods | Cultivated Grassplot | Deciduous Forests | Other | Total |
|---|---|---|---|---|---|
| Sample (O) | 6 | 18 | 71 | 435 | 530 |
| Expected Percentage (EP) | 4.8% | 14.7% | 39.6% | 40.9% | 100% |
| Expected (E) = EP * 530 | 25.44 | 77.91 | 209.88 | 216.77 | 530 |
| Chi factor (O-E)^2/E | 14.86 | 46.07 | 91.90 | 219.70 | |

Now let’s calculate the chi-squared statistic using the formula:

\[\chi^2 = \sum_{i=1}^{k}{\frac{(O_i-E_i)^2}{E_i}}\]

where \(k\) is the number of categories (cells), and \(O_i\) and \(E_i\) are the observed and expected counts for category \(i\).

\[\chi^2 = 14.86 + 46.07 + 91.9 + 219.7 = 372.53\]

So, the chi-squared statistic is 372.53.
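The same calculation can be reproduced in a few lines. This is a sketch using the observed counts and expected percentages from Table 2. Because it sums the unrounded contributions, the total may differ from 372.53 in the last digit.

```python
# Observed counts and hypothesized habitat shares, in the order:
# Woods, Cultivated Grassplot, Deciduous Forests, Other.
observed = [6, 18, 71, 435]
shares = [0.048, 0.147, 0.396, 0.409]

n = sum(observed)                      # 530 forage sites
expected = [n * p for p in shares]     # expected counts under H0

# Per-category contribution (O - E)^2 / E, then the chi-squared statistic.
contributions = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
chi_sq = sum(contributions)

print([round(c, 2) for c in contributions])
print(round(chi_sq, 2))
```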

d-2. Calculate the degrees of freedom.

The degrees of freedom equal the number of categories minus one:

\[df = k-1 = 4-1 = 3\]

d-3. Given that the p-value < 0.001, give a conclusion.

Let’s use the chi-square distribution table to find the p-value. With a chi-squared statistic of 372.53 and 3 degrees of freedom, the p-value is effectively 0, far below 0.001.

The p-value < alpha (p < 0.001 < 0.05), so we reject the null hypothesis. We have enough evidence to conclude that the barking deer prefer to forage in certain habitats over others, that is, the deer forage in some habitats more or less than those habitats’ availability in the region would suggest.
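As a check on the table lookup, the upper-tail probability for 3 degrees of freedom has a closed form in terms of the complementary error function. The helper `chi2_sf_df3` below is a hypothetical name introduced for this sketch and is valid only for df = 3.

```python
import math

def chi2_sf_df3(x: float) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable with 3 df."""
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)

chi_sq = 372.53
alpha = 0.05

p_value = chi2_sf_df3(chi_sq)
reject_null = p_value < alpha

print("p-value below 0.001:", p_value < 0.001)
print("Reject H0:", reject_null)
```

A quick sanity check is that `chi2_sf_df3(7.815)` is close to 0.05, matching the usual critical value for 3 degrees of freedom.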

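Looking at Table 2, the observed counts for woods, cultivated grassplot, and deciduous forests are all well below their expected counts (6 vs. 25.44, 18 vs. 77.91, and 71 vs. 209.88), while the observed count for other habitats is about twice its expected count (435 vs. 216.77). This indicates that the deer forage in woods, grassplot, and deciduous forests less than the availability of those habitats would suggest, and forage in other habitats more.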
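The direction of each deviation can be read off by comparing the observed and expected counts in Table 2. A minimal sketch, with illustrative variable names:

```python
habitats = ["Woods", "Cultivated Grassplot", "Deciduous Forests", "Other"]
observed = [6, 18, 71, 435]
expected = [25.44, 77.91, 209.88, 216.77]  # expected counts from Table 2

# Label each habitat by whether deer forage there more or less than expected.
direction = {
    habitat: "more than expected" if o > e else "less than expected"
    for habitat, o, e in zip(habitats, observed, expected)
}

for habitat, o, e in zip(habitats, observed, expected):
    print(f"{habitat}: observed {o}, expected {e:.2f} -> {direction[habitat]}")
```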


Problem 2

The OpenIntro website occasionally experiments with design and link placement. We conducted one experiment testing three different placements of a download link for this textbook on the book’s main page to see which location, if any, led to the most downloads. The number of site visitors included in the experiment was 501 and is captured in one of the response combinations in the following table:

| Table 3 | Download | No Download |
|---|---|---|
| Position 1 | 16.0% | 20.9% |
| Position 2 | 14.8% | 21.2% |
| Position 3 | 11.9% | 15.2% |

a. Calculate the actual number of site visitors in each of the six response categories.

The actual number of site visitors in each of the six response categories can be calculated by multiplying the percentage of each response category by the total number of site visitors (501). The totals of each row and column are also calculated and added to Table 4.

| Table 4 | Position 1 | Position 2 | Position 3 | Total |
|---|---|---|---|---|
| Download | 80 | 74 | 60 | 214 |
| No Download | 105 | 106 | 76 | 287 |
| Total | 185 | 180 | 136 | 501 |
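The conversion from percentages to counts can be sketched as follows, assuming each count is the percentage of 501 rounded to the nearest whole visitor. The dictionary layout is an illustrative choice.

```python
n_visitors = 501

# Percentages from Table 3, keyed by (position, response).
percent = {
    ("Position 1", "Download"): 16.0,
    ("Position 1", "No Download"): 20.9,
    ("Position 2", "Download"): 14.8,
    ("Position 2", "No Download"): 21.2,
    ("Position 3", "Download"): 11.9,
    ("Position 3", "No Download"): 15.2,
}

# Convert each percentage to a whole number of visitors.
counts = {cell: round(n_visitors * pct / 100) for cell, pct in percent.items()}

# Accumulate the totals per position and per response.
position_totals = {}
response_totals = {}
for (position, response), c in counts.items():
    position_totals[position] = position_totals.get(position, 0) + c
    response_totals[response] = response_totals.get(response, 0) + c

print(counts)
print(position_totals, response_totals, sum(counts.values()))
```

The rounded counts add up to 501, which confirms that no visitor is lost or double-counted by the rounding.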

b. Each individual in the experiment had an equal chance of being in any of the three experiment groups. However, we see that there are slightly different totals for the groups. Do you think that there is any evidence that the groups were actually imbalanced? Make sure to clearly state hypotheses, check conditions, calculate the appropriate test statistic. Given that the p-value is 0.01215, make your conclusion in context of the data.

If visitors were distributed equally among the three positions, the expected number of visitors for each position would be 501/3 = 167. The actual values come from the column totals in Table 4.

| Table 5 | Position 1 | Position 2 | Position 3 | Total |
|---|---|---|---|---|
| Actual | 185 | 180 | 136 | 501 |
| Expected | 167 | 167 | 167 | 501 |

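The problem is now a chi-squared goodness of fit test on a one-way table. The conditions are met: each visitor had an equal chance of being placed in any of the three groups, each visitor belongs to exactly one group, and every expected count (167) is at least 5. The hypotheses are:

  • H0: Null hypothesis: The groups were balanced, that is, the number of site visitors in each group is equal.
  • H1: Alternative hypothesis: The groups were imbalanced, that is, the number of site visitors in some groups differs from the expected value.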

The chi-squared statistic is calculated using the formula:

\[\chi^2 = \sum_{i=1}^{k}{\frac{(O_i-E_i)^2}{E_i}}\]

where \(k\) is the number of categories (cells), and \(O_i\) and \(E_i\) are the observed and expected counts for category \(i\).

\[ \begin{align*} \chi^2 &= \frac{(185-167)^2}{167} + \frac{(180-167)^2}{167} + \frac{(136-167)^2}{167} \\ &= \frac{324 + 169 + 961}{167} = \frac{1454}{167} \\ &\approx 8.707 \end{align*} \]

The degrees of freedom equal the number of categories minus one:

\[df = k-1 = 3-1 = 2\]
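The statistic and degrees of freedom for the group-balance test can be reproduced with a short sketch, using the column totals from Table 4 as the observed counts:

```python
# Observed visitors per link position (column totals from Table 4).
observed = [185, 180, 136]

# Under H0 every position is equally likely, so each expects n / k visitors.
n = sum(observed)
k = len(observed)
expected_each = n / k

chi_sq = sum((o - expected_each) ** 2 / expected_each for o in observed)
df = k - 1

print(f"expected per group = {expected_each:.0f}")
print(f"chi-squared = {chi_sq:.4f}, df = {df}")
```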

The p-value is the tail area above the observed test statistic. We will use the chi-square distribution table to find the p-value. The p-value is approximately 0.0129.

The p-value < alpha (0.0129 < 0.05), so we reject the null hypothesis. We have enough evidence to conclude that the groups were imbalanced, that is, the number of site visitors in some groups differs from the expected value.
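For 2 degrees of freedom, the chi-square upper-tail area has the simple closed form exp(-x/2), so the p-value can be checked without a table. This sketch takes the statistic 1454/167 computed from the Table 5 counts. The result is about 0.0129, slightly different from the 0.01215 quoted in the problem statement, but the conclusion at the 5% level is the same.

```python
import math

chi_sq = 1454 / 167   # statistic from the Table 5 counts
alpha = 0.05

# Upper-tail area of a chi-square distribution with 2 degrees of freedom.
p_value = math.exp(-chi_sq / 2)
reject_null = p_value < alpha

print(f"p-value = {p_value:.4f}")
print("Reject H0:", reject_null)
```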

References