DA4. Paired and Unpaired Data

Statement

You have learned about the t-distribution, paired and unpaired data, and when it is appropriate to use each. Answer the following questions based on what you have learned.

1. Identify an example of paired data and unpaired data. The data identified can be in your field of interest.

As an example of paired data, I use the High School and Beyond survey. A sample of 200 students was taken, and each student's reading and writing scores were recorded to compare how students did in the two subjects (Çetinkaya-Rundel, 2018c, 5 2 Inference for paired data).

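| ID | Read Score | Write Score | Diff (Read - Write) |
|---|---|---|---|
| 1 | 57 | 52 | 5 |
| 2 | 44 | 33 | 11 |
| 3 | 63 | 44 | 19 |
| 4 | 47 | 52 | -5 |
| … | … | … | … |
| 200 | 63 | 65 | -2 |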

As an example of unpaired data, I use the study of distraction, recall of food consumed, and snacking. A sample of 44 patients was randomized into two groups: one group was asked to play video games while eating lunch, and the other was asked to eat lunch without any distractions. Both groups were offered snacks (biscuits) after lunch, and the amount of snacks consumed was recorded (Çetinkaya-Rundel, 2018b, 5 1B Inference for a mean).

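Biscuits consumed after lunch:

| Group | Mean \(\bar{x}\) | Standard deviation \(s\) | Sample size \(n\) |
|---|---|---|---|
| Distraction Group | 52.1 g | 45.1 g | 22 |
| No Distraction Group | 27.1 g | 26.4 g | 22 |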

2. Describe the data identified above, and clearly highlight what distinguishes them.

Paired data are data in which each observation in one set corresponds to exactly one observation in the other set. In the High School and Beyond example, every reading score is paired with the writing score of the same student, and the two sets of scores are related: high-performing students usually do better in both subjects, and low-performing students do worse in both. For each pair, the difference between the two observations is calculated, and these differences form a new data set whose mean and standard deviation are used to perform inference.
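The differencing step can be sketched in a few lines of Python. This is only an illustration: it uses the five rows visible in the table above (IDs 1 to 4 and 200), not the full sample of 200 students, so the resulting mean and standard deviation are not the study's values. The variable names are hypothetical.

```python
from statistics import mean, stdev

# Only the five rows shown in the table (IDs 1-4 and 200).
# The other 195 students are elided in the source and are not invented here.
read_scores = [57, 44, 63, 47, 63]
write_scores = [52, 33, 44, 52, 65]

# Pairing: each difference uses the two scores of the same student.
diffs = [r - w for r, w in zip(read_scores, write_scores)]

n = len(diffs)       # number of pairs
x_bar = mean(diffs)  # sample mean of the differences
s = stdev(diffs)     # sample standard deviation (n - 1 denominator)

print(diffs)
print(n, x_bar, round(s, 2))
```

The computed differences match the Diff column of the table, which confirms that Diff is read score minus write score.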

Unpaired data are data in which the two sets of observations are independent of each other. In the distraction and snacking example, the participants were randomized into two separate groups, so each measurement comes from a different person, and no observation in one group corresponds to a particular observation in the other. The mean and standard deviation of each group are calculated separately, and the difference between the two means is used to perform inference.

3. For each of the datasets identified above, state if it is appropriate to use the one-sample t-procedure or the two-sample t-procedure and explain why.

In the paired data example of High School and Beyond, it is appropriate to use the one-sample t-procedure on the differences, because pairing reduces the two sets of scores to a single sample of differences. The null hypothesis is that the mean difference is 0, and the alternative hypothesis is that the mean difference is not 0. A confidence interval for the mean difference can be computed according to the equations below:

\[ \begin{aligned} \text{point estimate} &\pm \text{margin of error} \\ \bar{x} &\pm t^*_{df} SE_{\bar{x}} \\ \bar{x} &\pm t^*_{df} \frac{s}{\sqrt{n}} \\ \bar{x} &\pm t^*_{n-1} \frac{s}{\sqrt{n}} \end{aligned} \]

Where:
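- \(s\) is the sample standard deviation of the differences.
- \(n\) is the sample size (the number of pairs).
- \(t^*_{df}\) is the critical value of the t-distribution with \(df\) degrees of freedom.
- \(df = n - 1\) is the degrees of freedom.
- \(SE_{\bar{x}}\) is the standard error of the sample mean.
- \(\bar{x}\) is the sample mean of the differences.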
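As a rough sketch of the one-sample procedure on differences, the Python below computes the standard error, the test statistic for the null hypothesis of a zero mean difference, and the confidence interval. Several things here are assumptions, not part of the original: the helper name `paired_t_summary` is hypothetical, the 95% confidence level is chosen for illustration, the critical value 2.776 is read from a t-table for \(df = 4\), and the input is only the five differences shown in the table, not all 200.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_summary(diffs, t_star, null_mean=0.0):
    """Hypothetical helper: one-sample t statistic and confidence
    interval computed on paired differences."""
    n = len(diffs)
    x_bar = mean(diffs)
    se = stdev(diffs) / sqrt(n)        # SE = s / sqrt(n)
    t_stat = (x_bar - null_mean) / se  # test statistic under H0
    margin = t_star * se               # margin of error = t* x SE
    return {
        "df": n - 1,
        "mean": x_bar,
        "se": se,
        "t": t_stat,
        "ci": (x_bar - margin, x_bar + margin),
    }


# Illustration only: the five differences visible in the table.
diffs = [5, 11, 19, -5, -2]
T_STAR_DF4_95 = 2.776  # assumed: t-table value for df = 4, 95% confidence

result = paired_t_summary(diffs, T_STAR_DF4_95)
print(result)
```

Passing the critical value in as a parameter keeps the sketch free of third-party dependencies. With the full data set, \(df\) would be 199 and a different critical value would apply.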

In the unpaired data example of distraction and recall of food consumed and snacking, it is appropriate to use the two-sample t-procedure on the two independent groups. Computations can be done according to the equations below:

\[ \begin{aligned} \text{point estimate} &\pm \text{margin of error} \\ (\bar{x}_1 - \bar{x}_2) &\pm t^*_{df} SE_{\bar{x}_1 - \bar{x}_2} \\ (\bar{x}_1 - \bar{x}_2) &\pm t^*_{df} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \\ (\bar{x}_1 - \bar{x}_2) &\pm t^*_{\min(n_1 - 1,\, n_2 - 1)} \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \end{aligned} \]

Where:

\[ \begin{aligned} SE_{\bar{x}_1 - \bar{x}_2} &= \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \\ df &= \min(n_1 - 1,\, n_2 - 1) \end{aligned} \]
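Applied to the summary statistics in the biscuit table, the two-sample interval can be sketched as follows. The helper name `two_sample_t_interval` is hypothetical, the 95% confidence level is an assumption made for illustration, and the critical value 2.080 is read from a t-table for \(df = 21\). The degrees of freedom use the conservative rule \(\min(n_1 - 1, n_2 - 1)\).

```python
from math import sqrt


def two_sample_t_interval(x1, s1, n1, x2, s2, n2, t_star):
    """Hypothetical helper: confidence interval for the difference of two
    independent means, computed from summary statistics."""
    se = sqrt(s1**2 / n1 + s2**2 / n2)  # SE of (x1 - x2)
    df = min(n1 - 1, n2 - 1)            # conservative degrees of freedom
    diff = x1 - x2                      # point estimate
    margin = t_star * se                # margin of error
    return df, se, (diff - margin, diff + margin)


T_STAR_DF21_95 = 2.080  # assumed: t-table value for df = 21, 95% confidence

# Summary statistics from the table: distraction vs. no-distraction group.
df, se, ci = two_sample_t_interval(52.1, 45.1, 22, 27.1, 26.4, 22, T_STAR_DF21_95)
print(df, round(se, 2), tuple(round(b, 2) for b in ci))
```

Because only group means, standard deviations, and sample sizes are needed, the calculation works directly from the table without the raw measurements.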

References

Çetinkaya-Rundel, M. (2018b, February 20). 5 1B Inference for a mean [Video]. YouTube. https://youtu.be/RYVIGj1l4xs
Çetinkaya-Rundel, M. (2018c, February 20). 5 2 Inference for paired data [Video]. YouTube. https://youtu.be/K0QZ9_4w0HU