1. Foundations for Inference & Introduction to JASP

Statistical inference is primarily concerned with understanding and quantifying the uncertainty of parameter estimates.
A confidence interval is a range of plausible values for the true population parameter.

Foundations for Inference ¹ ³ ⁴

Point Estimate: A single value that best approximates the population parameter.

Sampling Error:
- The natural variability we expect between estimates from different random samples, and between a sample estimate and the population parameter.
- It results from the randomness of the sampling process.
- It is quantified by the standard error, the most commonly used measure of this uncertainty.
Bias:
- The point estimate tends to be systematically higher or lower than the population parameter (a systematic tendency to underestimate or overestimate it).
- It is especially important to address during the data collection phase, because it usually stems from how the data were collected (for example, a non-random sample).

If we take many samples of the same size from the same population, the point estimates will vary from sample to sample.
If we plot the distribution of these point estimates, we get the sampling distribution: the distribution of point estimates based on samples of a fixed size from a certain population.
When the conditions below are met, it resembles a normal distribution (bell-shaped, symmetric) centered at the true population parameter.
The mean of the sampling distribution is the population parameter (for an unbiased estimator such as the sample proportion \(\hat{p}\)).
The standard deviation of the sampling distribution is the standard error.
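The idea of a sampling distribution can be checked by simulation. The sketch below assumes a hypothetical population with a true proportion of 0.88 and draws 1,000 random samples of size 500. The names `sample_proportions`, `p`, `n`, and `num_samples` are illustrative choices, not part of the course material. The mean of the simulated \(\hat{p}\) values should land near \(p\), and their standard deviation should land near the standard error.

```python
import math
import random
import statistics

def sample_proportions(p, n, num_samples, seed=0):
    """Draw `num_samples` random samples of size `n` from a population whose
    true proportion of successes is `p`; return each sample's proportion."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    estimates = []
    for _ in range(num_samples):
        # Each observation is independently a success with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        estimates.append(successes / n)
    return estimates

# Hypothetical population: true proportion 0.88, samples of size 500.
p_hats = sample_proportions(p=0.88, n=500, num_samples=1000)
print("mean of p_hat:", round(statistics.mean(p_hats), 4))    # near 0.88
print("sd of p_hat:  ", round(statistics.stdev(p_hats), 4))   # near the SE below
print("theoretical SE:", round(math.sqrt(0.88 * 0.12 / 500), 4))
```

A histogram of `p_hats` would show the bell shape described above.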

When observations are independent and the sample size is sufficiently large, the Central Limit Theorem (CLT) says that the sampling distribution of the sample proportion \(\hat{p}\) is approximately normal, with a mean equal to the population parameter, \(\mu_{\hat{p}} = p\), and a standard error of \(SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\).
For the CLT to hold, two conditions must be met:
- The observations are independent.
- The success-failure condition:
  - \(np \geq 10\).
  - \(n(1-p) \geq 10\).
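The standard error formula and the success-failure condition are simple enough to write as two small functions. This is a sketch: the function names are hypothetical, and the check uses a threshold of "at least 10" expected successes and failures. It cannot test independence, which depends on how the sample was collected.

```python
import math

def standard_error(p, n):
    """Standard error of the sample proportion: sqrt(p(1 - p) / n)."""
    return math.sqrt(p * (1 - p) / n)

def success_failure_ok(p, n, threshold=10):
    """True when the expected numbers of successes (np) and
    failures (n(1 - p)) are both at least `threshold`."""
    return n * p >= threshold and n * (1 - p) >= threshold

print(standard_error(0.5, 400))       # approximately 0.025
print(success_failure_ok(0.5, 400))   # True: 200 expected successes and failures
print(success_failure_ok(0.02, 100))  # False: only 2 expected successes
```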
The problem then becomes a normal distribution problem:
- The mean of the sampling distribution is the population parameter \(p\).
- The standard error is the standard deviation of the sampling distribution.
- We find probabilities by converting values to z-scores and using the normal distribution table.
- \(z_{1} = \frac{\hat{p}_{min} - p}{SE_{\hat{p}}}\), \(z_{2} = \frac{\hat{p}_{max} - p}{SE_{\hat{p}}}\).
- In practice, \(p\) is unknown, so \(z_1\) and \(z_2\) cannot be computed from a single sample. Learning \(p\) this way would require taking many more samples.
- Instead, we use a confidence interval to estimate the range of values where the true population parameter is likely to lie.
- With a 95% confidence level, \(z_{1} = -1.96\) and \(z_{2} = 1.96\).
- With a 99% confidence level, \(z_{1} = -2.58\) and \(z_{2} = 2.58\).
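The z-score steps above can be sketched with Python's standard `statistics.NormalDist` (Python 3.8+), which stands in for the normal distribution table. The function `prob_between` and the inputs \(p = 0.5\), \(n = 400\) are hypothetical. They are chosen so that \(SE = 0.025\) and the bounds 0.45 and 0.55 sit exactly 2 standard errors from \(p\). The same class also recovers the critical values 1.96 and 2.58.

```python
import math
from statistics import NormalDist

def prob_between(p, n, p_min, p_max):
    """Normal-approximation probability that the sample proportion falls
    between p_min and p_max, when the population proportion p is known."""
    se = math.sqrt(p * (1 - p) / n)   # standard error of p_hat
    z1 = (p_min - p) / se             # z-score of the lower bound
    z2 = (p_max - p) / se             # z-score of the upper bound
    std_normal = NormalDist()         # mean 0, standard deviation 1
    return std_normal.cdf(z2) - std_normal.cdf(z1)

# Hypothetical numbers: p = 0.5, n = 400, bounds 2 SEs from p.
print(round(prob_between(0.5, 400, 0.45, 0.55), 4))  # 0.9545

# Critical values: the z that leaves 2.5% (or 0.5%) in each tail.
print(round(NormalDist().inv_cdf(0.975), 3))  # 1.96
print(round(NormalDist().inv_cdf(0.995), 3))  # 2.576, usually rounded to 2.58
```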

If we have a point estimate from a sample, have confirmed that the CLT conditions hold, and the sampling distribution is therefore approximately normal, we can use the plug-in principle. This means substituting the sample statistic \(\hat{p}\) for the unknown population parameter \(p\), for example when computing the standard error: \(SE \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).

The problem is that a point estimate from a single sample is unlikely to equal the population parameter exactly.
Instead of providing a single point estimate, we provide a range of values where the true population parameter is likely to be.
Constructing a 95% confidence interval:
- In a normal distribution, about 95% of the observations fall within 1.96 standard deviations of the mean (the distribution's center).
- Thus, if a point estimate can be modeled using a normal distribution, we can construct a plausible range with 95% confidence as \([\hat{p} - 1.96\times{SE}, \hat{p} + 1.96\times{SE}]\).
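Combining the plug-in principle with the 1.96 rule gives a 95% interval in a few lines. This is a sketch only. The function name is hypothetical, and it assumes the independence and success-failure conditions have already been checked using \(\hat{p}\). The sample values \(\hat{p} = 0.5\) and \(n = 400\) are made up for illustration.

```python
import math

def confidence_interval_95(p_hat, n):
    """95% confidence interval for a proportion (normal approximation)."""
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # plug p_hat in for the unknown p
    margin = 1.96 * se                       # margin of error
    return p_hat - margin, p_hat + margin

# Hypothetical sample: 200 "yes" answers out of n = 400, so p_hat = 0.5.
lower, upper = confidence_interval_95(0.5, 400)
print(f"[{lower:.3f}, {upper:.3f}]")  # [0.451, 0.549]
```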
Interpreting a confidence interval:
- We are 95% confident that the true population parameter lies within the interval.
- For example, if [0.45, 0.55] is the 95% confidence interval for the proportion of people supporting solar panels, we can say:
  - We are 95% confident that the actual proportion of the public supporting solar panels is between 45% and 55%.
Common confidence levels:
- 90% confidence level: \(z = 1.645\) and the interval is \(\hat{p} \pm 1.645\times{SE}\).
- 95% confidence level: \(z = 1.96\) and the interval is \(\hat{p} \pm 1.96\times{SE}\).
- 99% confidence level: \(z = 2.58\) and the interval is \(\hat{p} \pm 2.58\times{SE}\).
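The three common levels differ only in the critical value \(z\). The sketch below computes \(z\) for any level using `statistics.NormalDist().inv_cdf` in place of a printed table. The function name and the sample values (\(\hat{p} = 0.5\), \(n = 400\)) are hypothetical.

```python
import math
from statistics import NormalDist

def confidence_interval(p_hat, n, level=0.95):
    """Confidence interval for a proportion at any confidence level,
    using the normal approximation and the plug-in standard error."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # two-sided critical value
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se

for level in (0.90, 0.95, 0.99):
    lower, upper = confidence_interval(0.5, 400, level)
    print(f"{level:.0%}: [{lower:.3f}, {upper:.3f}]")
```

Higher confidence requires a larger \(z\), so the interval gets wider. This is the trade-off between confidence and precision.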
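Confidence intervals say nothing about individual observations.
Confidence intervals say nothing about future samples.
The confidence level is NOT the probability that the true parameter lies within one particular computed interval. It describes the procedure: about 95% of the intervals constructed this way from repeated samples would contain the true parameter.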
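The "repeated samples" reading of a confidence level can be checked by simulation, because in a simulation the true \(p\) is known. The sketch below assumes a hypothetical \(p = 0.3\) and \(n = 200\). It builds 2,000 intervals of the form \(\hat{p} \pm 1.96 \times SE\) with the plug-in SE and reports the fraction that contain \(p\). The function name `coverage` is illustrative. Any single interval either contains \(p\) or does not; the roughly 95% refers to the long-run success rate of the method.

```python
import math
import random

def coverage(p, n, num_intervals=2000, z=1.96, seed=1):
    """Fraction of intervals p_hat +/- z*SE, each built from an independent
    simulated sample of size n, that contain the true proportion p."""
    rng = random.Random(seed)  # seeded so the run is reproducible
    hits = 0
    for _ in range(num_intervals):
        successes = sum(1 for _ in range(n) if rng.random() < p)
        p_hat = successes / n
        se = math.sqrt(p_hat * (1 - p_hat) / n)  # plug-in standard error
        if p_hat - z * se <= p <= p_hat + z * se:
            hits += 1
    return hits / num_intervals

print(coverage(p=0.3, n=200))  # typically close to 0.95
```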

Diez, D. M., Barr, C. D., & Çetinkaya-Rundel, M. (2019). OpenIntro statistics (4th ed.). Open Textbook Library. https://www.biostat.jhsph.edu/~iruczins/teaching/books/2019.openintro.statistics.pdf Read Chapter 5, Foundations for Inference (pages 168-205): Section 5.1, Point estimates and sampling variability; Section 5.2, Confidence intervals for a proportion. Solve the following practice exercises as homework: Practice Exercises – Unit 1 https://my.uopeople.edu/pluginfile.php/1897551/mod_book/chapter/531355/Practice%20Excercises%20-%20%20Unit%201_Final.pdf
Goss-Sampson, M. A. (2022). Statistical analysis in JASP: A guide for students (5th ed., JASP v0.16.1 2022). https://jasp-stats.org/wp-content/uploads/2022/04/Statistical-Analysis-in-JASP-A-Students-Guide-v16.pdf Read pages 2-31.
OpenIntroOrg. (2019a, September 2). Foundations for inference: Point estimates [Video]. YouTube. https://youtu.be/oLW_uzkPZGA
OpenIntroOrg. (2019b, September 6). Intro to confidence intervals via proportions [Video]. YouTube. https://youtu.be/A6_W8qY8zJo
JASP Statistics. (2022, October 5). Introduction to JASP [Video]. YouTube. https://youtu.be/APRaBFC2lEQ