1. Foundations for Inference & Introduction to JASP

  • Statistical inference is primarily concerned with understanding and quantifying the uncertainty of parameter estimates.
  • A confidence interval is a range of values within which the true population parameter is likely to lie.

Foundations for Inference 1 3 4

  • Point Estimate: A single value that best approximates the population parameter.

Uncertainty in point estimates

  • Sampling Error:
    • The natural variability we expect between estimates from different random samples, and between a sample estimate and the population parameter.
    • It results from the randomness of the sampling process.
    • It is quantified by the standard error, the most commonly used measure of this uncertainty.
  • Bias:
    • A systematic tendency for the point estimate to be higher or lower than the population parameter (to overestimate or underestimate it). Unlike sampling error, it does not shrink as the sample size grows.
    • It usually arises during data collection (for example, from non-random sampling), so it must be guarded against at that stage.
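To make the difference between sampling error and bias concrete, here is a small simulation sketch. The population, its split into an "easy-to-reach" subgroup, and all the numbers are hypothetical assumptions chosen for illustration: random samples from the whole population scatter around the true proportion, while samples drawn only from the subgroup are systematically off.

```python
import random
import statistics

# Hypothetical population of 10,000 people (1 = supports, 0 = does not).
# Assumption: the first 4,000 form an easy-to-reach subgroup where support is 25%;
# among the remaining 6,000, support is about 83% (5,000 of 6,000).
subgroup = [1] * 1000 + [0] * 3000
others = [1] * 5000 + [0] * 1000
population = subgroup + others
p = sum(population) / len(population)  # true proportion: 6,000 / 10,000 = 0.6


def sample_proportion(pool, n, rng):
    """Proportion of supporters in a simple random sample of size n drawn from pool."""
    return sum(rng.sample(pool, n)) / n


rng = random.Random(42)
n, repeats = 100, 1000
# Random sampling from the whole population: estimates scatter (sampling error)
# but are centered on p.
random_estimates = [sample_proportion(population, n, rng) for _ in range(repeats)]
# Sampling only the easy-to-reach subgroup: estimates are systematically too low (bias).
biased_estimates = [sample_proportion(subgroup, n, rng) for _ in range(repeats)]

print(f"true p = {p:.2f}")
for label, est in [("random", random_estimates), ("biased", biased_estimates)]:
    print(f"{label} sampling: mean = {statistics.mean(est):.3f}, "
          f"sd = {statistics.stdev(est):.3f}")
```

Increasing `n` shrinks the spread of the random-sample estimates, but no sample size moves the biased estimates away from the subgroup's 25%.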

Sampling Distribution

  • If we take many samples of the same size from the same population, the point estimates will vary from sample to sample.
  • If we plot the distribution of point estimates, we get the sampling distribution.
  • Formally, it is the distribution of a point estimate across samples of a fixed size from a given population.
  • When the conditions of the Central Limit Theorem hold, it resembles a normal distribution (bell-shaped, symmetric) centered at the true population parameter.
  • The mean of the sampling distribution is the population parameter.
  • The standard deviation of the sampling distribution is the standard error.
  • [Figure: sampling distribution]
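The two facts above (the mean of the sampling distribution is the population parameter, and its standard deviation is the standard error) can be checked by simulation. This is a sketch under assumed values: a true proportion `p = 0.3`, samples of size `n = 200`, and 5,000 repeated samples, each observation an independent yes/no draw. The theoretical standard error uses the formula \(\sqrt{p(1-p)/n}\) from the Central Limit Theorem section.

```python
import math
import random
import statistics

# Assumed values for illustration only.
p, n, num_samples = 0.3, 200, 5000
rng = random.Random(0)


def draw_p_hat():
    """Draw one sample of size n (independent Bernoulli(p) observations) and return p-hat."""
    successes = sum(1 for _ in range(n) if rng.random() < p)
    return successes / n


# Each entry is one point estimate; together they form the sampling distribution.
p_hats = [draw_p_hat() for _ in range(num_samples)]

# The mean of the sampling distribution should be close to p ...
print(f"mean of p-hats: {statistics.mean(p_hats):.4f} (p = {p})")
# ... and its standard deviation (the standard error) close to sqrt(p(1-p)/n).
print(f"sd of p-hats:   {statistics.stdev(p_hats):.4f} "
      f"(theory = {math.sqrt(p * (1 - p) / n):.4f})")
```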

Central Limit Theorem

  • When observations are independent and the sample size is sufficiently large, the sampling distribution of the sample proportion \(\hat{p}\) is approximately normal, with a mean equal to the population parameter, \(\mu_{\hat{p}} = p\), and a standard error of \(SE_{\hat{p}} = \sqrt{\frac{p(1-p)}{n}}\).
  • For the CLT to hold, two conditions must be met:
    • The independence of observations.
    • The success-failure condition:
      • \(np > 10\).
      • \(n(1-p) > 10\).
  • The problem then becomes a normal distribution problem:
    • The mean of the sampling distribution is the population parameter \(p\).
    • The standard error is the standard deviation of the sampling distribution.
    • We find the probability of a range of sample proportions by computing z-scores and using the normal distribution table:
    • \(z_{1} = \frac{\hat{p}_{min} - p}{SE_{\hat{p}}}\), \(z_{2} = \frac{\hat{p}_{max} - p}{SE_{\hat{p}}}\).
    • [Figure: normal distribution graph]
    • In practice, it is hard to find \(z_{1}\) and \(z_{2}\) this way, since doing so would require taking many more samples.
    • Instead, we use a confidence interval to estimate the range of values where the true population parameter is likely to lie.
    • With a 95% confidence level, we can say that \(z_{1} = -1.96\) and \(z_{2} = 1.96\).
    • With a 99% confidence level, we can say that \(z_{1} = -2.58\) and \(z_{2} = 2.58\).
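The normal-distribution steps above can be sketched in Python with the standard library's `statistics.NormalDist`, which stands in for the printed normal table. The example values (`p = 0.5`, `n = 1000`, and the range 0.47 to 0.53) are hypothetical, and the success-failure check follows the strict `> 10` form stated in these notes.

```python
import math
from statistics import NormalDist


def success_failure_ok(n, p, threshold=10):
    """Success-failure condition as stated in these notes: np > 10 and n(1 - p) > 10."""
    return n * p > threshold and n * (1 - p) > threshold


def prob_p_hat_between(p_min, p_max, p, n):
    """P(p_min <= p-hat <= p_max) under the CLT normal approximation."""
    se = math.sqrt(p * (1 - p) / n)   # SE of p-hat
    z1 = (p_min - p) / se             # z-score of the lower bound
    z2 = (p_max - p) / se             # z-score of the upper bound
    std_normal = NormalDist()         # standard normal: mean 0, sd 1
    return std_normal.cdf(z2) - std_normal.cdf(z1)


def z_critical(level):
    """Two-sided critical value: the central `level` of the standard normal lies in [-z, z]."""
    return NormalDist().inv_cdf(1 - (1 - level) / 2)


# Hypothetical example: p = 0.5, n = 1,000.
p, n = 0.5, 1000
if success_failure_ok(n, p):
    print(f"P(0.47 <= p-hat <= 0.53) ≈ {prob_p_hat_between(0.47, 0.53, p, n):.3f}")
for level in (0.95, 0.99):
    print(f"{level:.0%} level: z1 = {-z_critical(level):.2f}, z2 = {z_critical(level):.2f}")
```

The `z_critical` helper recovers the ±1.96 and ±2.58 values listed above instead of looking them up.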

The Plug-in Principle

  • If we have a point estimate from a sample, and we have confirmed that the Central Limit Theorem holds so the sampling distribution is approximately normal, we can use the plug-in principle: plug the sample point estimate (the sample statistic) in place of the unknown population parameter, e.g. \(SE_{\hat{p}} \approx \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\).
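A minimal sketch of the plug-in principle for a proportion, assuming a hypothetical survey in which 540 of 1,200 respondents answer "yes". It also assumes, as is common practice, that the success-failure check is done with \(\hat{p}\) in place of the unknown \(p\).

```python
import math


def plug_in_se(p_hat, n):
    """SE of a sample proportion, with p-hat plugged in for the unknown p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical survey: 540 of 1,200 respondents answer "yes".
n = 1200
p_hat = 540 / n  # 0.45
# Success-failure check (strict > 10, as in these notes), using p-hat in place of p.
conditions_ok = n * p_hat > 10 and n * (1 - p_hat) > 10
se = plug_in_se(p_hat, n)
print(f"p-hat = {p_hat:.2f}, SE ≈ {se:.4f}, conditions met: {conditions_ok}")
```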

Confidence Intervals

  • The problem is that a sample's point estimate is unlikely to exactly equal the population parameter.
  • Instead of providing a single point estimate, we provide a range of values where the true population parameter is likely to be.
  • Constructing 95% confidence interval:
    • In a normal distribution, 95% of the observations fall within 1.96 standard deviations of the mean (distribution center).
    • Thus, if a point estimate can be modeled using a normal distribution, we can construct a plausible range with 95% confidence as \([\hat{p} - 1.96\times{SE}, \hat{p} + 1.96\times{SE}]\)
  • Interpreting a confidence interval:
    • We are 95% confident that the true population parameter lies within the interval.
    • For example, if [0.45, 0.55] is the 95% confidence interval for the proportion of people supporting solar panels, we can say:
      • We are 95% confident that the actual percentage of the public supporting solar panels is between 45% and 55%.
  • Common confidence levels:
    • 90% confidence level: \(z = 1.645\) and the interval is \(\hat{p} \pm 1.645\times{SE}\).
    • 95% confidence level: \(z = 1.96\) and the interval is \(\hat{p} \pm 1.96\times{SE}\).
    • 99% confidence level: \(z = 2.58\) and the interval is \(\hat{p} \pm 2.58\times{SE}\).
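  • Confidence intervals say nothing about individual observations.
  • Confidence intervals say nothing about future samples.
  • The confidence level is NOT the probability that the true parameter lies within a particular computed interval; it is the long-run proportion of intervals, constructed this way from repeated samples, that capture the parameter.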
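The construction \(\hat{p} \pm z \times SE\) and its long-run interpretation can be sketched together. The helper `proportion_ci` is a hypothetical name for illustration; it uses the normal approximation with the plug-in standard error. The coverage loop assumes a known true proportion (`p = 0.5`, `n = 400`), which is only possible in a simulation, and counts how often the computed intervals contain it.

```python
import math
import random
from statistics import NormalDist


def proportion_ci(p_hat, n, level=0.95):
    """Normal-approximation CI: p_hat ± z × SE, with SE from the plug-in principle."""
    z_star = NormalDist().inv_cdf(1 - (1 - level) / 2)  # about 1.96 for 95%
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z_star * se, p_hat + z_star * se


# Coverage check: with an assumed true p, repeat the sampling many times
# and count how often the computed interval contains p.
p, n, trials = 0.5, 400, 2000
rng = random.Random(7)
hits = 0
for _ in range(trials):
    p_hat = sum(1 for _ in range(n) if rng.random() < p) / n
    low, high = proportion_ci(p_hat, n)
    if low <= p <= high:
        hits += 1
coverage = hits / trials
print(f"coverage ≈ {coverage:.3f}")
```

Any single interval either contains `p` or it does not; the 95% describes the fraction of intervals that do over many repetitions. Passing `level=0.90` or `level=0.99` gives the other common confidence levels.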

Introduction to JASP 2 5

Descriptive Statistics

  • Descriptive statistics and related plots are a succinct way of describing and summarising data, but they do not test any hypotheses. They include:
    • Measures of central tendency.
    • Measures of dispersion.
    • Percentile values.
    • Measures of distribution.
    • Descriptive plots.
  • Central Tendency:
    • The tendency for variable values to cluster around a central value.
    • Mean, Median, Mode.
    • Mean (M or \(\bar{x}\)): the sum of all values divided by the number of values, i.e. the arithmetic average. It is sensitive to outliers.
    • Median (Mdn): the middle value when all values are ordered (the average of the two middle values when the count is even). It is less sensitive to outliers.
    • Mode: The most frequent value.
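These notes compute central tendency in JASP; as a quick cross-check outside JASP, here is a sketch using Python's standard `statistics` module on hypothetical scores. The second list replaces the largest value with an outlier to show which measures are sensitive to it.

```python
import statistics

# Hypothetical scores; the second list replaces the largest value (9) with an outlier (90).
scores = [4, 5, 5, 6, 7, 8, 9]
with_outlier = [4, 5, 5, 6, 7, 8, 90]

for label, data in [("original", scores), ("with outlier", with_outlier)]:
    print(f"{label}: M = {statistics.mean(data):.2f}, "
          f"Mdn = {statistics.median(data)}, mode = {statistics.mode(data)}")
```

The mean jumps from about 6.29 to about 17.86, while the median stays at 6 and the mode at 5.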


