Module 11: Cheat Sheet

Essential Concepts

  • The mathematical formulas to find the mean and standard deviation of the sampling distribution of the sample mean for samples of size [latex]n[/latex] are
    • Mean of the sampling distribution of the sample mean: [latex]\mu_{\bar{x}}=\mu[/latex]
    • Standard deviation of the sampling distribution of the sample mean: [latex]\sigma_{\bar{x}}=\frac{\sigma}{\sqrt{n}}[/latex]

where [latex]\mu[/latex] and [latex]\sigma[/latex] represent the mean and standard deviation of the original population, respectively.
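
The two formulas above can be checked with a small simulation. The sketch below is only an illustration: the exponential population with mean 2 (so that [latex]\mu = 2[/latex] and [latex]\sigma = 2[/latex]), the sample size of 25, and the helper name simulate_sample_means are all assumptions made for this example, not part of the module.

```python
import random
import statistics

def simulate_sample_means(draw, n, reps, seed=0):
    """Draw `reps` samples of size n and return the list of their sample means.

    `draw` is a function that takes a random.Random object and returns
    one observation from the population.
    """
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n)) for _ in range(reps)]

# Assumed population for illustration: exponential with mean 2,
# which has mu = 2 and sigma = 2.
mu, sigma, n = 2.0, 2.0, 25
means = simulate_sample_means(lambda rng: rng.expovariate(1 / mu), n, reps=20000)

# The mean of the sample means should be close to mu = 2.
print(statistics.fmean(means))
# Their standard deviation should be close to sigma / sqrt(n) = 2 / 5 = 0.4.
print(statistics.stdev(means))
```

Because the results come from random sampling, the printed values will be near, but not exactly equal to, 2 and 0.4.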

  • If the population distribution is normal, the distribution of the sample means will also follow a normal distribution for any sample size.
  • If the population distribution is not normal, the Central Limit Theorem states that the distribution of the sample means is still approximately normal as long as the sample size is large (e.g., [latex]n \ge 30[/latex]) and the population distribution is not strongly skewed.
  • When we estimate [latex]\sigma[/latex] using the sample standard deviation, [latex]s[/latex], we use the standardized [latex]t[/latex]-statistic:

[latex]t=\dfrac{\bar{x}-[\text{mean of } \bar{x}\text{'s}]}{\text{std. error of } \bar{x}\text{'s}}[/latex] [latex]= \dfrac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}[/latex] with [latex]df = n - 1[/latex] degrees of freedom.
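
As a quick illustration of the [latex]t[/latex]-statistic, the sketch below computes it from raw data. The data values, the hypothesized mean of 6, and the function name t_statistic are made up for this example.

```python
import math
import statistics

def t_statistic(sample, mu):
    """Return (t, df) for a sample and a hypothesized population mean mu."""
    n = len(sample)
    x_bar = statistics.fmean(sample)   # sample mean
    s = statistics.stdev(sample)       # sample standard deviation (divides by n - 1)
    se = s / math.sqrt(n)              # standard error of the sample mean
    return (x_bar - mu) / se, n - 1

# Made-up data: x_bar = 8 and s = sqrt(10), so the standard error is sqrt(10 / 5) = sqrt(2).
t, df = t_statistic([4, 6, 8, 10, 12], mu=6)
print(round(t, 3), df)  # 1.414 4
```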

  • The formula for a confidence interval for a population mean is:

[latex]\text{estimate }\pm \text{ margin of error}[/latex] or [latex]\bar{x} \pm (t\text{-critical value})\frac{s}{\sqrt{n}}[/latex]

where [latex]\bar{x}[/latex] is the sample mean and the standard error used is the standard error of the sample mean, [latex]\frac{s}{\sqrt{n}}[/latex]. The [latex]t[/latex]-critical value ([latex]t[/latex]-score in the data analysis tool) in the confidence interval will depend on the sample size (degrees of freedom for the [latex]t[/latex]-distribution: [latex]df=n-1[/latex]) and the confidence level.
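
A minimal sketch of the one-sample interval follows. The summary statistics are made up, and the function name mean_confidence_interval is hypothetical. The critical value is passed in rather than computed; it is assumed to have been looked up in a [latex]t[/latex]-table or a data analysis tool (2.064 is the 95% value for [latex]df = 24[/latex]).

```python
import math

def mean_confidence_interval(x_bar, s, n, t_crit):
    """Return (lower, upper) for x_bar +/- t_crit * s / sqrt(n)."""
    se = s / math.sqrt(n)      # standard error of the sample mean
    margin = t_crit * se       # margin of error
    return x_bar - margin, x_bar + margin

# Made-up summary statistics: x_bar = 50, s = 10, n = 25 (so df = 24).
low, high = mean_confidence_interval(50.0, 10.0, 25, t_crit=2.064)
print(f"({low:.3f}, {high:.3f})")  # (45.872, 54.128)
```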

  • When you are interested in estimating a difference in population means, you usually start with data from samples from each of the populations of interest. There are two different strategies for selecting the two samples.
    • Independent samples: a sample is selected from one population, and then a sample is independently selected from the second population.
    • Paired samples (or dependent samples): the samples are chosen in a way that results in the observations in one sample being paired with the observations in the other sample.
  • The conditions that you need to check for the two-sample [latex]t[/latex] confidence interval are:
    • The samples are independent.
    • Each sample is a random sample from the corresponding population of interest or it is reasonable to regard the sample as if it were a random sample. It is reasonable to regard the sample as a random sample if it was selected in a way that should result in the sample being representative of the population. If the data are from an experiment, you just need to check that there was random assignment to experimental groups—this substitutes for the random sample condition and also results in independent samples.
    • For each population, the distribution of the variable that was measured is approximately normal, or the sample size for the sample from that population is large. Usually, a sample of size [latex]30[/latex] or more is considered to be “large.” If a sample size is less than [latex]30[/latex], you should look at a plot of the data from that sample (a dotplot, a boxplot, or, if the sample size isn’t really small, a histogram) to make sure that the distribution looks approximately symmetric and that there are no outliers.
  • The form of the confidence interval for a difference in population means is:

estimate [latex]\pm[/latex] margin of error

where the estimate is [latex]\bar{x}_{1}-\bar{x}_{2}[/latex] and margin of error = ([latex]t[/latex]-critical value)(standard error) = ([latex]t[/latex]-critical value)[latex]\left(\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}\right)[/latex]
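
The two-sample interval can be sketched the same way. Everything specific here is an assumption for illustration: the summary statistics are made up, the function name two_sample_interval is hypothetical, and the [latex]t[/latex] multiplier of 2.0 is a placeholder, not a looked-up value. In practice, the multiplier and its degrees of freedom would come from a data analysis tool.

```python
import math

def two_sample_interval(x1_bar, s1, n1, x2_bar, s2, n2, t_crit):
    """Return (lower, upper) for (x1_bar - x2_bar) +/- t_crit * standard error."""
    se = math.sqrt(s1**2 / n1 + s2**2 / n2)   # standard error of the difference
    margin = t_crit * se                      # margin of error
    estimate = x1_bar - x2_bar                # difference in sample means
    return estimate - margin, estimate + margin

# Made-up summaries: sample 1 has mean 20, s = 3, n = 36;
# sample 2 has mean 18, s = 4, n = 64.
# t_crit = 2.0 is a placeholder multiplier, not a table value.
low, high = two_sample_interval(20.0, 3.0, 36, 18.0, 4.0, 64, t_crit=2.0)
print(f"({low:.3f}, {high:.3f})")  # (0.586, 3.414)
```

Here the standard error is [latex]\sqrt{\frac{9}{36}+\frac{16}{64}}=\sqrt{0.5}\approx 0.707[/latex], so the margin of error is about 1.414 around the estimate of 2.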

Key Equations

confidence interval for one sample

[latex]\bar{x} \pm (t\text{-critical value})\frac{s}{\sqrt{n}}[/latex]

confidence interval for two samples

[latex](\bar{x}_{1}-\bar{x}_{2}) \pm (t\text{-critical value})\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}[/latex]

standardized statistic

[latex]z=\dfrac{\bar{x}-[\text{mean of } \bar{x}\text{'s}]}{\text{std. deviation of } \bar{x}\text{'s}}[/latex] [latex]= \dfrac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}}[/latex]

degrees of freedom

[latex]df = n - 1[/latex]

[latex]t[/latex]-score

[latex]t=\dfrac{\bar{x}-\mu}{\frac{s}{\sqrt{n}}}[/latex]

standard error of a sample mean 

[latex]SE(\bar{x})=\dfrac{s}{\sqrt{n}}[/latex]

margin of error (two samples)

([latex]t[/latex]-critical value)(standard error) = ([latex]t[/latex]-critical value)[latex]\left(\sqrt{\frac{s_{1}^{2}}{n_{1}}+\frac{s_{2}^{2}}{n_{2}}}\right)[/latex]

Glossary

independent samples

samples chosen so that the sample from one population is selected independently of the sample from the other population

paired samples, dependent samples

samples that are chosen in a way that results in the observations in one sample being paired with the observations in the other sample

sampling distribution

the probability distribution of a sample statistic

standardized statistic

a [latex]t[/latex]-score for a statistic using simulation to estimate the mean and standard deviation of the sample mean