Module 8: Cheat Sheet

Essential Concepts

  • One of the goals of statistical inference is to draw a conclusion about a population on the basis of a random sample from the population. Random samples vary, so we need to understand how much they vary and how they relate to the population. Our ultimate goal is to create a probability model that describes the long-run behavior of sample measurements. We use this model to make inferences about the population.
  • When we want to describe the characteristics of a sample, we call the values statistics. However, when we want to describe the characteristics of a population, we call those values parameters.
  • We can use mathematical theory to derive expressions for the mean and standard deviation of the sampling distribution of the sample proportion. When taking many random samples of size [latex]n[/latex] from a population distribution with population proportion [latex]p[/latex]:
    • The mean of the distribution of sample proportions is [latex]p[/latex].
    • The standard deviation of the distribution of sample proportions is [latex]\sqrt{\frac{p(1-p)}{n}}[/latex].
  • A handful of samples (five, say) is not enough to reveal the pattern of variation in sample proportions; we need to generate many samples. The distribution showing how sample proportions vary from sample to sample is called the sampling distribution of the sample proportion.
  • The Central Limit Theorem states that, as the sample size gets larger, the distribution of the sample proportion will become closer to a normal distribution.
  • If [latex]np\geq 10[/latex] and [latex]n(1-p) \geq 10[/latex], the distribution of sample proportions is approximately normal with mean [latex]p[/latex] and standard deviation [latex]\sqrt{\frac{p(1-p)}{n}}[/latex].
  • In practice, we do not know the population proportion, nor do we have the luxury of taking thousands of random samples. Instead, we observe a single random sample. In this case, we need to estimate the mean and standard deviation of the sample proportion:
    • The estimated mean of the distribution of sample proportions is [latex]\hat{p}[/latex].
    • To distinguish it from the true standard deviation of sample proportions, we call the estimated standard deviation of sample proportions the standard error of [latex]\hat{p}[/latex]: [latex]SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}[/latex].
  • If the desired standard deviation [latex]\sigma_{\hat{p}}[/latex] is known, we can calculate the required sample size by solving the standard deviation formula for [latex]n[/latex].
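
The claims above about the mean and standard deviation of the sampling distribution can be checked with a small simulation. The following is a minimal sketch, not part of the original module: the values p = 0.3 and n = 100, the number of samples, and the function name simulate_sample_proportions are all assumptions chosen for illustration.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample's proportion."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    proportions = []
    for _ in range(num_samples):
        # Each individual is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Hypothetical population proportion and sample size (assumptions).
p, n = 0.3, 100
p_hats = simulate_sample_proportions(p, n, num_samples=5000)

simulated_mean = statistics.mean(p_hats)
simulated_sd = statistics.pstdev(p_hats)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print("mean of sample proportions:", round(simulated_mean, 3), "vs p =", p)
print("SD of sample proportions:", round(simulated_sd, 3),
      "vs formula:", round(theoretical_sd, 3))
print("np >= 10 and n(1-p) >= 10:", n * p >= 10 and n * (1 - p) >= 10)
```

With many samples, the simulated mean should land close to p and the simulated standard deviation close to the value from the formula; a histogram of p_hats would look roughly bell-shaped when both conditions in the last line hold.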

Key Equations

sample size

[latex]n = \frac{p(1-p)}{(\sigma_{\hat{p}})^2}[/latex]
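
As an illustration of working backwards from a desired standard deviation, here is a short sketch of the sample size formula. The function name required_sample_size and the example values (p = 0.5, desired standard deviation 0.03) are assumptions, and rounding up to a whole number is a practical choice added here rather than something stated in the formula.

```python
import math


def required_sample_size(p, target_sd):
    """Whole-number sample size from n = p(1-p) / target_sd^2, rounded up."""
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if target_sd <= 0:
        raise ValueError("target_sd must be positive")
    exact_n = p * (1 - p) / target_sd ** 2
    # Round up: a sample cannot contain a fraction of an individual,
    # and rounding down would leave the standard deviation above the target.
    return math.ceil(exact_n)


# Hypothetical inputs: p = 0.5 and a desired standard deviation of 0.03.
n_needed = required_sample_size(p=0.5, target_sd=0.03)
print(n_needed)  # 0.25 / 0.0009 is about 277.8, so this prints 278
```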

standard error

[latex]SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}[/latex]
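
The standard error uses only what a single sample provides. A minimal sketch follows, assuming a hypothetical sample of 100 individuals with 45 successes; the function name standard_error is also an assumption.

```python
import math


def standard_error(successes, n):
    """Standard error of the sample proportion from one observed sample."""
    if n <= 0 or not 0 <= successes <= n:
        raise ValueError("need n > 0 and 0 <= successes <= n")
    p_hat = successes / n  # sample proportion, the estimate of p
    return math.sqrt(p_hat * (1 - p_hat) / n)


# Hypothetical sample: 45 successes out of 100 individuals.
se = standard_error(successes=45, n=100)
print(round(se, 4))  # sqrt(0.45 * 0.55 / 100), roughly 0.05
```

The only difference from the standard deviation formula is that the unknown population proportion is replaced by the sample proportion.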

standard deviation

[latex]\sigma_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}}[/latex]
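
When the conditions for a normal approximation hold, the standard deviation formula can be paired with a normal model to estimate how likely a given sample proportion is. The sketch below is one possible illustration using Python's statistics.NormalDist; the values p = 0.4 and n = 100, the threshold 0.5, and the function name sample_proportion_distribution are assumptions.

```python
import math
from statistics import NormalDist


def sample_proportion_distribution(p, n):
    """Approximate normal model for sample proportions, if the conditions hold."""
    if n * p < 10 or n * (1 - p) < 10:
        raise ValueError("need np >= 10 and n(1-p) >= 10 for the normal approximation")
    sigma = math.sqrt(p * (1 - p) / n)  # standard deviation of sample proportions
    return NormalDist(mu=p, sigma=sigma)


# Hypothetical population: p = 0.4, with samples of size n = 100.
dist = sample_proportion_distribution(p=0.4, n=100)

# Approximate chance that a sample proportion exceeds 0.5.
prob_above_half = 1 - dist.cdf(0.5)
print(round(dist.stdev, 4), round(prob_above_half, 4))
```

Here 0.5 sits about two standard deviations above the mean of 0.4, so the probability comes out at roughly 2%.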

Glossary

Central Limit Theorem

as the sample size gets larger, the distribution of the sample proportion becomes closer to a normal distribution; in particular, if [latex]np\geq 10[/latex] and [latex]n(1-p) \geq 10[/latex], the distribution of sample proportions is approximately normal with mean [latex]p[/latex] and standard deviation [latex]\sqrt{\frac{p(1-p)}{n}}[/latex]

parameters

numbers that describe a population

population

the entire collection of individuals or objects that you want to learn about

sample

a part of the population that is selected for study

sampling distribution of a statistic 

a probability distribution that describes the long-run behavior of the statistic

sampling distribution of a sample proportion 

a probability distribution that describes how sample proportions vary from sample to sample

standard deviation

a measure of variability; for sample proportions, it describes how much the proportions vary from sample to sample around the population proportion

standard error

an estimate of the standard deviation of the sampling distribution, calculated from a single sample

statistics

numbers that are calculated from a sample