Essential Concepts
- One of the goals of statistical inference is to draw a conclusion about a population on the basis of a random sample from the population. Random samples vary, so we need to understand how much they vary and how they relate to the population. Our ultimate goal is to create a probability model that describes the long-run behavior of sample measurements. We use this model to make inferences about the population.
- When we want to describe the characteristics of a sample, we call the values statistics. However, when we want to describe the characteristics of a population, we call those values parameters.
- We can use mathematical theory to derive expressions for the mean and standard deviation of the sampling distribution of the sample proportion. When taking many random samples of size [latex]n[/latex] from a population distribution with population proportion [latex]p[/latex]:
  - The mean of the distribution of sample proportions is [latex]p[/latex].
  - The standard deviation of the distribution of sample proportions is [latex]\sqrt{\frac{p(1-p)}{n}}[/latex].
- To get a sense of the pattern of variation in sample proportions, we need to generate many samples; a handful, such as five, is not enough. The distribution showing how sample proportions vary from sample to sample is called the sampling distribution of the sample proportion.
- The Central Limit Theorem states that, as the sample size gets larger, the distribution of the sample proportion will become closer to a normal distribution.
- If [latex]np\geq 10[/latex] and [latex]n(1-p) \geq 10[/latex], then the Central Limit Theorem states that the distribution of sample proportions from random samples of size [latex]n[/latex] is approximately normal with mean [latex]p[/latex] and standard deviation [latex]\sqrt{\frac{p(1-p)}{n}}[/latex].
- In practice, we do not know the population proportion, nor do we have the luxury of taking thousands of random samples. Instead, we observe a single random sample. In this case, we need to estimate the mean and standard deviation of the sampling distribution of the sample proportion:
  - The estimated mean of the distribution of sample proportions is [latex]\hat{p}[/latex].
  - To distinguish it from the true standard deviation of sample proportions, we call the estimated standard deviation of sample proportions the standard error of [latex]\hat{p}[/latex]: [latex]SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}[/latex].
- If the desired standard deviation of the sample proportion is specified, then we can calculate the sample size needed by solving the standard deviation formula for [latex]n[/latex].
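The claims above about the mean and standard deviation of the sampling distribution can be checked with a small simulation. The following is a minimal sketch, not part of the original material: the function name `simulate_sample_proportions`, the values `p = 0.3` and `n = 100`, the number of samples, and the seed are all illustrative assumptions.

```python
import math
import random
import statistics


def simulate_sample_proportions(p, n, num_samples, seed=0):
    """Draw num_samples random samples of size n; return each sample proportion."""
    rng = random.Random(seed)  # fixed seed (an assumption) so runs are repeatable
    proportions = []
    for _ in range(num_samples):
        # Each sampled individual is a "success" with probability p.
        successes = sum(1 for _ in range(n) if rng.random() < p)
        proportions.append(successes / n)
    return proportions


# Hypothetical population proportion and sample size, chosen for illustration.
p, n = 0.3, 100
props = simulate_sample_proportions(p, n, num_samples=5000)

simulated_mean = statistics.mean(props)
simulated_sd = statistics.pstdev(props)
theoretical_sd = math.sqrt(p * (1 - p) / n)

print(f"mean of sample proportions: {simulated_mean:.4f} (theory: {p})")
print(f"SD of sample proportions:   {simulated_sd:.4f} (theory: {theoretical_sd:.4f})")
```

With these values, [latex]np = 30[/latex] and [latex]n(1-p) = 70[/latex], so the normal-approximation condition holds, and a histogram of `props` should look roughly bell-shaped and centered near [latex]p[/latex].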
Key Equations
sample size
[latex]n = \frac{p(1-p)}{(\sigma_{\hat{p}})^2}[/latex]
standard error
[latex]SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}}[/latex]
standard deviation
[latex]\sigma_{\hat{p}}=\sqrt{\frac{p(1-p)}{n}}[/latex]
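As a rough illustration, the three equations above can be written as small helper functions. This is a sketch under stated assumptions: the function names are hypothetical, the numbers in the example are made up, and rounding the sample size up to the next whole number is a choice made here so that the resulting standard deviation does not exceed the target.

```python
import math


def sd_of_sample_proportion(p, n):
    """True standard deviation of the sample proportion: sqrt(p(1-p)/n)."""
    return math.sqrt(p * (1 - p) / n)


def standard_error(p_hat, n):
    """Estimated standard deviation, using the sample proportion p_hat in place of p."""
    return math.sqrt(p_hat * (1 - p_hat) / n)


def required_sample_size(p, target_sd):
    """Solve the standard deviation formula for n, rounding up (an assumption)."""
    return math.ceil(p * (1 - p) / target_sd ** 2)


def normal_approximation_ok(p, n):
    """Check the conditions np >= 10 and n(1-p) >= 10."""
    return n * p >= 10 and n * (1 - p) >= 10


# Illustrative values only: a single sample of 200 with sample proportion 0.42.
print(standard_error(0.42, 200))
# Sample size needed for a standard deviation of 0.05 when p is assumed to be 0.5.
print(required_sample_size(0.5, 0.05))
# Conditions fail when n*p is too small, for example p = 0.05 and n = 100.
print(normal_approximation_ok(0.05, 100))
```

Note that `required_sample_size` needs a value for [latex]p[/latex]; when the population proportion is unknown, a value has to be assumed before the formula can be used.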
Glossary
Central Limit Theorem
If [latex]np\geq 10[/latex] and [latex]n(1-p) \geq 10[/latex], then the Central Limit Theorem states that the distribution of the sample proportions follows an approximate normal distribution with mean [latex]p[/latex] and standard deviation [latex]\sqrt{\frac{p(1-p)}{n}}[/latex].
parameters
numbers that describe a population
population
the entire collection of individuals or objects that you want to learn about
sample
a part of the population that is selected for study
sampling distribution of a statistic
a probability distribution that describes the long-run behavior of the statistic
sampling distribution of a sample proportion
a probability distribution that describes how sample proportions vary from sample to sample
standard deviation
a measure that describes the variability of a population
standard error
an estimate, computed from a single sample, of the standard deviation of a statistic's sampling distribution
statistics
numbers that are calculated from a sample