Essential Concepts
- A point estimate is a single value, based on representative sample data, that serves as a plausible estimate of a population parameter. The sample proportion is used as a point estimate of the population proportion.
- When taking many random samples of size [latex]n[/latex] from a population with proportion [latex]p[/latex]:
- The mean of the distribution of sample proportions is [latex]p[/latex].
- The standard deviation of the distribution of sample proportions is [latex]\sqrt{\frac{p(1-p)}{n}}[/latex].
- If [latex]np \geq 10[/latex] and [latex]n(1-p) \geq 10[/latex], then by the Central Limit Theorem (CLT) the distribution of sample proportions is approximately normal with mean [latex]p[/latex] and standard deviation [latex]\sqrt{\frac{p(1-p)}{n}}[/latex].
- Because [latex]p[/latex] is usually unknown, when the sample size is large enough we use [latex]\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}[/latex] in place of [latex]\sqrt{\frac{p(1-p)}{n}}[/latex]. This quantity is called the standard error: the estimated standard deviation of the sample proportions.
- A confidence interval for a population proportion is a reasonable range of values in which we expect the population proportion to fall. It is constructed with a chosen level of confidence.
- A confidence interval is calculated from the point estimate and the margin of error: point estimate ± margin of error.
- The sample proportion is used as the point estimate when estimating a population proportion.
- The margin of error (ME) determines the width of the interval. The full width of a confidence interval is twice the margin of error.
- [latex]ME = z^{*} \cdot (\text{standard error})[/latex] where:
- standard error = [latex]\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}[/latex]
- [latex]z^{*}[/latex] is the z critical value: the point on the standard normal distribution such that the area under the curve between [latex]-z^{*}[/latex] and [latex]+z^{*}[/latex] equals [latex]C[/latex], the confidence level.
| Confidence Level | [latex]z^{*}[/latex] |
| --- | --- |
| 90% | 1.645 |
| 95% | 1.96 |
| 99% | 2.576 |
- The confidence level, [latex]C[/latex], tells us how much confidence we have in the method used to construct the interval. It corresponds to the percentage of all intervals we would expect to contain the true population parameter.
- The interpretation of a confidence interval states its confidence level. For example, a 95% confidence interval is interpreted as: We are 95% confident that the interval from ___ to ___ captures the true proportion of _______________.
- The three conditions for calculating a confidence interval for a population proportion are:
- Random samples: The observations represent a random sample of the population.
- Independence: The sample observations are selected independently. A sample can be assumed independent if the sample size is less than 10% of the population size [latex]N[/latex], that is, [latex]n < \frac{1}{10}N[/latex].
- Sample size: [latex]n\hat{p} \geq 10[/latex] and [latex]n(1-\hat{p}) \geq 10[/latex]
- The formula for the minimum sample size needed to produce a given margin of error is [latex]n = \hat{p}(1-\hat{p})(\frac{z^{*}}{ME})^{2}[/latex], rounded up to the next whole number.
- Using the conservative [latex]\hat{p} = 0.5[/latex] approach always yields a sample size at least as large as necessary, because [latex]\hat{p}(1-\hat{p})[/latex] is largest when [latex]\hat{p} = 0.5[/latex].
- There are two different methods for calculating confidence intervals for the difference in proportions between two populations.
- If the two groups are independent, the sample for one group is drawn independently of the sample for the other. Knowing the observations in one sample provides no useful information about the other sample. The groups may also be different sizes.
- If the two groups are dependent (also known as paired or matched pairs), the samples for the two groups are not drawn independently of one another. Knowing the observations in one sample does provide useful information about the other sample. Because each observation is paired, both groups must be the same size.
- When our goal is to estimate a difference between two population proportions (or the size of a treatment effect), we select two independent random samples and use the difference in sample proportions as an estimate.
- The three conditions when calculating confidence intervals for the difference between two proportions are:
- Random samples: The observations represent a random sample of the population.
- Independence: The two samples are selected independently of each other, so the population groups are independent (this condition does not apply to matched pairs).
- Sample size: [latex]n_{1}\hat{p}_{1} \geq 10[/latex], [latex]n_{1}(1-\hat{p}_{1}) \geq 10[/latex], [latex]n_{2}\hat{p}_{2} \geq 10[/latex], and [latex]n_{2}(1-\hat{p}_{2}) \geq 10[/latex]
- When these conditions are met, the sampling distribution of [latex]\hat{p}_{1} - \hat{p}_{2}[/latex] has three properties:
- For large samples, the distribution is approximately normal.
- The distribution has a mean of [latex]p_{1} - p_{2}[/latex], the true population difference.
- The distribution has a standard deviation of [latex]\sqrt{\frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}}[/latex].
- The formula for the confidence interval for a difference between two population proportions is:
Estimate [latex]\pm[/latex] Margin of Error
[latex](\hat{p}_1 - \hat{p}_2) \pm z^{*} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_{1}} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_{2}}}[/latex]
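The difference-in-proportions interval above can be sketched in Python as follows. This is a minimal illustration using only the standard library. It assumes two independent random samples (not matched pairs) and obtains [latex]z^{*}[/latex] from `statistics.NormalDist`. The function name `two_proportion_ci` and the data values are hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1: int, n1: int, x2: int, n2: int,
                      confidence: float = 0.95) -> tuple[float, float]:
    """Hypothetical helper: (p1_hat - p2_hat) +/- z* * SE for two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    # Sample-size condition: at least 10 successes and 10 failures in each group.
    for n, p in ((n1, p1), (n2, p2)):
        if n * p < 10 or n * (1 - p) < 10:
            raise ValueError("each group needs n*p_hat >= 10 and n*(1 - p_hat) >= 10")
    # Standard error of the difference, using the sample proportions.
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    # z* leaves (1 - C)/2 of the standard normal area in each tail.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se


# Hypothetical data: 60 of 100 successes in group 1, 45 of 100 in group 2.
low, high = two_proportion_ci(60, 100, 45, 100)
print(f"({low:.3f}, {high:.3f})")  # (0.013, 0.287)
```

The interval is centered at the point estimate [latex]\hat{p}_1 - \hat{p}_2 = 0.15[/latex]. Its half-width is the margin of error.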
Key Equations
confidence interval for a population proportion
[latex]\hat{p} \pm z^{*} \cdot\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}[/latex]
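A minimal Python sketch of this large-sample interval follows. It checks the sample-size condition and then computes [latex]\hat{p} \pm z^{*} \cdot SE[/latex]. The function name `proportion_ci`, the data, and the random seed are hypothetical. The second half is a small simulation under an assumed true [latex]p[/latex]: it repeats the procedure many times and records how often the interval captures [latex]p[/latex], which is what the confidence level describes.

```python
import math
import random
from statistics import NormalDist


def proportion_ci(successes: int, n: int, confidence: float = 0.95) -> tuple[float, float]:
    """Hypothetical helper: p_hat +/- z* * sqrt(p_hat * (1 - p_hat) / n)."""
    p_hat = successes / n
    # Sample-size condition: at least 10 successes and 10 failures.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("need n*p_hat >= 10 and n*(1 - p_hat) >= 10")
    # z* leaves (1 - C)/2 of the standard normal area in each tail.
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    se = math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - z * se, p_hat + z * se


# Hypothetical data: 50 successes in a sample of 100.
print(proportion_ci(50, 100))  # roughly (0.402, 0.598)

# Coverage check with an assumed true p: about 95% of 95% intervals should capture it.
random.seed(1)
true_p, n, trials = 0.3, 200, 2000
hits = 0
for _ in range(trials):
    x = sum(random.random() < true_p for _ in range(n))  # one simulated sample
    low, high = proportion_ci(x, n)
    hits += low <= true_p <= high
coverage = hits / trials
print(coverage)  # close to 0.95
```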
confidence interval for the difference in proportions
[latex](\hat{p}_1 - \hat{p}_2) \pm z^{*} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_{1}} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_{2}}}[/latex]
margin of error ([latex]ME[/latex])
[latex]ME = z^{*} \cdot (\text{standard error})[/latex]
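The critical value [latex]z^{*}[/latex] in this formula can be computed rather than looked up. Because each tail holds [latex](1-C)/2[/latex] of the area, [latex]z^{*}[/latex] is the [latex](1+C)/2[/latex] quantile of the standard normal distribution. The sketch below uses Python's `statistics.NormalDist` to reproduce the usual table values and then computes a margin of error. The helper names and the standard error of 0.02 are hypothetical.

```python
from statistics import NormalDist


def z_star(confidence: float) -> float:
    """Hypothetical helper: area `confidence` of N(0, 1) lies between -z* and +z*."""
    return NormalDist().inv_cdf((1 + confidence) / 2)


def margin_of_error(confidence: float, standard_error: float) -> float:
    """Hypothetical helper: ME = z* * (standard error)."""
    return z_star(confidence) * standard_error


for c in (0.90, 0.95, 0.99):
    print(f"{c:.0%}: z* = {z_star(c):.3f}")  # 1.645, 1.960, 2.576

# Hypothetical standard error of 0.02 at 95% confidence.
me = margin_of_error(0.95, 0.02)
print(f"ME = {me:.4f}, interval width = {2 * me:.4f}")  # ME = 0.0392, width = 0.0784
```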
sample size needed for proportion
[latex]n = \hat{p}(1-\hat{p})(\frac{z^{*}}{ME})^{2}[/latex]
if [latex]\hat{p}[/latex] is not given, use [latex]\hat{p}=0.5[/latex]
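A minimal Python sketch of the sample-size formula follows. It rounds up, because rounding down would give a margin of error slightly larger than requested. It defaults to the conservative [latex]\hat{p} = 0.5[/latex]. The function name, the target margin of 0.03, and the prior estimate of 0.2 are hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(margin: float, confidence: float = 0.95, p_hat: float = 0.5) -> int:
    """Hypothetical helper: smallest n with z* * sqrt(p_hat(1 - p_hat)/n) <= margin."""
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    n = p_hat * (1 - p_hat) * (z / margin) ** 2
    # Round up to the next whole number of observations.
    return math.ceil(n)


print(required_sample_size(0.03))             # 1068 (conservative p_hat = 0.5)
print(required_sample_size(0.03, p_hat=0.2))  # 683 (hypothetical prior estimate)
```

The conservative choice asks for more observations because [latex]\hat{p}(1-\hat{p})[/latex] peaks at 0.25 when [latex]\hat{p} = 0.5[/latex].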
standard error
[latex]\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}[/latex]
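The sampling-distribution facts behind the standard error can be checked by simulation. The Python sketch below assumes a hypothetical population proportion [latex]p = 0.4[/latex] and sample size [latex]n = 150[/latex]. It draws many samples and compares the mean and standard deviation of the sample proportions with [latex]p[/latex] and [latex]\sqrt{p(1-p)/n} = 0.04[/latex]. It then computes the standard error from a single sample, where [latex]\hat{p}[/latex] stands in for the unknown [latex]p[/latex]. The seed and repetition count are arbitrary.

```python
import math
import random
import statistics

random.seed(0)
p, n, reps = 0.4, 150, 5000  # hypothetical population proportion, sample size, repetitions

# Draw many random samples of size n and record each sample proportion.
p_hats = [sum(random.random() < p for _ in range(n)) / n for _ in range(reps)]

mean_p_hat = statistics.fmean(p_hats)  # should be close to p
sd_p_hat = statistics.pstdev(p_hats)   # should be close to sqrt(p(1-p)/n)
print(mean_p_hat, sd_p_hat, math.sqrt(p * (1 - p) / n))

# In practice p is unknown, so one sample's p_hat is plugged in instead.
p_hat = p_hats[0]
se_one = math.sqrt(p_hat * (1 - p_hat) / n)
print(se_one)
```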
standard deviation for the difference in proportions
[latex]\sqrt{\frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}}[/latex]
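This standard deviation, and the fact that [latex]\hat{p}_1 - \hat{p}_2[/latex] is centered at [latex]p_1 - p_2[/latex], can be checked with a short simulation. The Python sketch below assumes hypothetical values [latex]p_1 = 0.55[/latex], [latex]n_1 = 120[/latex], [latex]p_2 = 0.40[/latex], and [latex]n_2 = 90[/latex]. It draws the two samples independently, so their sizes may differ. The seed and repetition count are arbitrary.

```python
import math
import random
import statistics

random.seed(0)
p1, n1 = 0.55, 120  # hypothetical group 1
p2, n2 = 0.40, 90   # hypothetical group 2
reps = 5000


def sample_prop(p: float, n: int) -> float:
    """Proportion of successes in one simulated random sample of size n."""
    return sum(random.random() < p for _ in range(n)) / n


# Independent samples from each population; record the difference each time.
diffs = [sample_prop(p1, n1) - sample_prop(p2, n2) for _ in range(reps)]

mean_diff = statistics.fmean(diffs)  # should be close to p1 - p2
sd_diff = statistics.pstdev(diffs)   # should be close to the formula below
theory_sd = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
print(mean_diff, p1 - p2)
print(sd_diff, theory_sd)
```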
Glossary
confidence interval for a population proportion
a reasonable range of values in which we expect the population proportion to fall, with a chosen degree of confidence
confidence level
how much confidence we have in the method used to construct the interval
margin of error (ME)
the amount added to and subtracted from the point estimate; it determines the width of a confidence interval
point estimate
a single value based on representative sample data that is a plausible estimate of the population parameter