Confidence Intervals for the Difference in Population Proportions: Learn It 3

  • Calculate a confidence interval for the difference in proportions of two groups.
  • Make conclusions based on a confidence interval.

Difference in Proportions

Our primary goal is to use the data to examine whether the proportion of callbacks differs between applications the researchers identified as being perceived as female and applications they identified as being perceived as male.

It is not feasible to simulate every possible sample to derive an exact sampling distribution in this scenario. Instead, we can use mathematical theory to derive expressions for the mean and standard deviation of the sampling distribution for the difference in proportions.

Sampling Distribution for the Difference in Proportions

When certain conditions apply, mathematical theory tells us three things about the sampling distribution of [latex]\hat{p}_{1} - \hat{p}_{2}[/latex]:

  1. For large samples, the distribution is approximately normal.
  2. The distribution has a mean of [latex]p_{1} - p_{2}[/latex], the true difference in population proportions.
  3. The distribution has a standard deviation of

[latex]\sqrt{\frac{p_{1}(1-p_{1})}{n_{1}} + \frac{p_{2}(1-p_{2})}{n_{2}}}[/latex].

Similar to previous calculations, we do not know the population proportions, so we replace [latex]p_{1}[/latex] and [latex]p_{2}[/latex] in the formula with the respective sample proportions [latex]\hat{p}_{1}[/latex] and [latex]\hat{p}_{2}[/latex]. The resulting estimate of the standard deviation is called the standard error. It estimates the sample-to-sample variability, that is, the random variability we expect in [latex]\hat{p}_{1} - \hat{p}_{2}[/latex] if we repeatedly take random samples of the same size.
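To make the idea of sample-to-sample variability concrete, here is a minimal simulation sketch in Python. The population proportions, sample sizes, and number of repetitions are hypothetical values chosen for illustration, not figures from the callback study. The sketch draws many pairs of random samples and compares the mean and standard deviation of the simulated differences with the theoretical values.

```python
import math
import random
from statistics import mean, stdev

# Hypothetical population proportions and sample sizes (not from the study).
P1, P2 = 0.30, 0.20
N1, N2 = 200, 200
REPS = 2000

rng = random.Random(42)  # fixed seed so the run is repeatable


def sample_proportion(p, n):
    """Draw one random sample of size n and return its sample proportion."""
    successes = sum(rng.random() < p for _ in range(n))
    return successes / n


# Repeatedly draw a pair of samples and record the difference in sample proportions.
differences = [
    sample_proportion(P1, N1) - sample_proportion(P2, N2)
    for _ in range(REPS)
]

# Standard deviation of the sampling distribution according to the formula.
theoretical_sd = math.sqrt(P1 * (1 - P1) / N1 + P2 * (1 - P2) / N2)

print("simulated mean:", round(mean(differences), 4), "theory:", round(P1 - P2, 4))
print("simulated SD:", round(stdev(differences), 4), "theory:", round(theoretical_sd, 4))
```

With samples this large, the simulated mean should land close to the true difference in proportions, and the simulated standard deviation should land close to the value given by the formula.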

Now that we have our estimate of the difference in proportions and the standard error, let’s calculate the confidence interval.

Estimate [latex]\pm[/latex] Margin of Error

[latex](\hat{p}_1 - \hat{p}_2) \pm z^{*} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_{1}} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_{2}}}[/latex]

[latex]0.00864 \pm 1.96 \times 0.009[/latex]

[latex]0.00864 \pm 0.01764[/latex]

[latex]0.00864 - 0.01764 = -0.009[/latex] and [latex]0.00864 + 0.01764 \approx 0.026[/latex]

This gives us a [latex]95\%[/latex] confidence interval of [latex](-0.009, 0.026)[/latex] for the true difference between the proportion of callbacks for applications with a female-perceived name and the proportion of callbacks for applications with a male-perceived name. Notice that [latex]0[/latex] is in the interval. This tells us that a difference of zero is a plausible value, so the data do not provide convincing evidence of a difference in the true proportions.
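The arithmetic above can be double-checked with a few lines of Python. This is a minimal sketch that starts from the rounded values quoted in the text (an estimate of 0.00864, a standard error of 0.009, and a critical value of 1.96); the variable names are illustrative.

```python
# Rounded values taken from the worked example in the text.
estimate = 0.00864        # difference in sample proportions
standard_error = 0.009    # estimated sample-to-sample variability
z_star = 1.96             # critical value for 95% confidence

margin_of_error = z_star * standard_error
lower = estimate - margin_of_error
upper = estimate + margin_of_error

# If 0 lies inside the interval, a difference of zero is a plausible value.
contains_zero = lower <= 0 <= upper

print(f"margin of error: {margin_of_error:.5f}")
print(f"interval: ({lower:.3f}, {upper:.3f})")
print("contains zero:", contains_zero)
```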

The [latex]z^{*}[/latex] critical values for [latex]90\%[/latex], [latex]95\%[/latex], and [latex]99\%[/latex] confidence levels are [latex]1.645[/latex], [latex]1.96[/latex], and [latex]2.576[/latex], respectively.
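These critical values do not need to be memorized. As a sketch, they can be recovered from the standard normal distribution using Python's standard library; the helper name `z_critical` is a hypothetical one introduced here for illustration.

```python
from statistics import NormalDist


def z_critical(confidence_level):
    """Return z* such that the area between -z* and +z* equals confidence_level."""
    tail_area = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail_area)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence: z* = {z_critical(level):.3f}")
```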

Confidence Interval for Difference in Proportions

Estimate [latex]\pm[/latex] Margin of Error

[latex](\hat{p}_1 - \hat{p}_2) \pm z^{*} \times \sqrt{\frac{\hat{p}_1(1-\hat{p}_1)}{n_{1}} + \frac{\hat{p}_2(1-\hat{p}_2)}{n_{2}}}[/latex]

  • The estimate is the difference in the sample proportions.
  • The margin of error is half the width of the confidence interval and is the product of two parts:
    • [latex]z^{*}[/latex]: The z critical value; this is the point on the standard normal distribution such that the proportion of area under the curve between [latex]-z^{*}[/latex] and [latex]+z^{*}[/latex] is [latex]C[/latex], the confidence level.
    • Standard error: A measure of the sample-to-sample variability.
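The formula can also be wrapped in a small function. The following is a minimal sketch, not the course's own tool: the function name `two_proportion_ci` and the counts in the usage line (50 successes out of 100 in group 1, 40 out of 100 in group 2) are hypothetical and are not taken from the callback study.

```python
import math
from statistics import NormalDist


def two_proportion_ci(x1, n1, x2, n2, confidence_level=0.95):
    """Confidence interval for p1 - p2 from success counts x and sample sizes n."""
    p_hat1 = x1 / n1
    p_hat2 = x2 / n2

    # Estimate: the difference in the sample proportions.
    estimate = p_hat1 - p_hat2

    # Standard error: the standard deviation formula with sample proportions plugged in.
    standard_error = math.sqrt(
        p_hat1 * (1 - p_hat1) / n1 + p_hat2 * (1 - p_hat2) / n2
    )

    # z*: the area between -z* and +z* under the standard normal curve is the confidence level.
    z_star = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)

    margin_of_error = z_star * standard_error
    return estimate - margin_of_error, estimate + margin_of_error


# Hypothetical counts, for illustration only.
lower, upper = two_proportion_ci(50, 100, 40, 100)
print(f"95% CI: ({lower:.3f}, {upper:.3f})")
```

Passing a different `confidence_level`, such as `0.90` or `0.99`, changes only the critical value, so the interval narrows or widens while the estimate stays the same.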

In practice, we can calculate the confidence interval either with the formula or with technology. Let’s use technology.