Confidence Intervals for the Difference in Population Proportions: Learn It 2

  • Calculate a confidence interval for the difference in proportions of two groups.

Independent and dependent samples

As with a single proportion, we can calculate a confidence interval that gives a plausible range of values for the true difference in proportions, provided certain conditions hold.

There are two different methods for calculating confidence intervals for the difference in proportions. The method you use depends on whether the two groups are independent or dependent (paired).

If the two groups are independent, the sample for one group is drawn independently of the sample for the other group. Knowing the observations in one group does not provide useful information about the other group. Additionally, the groups can be different sizes.
If the two groups are dependent (also known as paired), the samples for the two groups are not drawn independently of one another. Knowing the observations in one group does provide useful information about the other group. Additionally, both groups must be the same size.

Beyond determining whether the samples are independent or paired, three more conditions need to be satisfied when calculating confidence intervals for the difference between two proportions.

Conditions when calculating confidence intervals

The three conditions when calculating confidence intervals for the difference between two proportions are:

  • Random samples: The observations represent a random sample of the population.
  • Independence: The individual observations within each sample are independent. When sampling without replacement, this means each sample size is less than [latex]10\%[/latex] of its population.
  • Sample size: [latex]n_{1}\hat{p}_{1} \geq 10[/latex], [latex]n_{1}(1-\hat{p}_{1})\ge 10[/latex] and [latex]n_{2}\hat{p}_{2} \geq 10[/latex], [latex]n_{2}(1-\hat{p}_{2}) \ge 10[/latex]

For the job callback example, the researchers randomly assigned names that are commonly associated with particular races and genders. Thus, our sample satisfies the first condition.
For the third condition, let’s show that each group has the necessary sample size.

  • [latex]n_{1}\hat{p}_{1} = 309 \geq 10[/latex] since [latex]309[/latex] applications with female-perceived names received callbacks
  • [latex]n_{2}\hat{p}_{2} = 83 \geq 10[/latex] since [latex]83[/latex] applications with male-perceived names received callbacks

We also check that the number of applications that did not receive callbacks meets the condition:

  • [latex]n_{1}(1-\hat{p}_{1}) = 3746-309 = 3437 \geq 10[/latex]
  • [latex]n_{2}(1-\hat{p}_{2}) = 1124-83 = 1041 \geq 10[/latex]
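
The sample size check above can also be sketched in a few lines of Python. This is only an illustration, not part of the original study's analysis: the function name `meets_sample_size_condition` and the `groups` dictionary are hypothetical names chosen for this example, and the counts are the ones quoted in the bullets above.

```python
def meets_sample_size_condition(n, successes, threshold=10):
    """Check that both n * p_hat and n * (1 - p_hat) are at least `threshold`.

    n * p_hat is the observed number of successes, and n * (1 - p_hat)
    is the observed number of failures, so we can compare counts directly.
    """
    failures = n - successes
    return successes >= threshold and failures >= threshold


# (total applications, applications that received callbacks) per group,
# taken from the job callback example in the text.
groups = {
    "female-perceived names": (3746, 309),
    "male-perceived names": (1124, 83),
}

for label, (n, callbacks) in groups.items():
    no_callbacks = n - callbacks
    ok = meets_sample_size_condition(n, callbacks)
    print(f"{label}: callbacks={callbacks}, "
          f"no callbacks={no_callbacks}, condition met={ok}")
```

Working with counts rather than computing [latex]\hat{p}[/latex] first avoids rounding, since [latex]n\hat{p}[/latex] is simply the number of observed successes.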