Chi-Square Test of Independence – Learn It 5

  • Complete a chi-square test of independence
  • Write the conclusion of a chi-square test of independence in context of the problem

We concluded from our hypothesis test that the variables Income level and Education level are not independent. The test tells us only that an association exists; it does not tell us how the variables are associated.

lurking variable

It could be that a third variable, one not included in our study, affects both of the variables we are considering. Such a variable is called a lurking variable.

Now that you’ve seen both the chi-square test of homogeneity and the chi-square test of independence in action, let’s summarize the difference between the two tests.

Test of Independence for a Two-Way Table

  • In the test of independence, we consider one population and two categorical variables.
  • We learned that two events are independent if [latex]P(A|B) = P(A)[/latex], but that definition does not account for variability in the sample. With the chi-square test of independence, we have a method for deciding whether our observed [latex]P(A|B)[/latex] is “too far” from our observed [latex]P(A)[/latex] for independence in the population to be plausible.
  • The null hypothesis says the two variables are independent (or not associated). The alternative hypothesis says the two variables are dependent (or associated).
  • To test our hypotheses, we select a single random sample and gather data for two different categorical variables.
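
To make the idea of “too far” concrete, here is a minimal Python sketch of the calculation behind the test of independence. The counts are hypothetical and invented for illustration only; they are not the Income level and Education level data from our study. The function name chi_square_statistic is also our own. The sketch compares the observed [latex]P(A|B)[/latex] with the observed [latex]P(A)[/latex], then computes the expected counts and the chi-square statistic. The P-value shortcut in the last step is valid only for a 2 × 2 table (1 degree of freedom).

```python
import math


def chi_square_statistic(observed):
    """Return (statistic, df, expected counts) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count if the variables are independent:
    # (row total * column total) / overall total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# HYPOTHETICAL counts from ONE random sample of 100 people (not real data).
# Rows: education (college degree, no college degree)
# Columns: income (higher, lower)
observed = [
    [30, 20],
    [20, 30],
]

n = sum(sum(row) for row in observed)
# Observed P(A): proportion of the whole sample with higher income
p_high = (observed[0][0] + observed[1][0]) / n
# Observed P(A|B): proportion with higher income among college graduates
p_high_given_college = observed[0][0] / sum(observed[0])

statistic, df, expected = chi_square_statistic(observed)

# Shortcut that holds ONLY when df = 1 (a 2 x 2 table):
# the chi-square tail area equals erfc(sqrt(statistic / 2)).
p_value = math.erfc(math.sqrt(statistic / 2))

print("P(A) =", p_high, " P(A|B) =", p_high_given_college)
print("chi-square =", statistic, " df =", df, " P-value =", round(p_value, 4))
```

In this made-up sample, the observed [latex]P(A|B) = 0.6[/latex] differs from the observed [latex]P(A) = 0.5[/latex], and the chi-square statistic is 4 with a P-value of about 0.046. At a 5% significance level, we would reject the null hypothesis of independence for these hypothetical counts. For larger tables, statistical software is the usual way to find the P-value.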

Test of Homogeneity for a Two-Way Table

  • In the test of homogeneity, we consider two or more populations (or two or more subgroups of a population) and a single categorical variable.
  • The test of homogeneity expands on the test for a difference in two population proportions that we learned in Inference for Two Proportions by comparing the distribution of the categorical variable across multiple groups or populations.
  • The null hypothesis says that the distribution of proportions for all categories is the same in each group or population. The alternative hypothesis says that the distribution differs for at least one group or population.
  • To test our hypotheses, we select a random sample from each population or subgroup independently. We gather data for one categorical variable.
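
The arithmetic for the test of homogeneity is the same as for the test of independence; what changes is how the data are collected. The following Python sketch assumes two hypothetical populations, each sampled independently with a sample size fixed in advance, and one categorical variable with three categories. All counts and names here are invented for illustration. The P-value shortcut in the last step is valid only when there are 2 degrees of freedom, as in this 2 × 3 table.

```python
import math


def chi_square_statistic(observed):
    """Return (statistic, df, expected counts) for a two-way table of counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    n = sum(row_totals)
    # Expected count if every population has the same distribution:
    # (row total * column total) / overall total
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, df, expected


# HYPOTHETICAL counts (not real data). Each row is a SEPARATE random sample
# of 100 individuals, one sample per population.
# Columns: three categories of a single categorical variable.
samples = {
    "Population 1": [40, 35, 25],
    "Population 2": [30, 35, 35],
}

# Observed distribution of the categorical variable within each sample
distributions = {
    name: [count / sum(counts) for count in counts]
    for name, counts in samples.items()
}

statistic, df, expected = chi_square_statistic(list(samples.values()))

# Shortcut that holds ONLY when df = 2:
# the chi-square tail area equals exp(-statistic / 2).
p_value = math.exp(-statistic / 2)

for name, dist in distributions.items():
    print(name, dist)
print("chi-square =", round(statistic, 3), " df =", df,
      " P-value =", round(p_value, 3))
```

For these made-up counts, the chi-square statistic is about 3.10 with 2 degrees of freedom, and the P-value is about 0.21. At a 5% significance level, we would fail to reject the null hypothesis, so these hypothetical samples do not give convincing evidence that the distributions differ.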