- Complete a chi-square test of independence
- Write the conclusion of a chi-square test of independence in context of the problem
The difference between the chi-square test of homogeneity and the chi-square test of independence is subtle. They differ primarily in study design.
- In the test of independence, we select individuals at random from a population and record data for two categorical variables. The null hypothesis says that the variables are independent.
- In the test of homogeneity, we select random samples from each subgroup or population separately and collect data on a single categorical variable. The null hypothesis says that the distribution of the categorical variable is the same for each subgroup or population.
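The two null hypotheses can also be written symbolically. In the notation below (our own labels, not the source's), $A$ and $B$ are the two categorical variables in the independence test. $p_{ik}$ is the proportion of population (or subgroup) $i$ that falls in category $k$ of the single variable in the homogeneity test, and $m$ is the number of populations.

$$
\text{Independence:}\quad H_0:\; P(A = a,\; B = b) = P(A = a)\,P(B = b) \quad \text{for every category } a \text{ of } A \text{ and } b \text{ of } B
$$

$$
\text{Homogeneity:}\quad H_0:\; p_{1k} = p_{2k} = \cdots = p_{mk} \quad \text{for every category } k
$$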
Let’s take a closer look at the test of independence through an example.
The Pew Research Center[1] is a non-partisan fact tank that conducts polls and social science research. One survey that it conducts periodically is the Core Trends Survey, which measures a wide variety of variables for a representative sample of American adults, including demographic information and information on Internet and social media use.
Two of the variables included in the survey are Education level and Income level.
The observed counts from the 2019 Core Trends Survey for these two variables are displayed in the following two-way table[2]. We’ve seen two-way tables (also called contingency tables) before in a couple of contexts. Previously, we saw contingency tables that displayed values for one categorical variable for samples from multiple populations. In this situation, the two-way table classifies counts for a sample of individuals from one population on two categorical variables.
| Education level | Income <$30,000 | Income $30,000–$74,999 | Income $75,000 and up | Total |
|---|---|---|---|---|
| Post-Grad Degree | 2 | 8 | 46 | 56 |
| College Degree | 39 | 113 | 202 | 354 |
| Some College | 131 | 138 | 120 | 389 |
| HS Grad | 175 | 129 | 65 | 369 |
| No HS Degree | 78 | 32 | 8 | 118 |
| Total | 425 | 420 | 441 | 1,286 |
Since we have two categorical variables measured for the same sample of individuals, the natural question to ask is, “Are these two variables independent?” In other words, “Is income level independent of education level?” We address this question using the chi-square test of independence.
If the two variables, Income level and Education level, are independent, then knowing a person’s education level should not change the probability that they have a particular income level. In that case, the distribution of Income level should be the same for every education level. Similarly, the distribution of Education level should be the same for every income level.
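One way to see what independence would imply is to compare the distribution of Income level within each education level (the row proportions) against the overall income distribution. The sketch below is a minimal illustration using only the counts from the table above; the names `OBSERVED`, `EDUCATION`, `INCOME`, and `conditional_income_distributions` are hypothetical labels chosen for this example.

```python
# Observed counts from the 2019 Core Trends Survey table.
# Rows: education level; columns: income level.
EDUCATION = ["Post-Grad Degree", "College Degree", "Some College", "HS Grad", "No HS Degree"]
INCOME = ["<$30,000", "$30,000-$74,999", "$75,000 and up"]
OBSERVED = [
    [2, 8, 46],
    [39, 113, 202],
    [131, 138, 120],
    [175, 129, 65],
    [78, 32, 8],
]


def conditional_income_distributions(table):
    """For each row (education level), return the proportion of that row in each income column."""
    dists = []
    for row in table:
        row_total = sum(row)
        dists.append([count / row_total for count in row])
    return dists


# Overall (marginal) income distribution, for comparison with each row.
column_totals = [sum(col) for col in zip(*OBSERVED)]
grand_total = sum(column_totals)
overall = [c / grand_total for c in column_totals]

print(f"{'':<17}", " ".join(f"{label:>16}" for label in INCOME))
for name, dist in zip(EDUCATION, conditional_income_distributions(OBSERVED)):
    print(f"{name:<17}", " ".join(f"{p:>16.1%}" for p in dist))
print(f"{'All adults':<17}", " ".join(f"{p:>16.1%}" for p in overall))
```

If the variables were independent, each row would be expected to look roughly like the "All adults" row. In this sample the rows differ noticeably: for instance, 46 of the 56 post-grad respondents (about 82%) fall in the top income bracket, compared with 8 of the 118 respondents without a high school degree (about 7%). The chi-square test assesses whether differences like these are larger than sampling variability alone would plausibly produce.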
This should feel reminiscent of the chi-square test of homogeneity, but it differs in a couple of important ways. The homogeneity test considered one categorical variable measured for samples from different populations and asked whether the distribution of that one variable was the same among the populations. In this case, we have one sample from one population of individuals for which two categorical variables are measured, and we’re asking whether those two variables are independent.
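Although the study designs and hypotheses differ, the arithmetic of the chi-square statistic is the same for both tests: each expected count is (row total × column total) / grand total, the statistic sums (observed − expected)² / expected over every cell, and the degrees of freedom are (rows − 1)(columns − 1). The following is a minimal, standard-library-only sketch applied to the Core Trends table; the function name `chi_square_independence` is a hypothetical label for this example.

```python
# Observed counts (rows: education level, columns: income level) from the table.
OBSERVED = [
    [2, 8, 46],      # Post-Grad Degree
    [39, 113, 202],  # College Degree
    [131, 138, 120], # Some College
    [175, 129, 65],  # HS Grad
    [78, 32, 8],     # No HS Degree
]


def chi_square_independence(table):
    """Return (chi-square statistic, degrees of freedom, expected counts) for an r x c table.

    Expected count for each cell = (row total * column total) / grand total.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df, expected


stat, df, expected = chi_square_independence(OBSERVED)
print(f"chi-square = {stat:.1f}, df = {df}")
print("smallest expected count:", round(min(min(row) for row in expected), 2))
```

For this 5 × 3 table the degrees of freedom are (5 − 1)(3 − 1) = 8, and the smallest expected count (about 18.3, for post-grad respondents earning $30,000–$74,999) is well above the common rule of thumb of at least 5 per cell. If SciPy is installed, `scipy.stats.chi2_contingency(OBSERVED)` should report the same statistic along with a p-value; its continuity correction only applies when the table has one degree of freedom, so it does not affect this table.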