- Complete a chi-square test of independence
- Write the conclusion of a chi-square test of independence in the context of the problem
Conditional and Marginal Distributions
The Pew Research Center is a non-partisan, social science research think tank. One of the surveys they conduct periodically is called the Core Trends Survey, in which they poll a representative sample of American adults on a multitude of variables. The contingency table detailing the observed counts for the variables Number of books read in the last year and Type of residence is given below.[1]
Observed counts (rows: Number of books read; columns: Type of residence)

| Number of books read | Urban | Suburban | Rural | Total |
|---|---|---|---|---|
| None | 133 | 144 | 81 | 358 |
| 1–4 | 146 | 149 | 53 | 348 |
| 5–9 | 76 | 74 | 33 | 183 |
| 10+ | 194 | 216 | 76 | 486 |
| Total | 549 | 583 | 243 | 1,375 |
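The Total row and Total column are simply sums of the interior counts. As a minimal Python sketch (the names `books`, `observed`, `row_totals`, and `col_totals` are our own choices, not part of the survey data), the table can be stored as a nested list with one inner list per row, and the margins recomputed from it:

```python
# Interior observed counts from the table above.
# Rows: Number of books read; columns: Urban, Suburban, Rural.
books = ["None", "1–4", "5–9", "10+"]
observed = [
    [133, 144, 81],   # None
    [146, 149, 53],   # 1–4
    [76, 74, 33],     # 5–9
    [194, 216, 76],   # 10+
]

# Total column: sum across each row.
row_totals = [sum(row) for row in observed]
# Total row: sum down each column (zip(*observed) yields the columns).
col_totals = [sum(col) for col in zip(*observed)]
# Grand total: every respondent counted once.
grand_total = sum(row_totals)

print(dict(zip(books, row_totals)))
print(col_totals)
print(grand_total)
```

The recomputed totals (358, 348, 183, 486 for the rows; 549, 583, 243 for the columns; 1,375 overall) agree with the Total row and column of the table.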
conditional distribution
The conditional distribution of one variable with respect to a value of a second variable gives the counts or the relative frequencies of the first variable among only those individuals who have that value of the second variable. In terms of the table, this means we restrict ourselves to a single row or a single column of the interior part of the table.
| Number of books read | Urban |
|---|---|
| None | 133 |
| 1–4 | 146 |
| 5–9 | 76 |
| 10+ | 194 |
| Total | 549 |
Often, when we discuss the conditional distribution, we’re more interested in the relative frequencies, or the proportion corresponding to each value of the variable of interest.
For example, among all the people living in an urban setting, the relative frequency of individuals who read no books in the last year is:
[latex]\dfrac{133}{549}\approx 0.2423=24.23\%[/latex]
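The same calculation extends to every category in the Urban column: each count is divided by the Urban total. A minimal Python sketch, assuming the Urban counts are stored in a dictionary we call `urban` (our own naming):

```python
# Urban column of the contingency table: counts of Number of books read.
urban = {"None": 133, "1–4": 146, "5–9": 76, "10+": 194}
urban_total = sum(urban.values())  # 549 urban respondents

# Conditional relative frequency: each count divided by the Urban total.
urban_conditional = {k: v / urban_total for k, v in urban.items()}

for category, p in urban_conditional.items():
    # Show each proportion to four decimals and as a percentage.
    print(f"{category:>4}: {p:.4f} = {p:.2%}")
```

The "None" line reproduces the value computed above, 0.2423 = 24.23%, and the four proportions together make up the whole Urban group.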
marginal distribution
The marginal distribution of a variable gives the distribution of one of the variables with no regard to the other variable whatsoever. In the table, this will be either the total row or the total column. One way to remember this is that the “margins” are on the outsides of a piece of paper (sides, top, and bottom), and the total row and column are the outside row and column of the table (on the side and bottom).
| Number of books read | Total |
|---|---|
| None | 358 |
| 1–4 | 348 |
| 5–9 | 183 |
| 10+ | 486 |
| Total | 1,375 |
As before, we are often interested in the relative frequencies of the marginal distribution. For example, the relative frequency of individuals who read no books last year is:
[latex]\dfrac{358}{1375}\approx 0.2604=26.04\%[/latex]
Note: Sometimes the percentages will not sum exactly to [latex]100\%[/latex]. This happens because each percentage is rounded separately, and the small rounding errors can add up.
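Both ideas can be checked with a short Python sketch. It computes the full marginal distribution of Number of books read from the Total column, rounds each percentage to two decimals, and then adds the rounded values (the dictionary names are our own):

```python
# Total column of the table: marginal counts of Number of books read.
marginal_counts = {"None": 358, "1–4": 348, "5–9": 183, "10+": 486}
n = sum(marginal_counts.values())  # 1375 respondents

# Marginal relative frequencies as percentages, each rounded to two decimals.
marginal_pct = {k: round(100 * v / n, 2) for k, v in marginal_counts.items()}
print(marginal_pct)

# Rounding each entry separately leaves the total slightly off 100%.
print(round(sum(marginal_pct.values()), 2))  # 100.01
```

With these data the rounded percentages are 26.04%, 25.31%, 13.31%, and 35.35%, which add up to 100.01% rather than exactly 100%.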
In the chi-square test of independence, we will be considering whether two variables are independent or not.
For example, if our two variables are independent, then knowing that someone lives in an urban area should not affect the probability that they fall into any one category of Number of books read in the last year.
Consider the following contingency table again. If knowing someone's Type of residence does not affect the likelihood of each category of Number of books read in the last year, then each column in our contingency table should have approximately the same distribution of Number of books read in the last year. In other words, the conditional distribution of Number of books read in the last year for each value of Type of residence should approximately match the marginal distribution of Number of books read in the last year.
For example, the relative frequencies for the conditional distribution of Number of books read in the last year for urban dwellers should match the marginal distribution you found in Question 2. The relative frequencies of Number of books read in the last year for rural dwellers should also match that marginal distribution.
Observed counts (rows: Number of books read; columns: Type of residence)

| Number of books read | Urban | Suburban | Rural | Total |
|---|---|---|---|---|
| None | 133 | 144 | 81 | 358 |
| 1–4 | 146 | 149 | 53 | 348 |
| 5–9 | 76 | 74 | 33 | 183 |
| 10+ | 194 | 216 | 76 | 486 |
| Total | 549 | 583 | 243 | 1,375 |
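To make this comparison concrete, the following Python sketch computes the conditional distribution of Number of books read within each Type of residence and prints it next to the marginal distribution. It is only a side-by-side view of the table, not the chi-square test itself; the variable names are our own:

```python
books = ["None", "1–4", "5–9", "10+"]
residence = ["Urban", "Suburban", "Rural"]
# Interior observed counts (rows: books read; columns: residence type).
observed = [
    [133, 144, 81],
    [146, 149, 53],
    [76, 74, 33],
    [194, 216, 76],
]

grand_total = sum(map(sum, observed))
col_totals = [sum(col) for col in zip(*observed)]

# Marginal distribution of books read: row total / grand total.
marginal = [sum(row) / grand_total for row in observed]

# Conditional distribution of books read within each residence type:
# each cell divided by its column total.
conditional = {
    res: [observed[i][j] / col_totals[j] for i in range(len(books))]
    for j, res in enumerate(residence)
}

# Print one row per category: conditional percentages, then the marginal.
print(f"{'Books':>6}" + "".join(f"{r:>10}" for r in residence) + f"{'Marginal':>10}")
for i, b in enumerate(books):
    cells = "".join(f"{conditional[r][i]:>10.2%}" for r in residence)
    print(f"{b:>6}{cells}{marginal[i]:>10.2%}")
```

If the two variables were independent, each residence column would sit close to the Marginal column. For example, about 33.33% of rural respondents read no books, compared with 26.04% overall. Whether gaps like this are larger than chance alone would produce is the question the chi-square test of independence answers.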
Let’s look again at the marginal distribution for the number of books read, but this time, we’ll include more decimal places so we can avoid rounding errors in our next calculation.
| Number of books read | Relative frequency (proportion) |
|---|---|
| None | 0.26036364 |
| 1–4 | 0.25309091 |
| 5–9 | 0.13309091 |
| 10+ | 0.35345455 |
| Total | 1 |
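One way to sidestep rounding entirely is to keep each proportion as an exact fraction and round only when displaying it. This Python sketch uses the standard library's `fractions.Fraction` for that purpose (the dictionary names are our own) and prints each proportion to eight decimal places, as in the table above:

```python
from fractions import Fraction

# Total column of the table: marginal counts of Number of books read.
marginal_counts = {"None": 358, "1–4": 348, "5–9": 183, "10+": 486}
n = sum(marginal_counts.values())  # 1375

# Exact proportions: no rounding error accumulates in later arithmetic.
exact = {k: Fraction(v, n) for k, v in marginal_counts.items()}

for k, p in exact.items():
    # Round only for display, to 8 decimal places.
    print(f"{k:>4}: {float(p):.8f}")

# Exact proportions always sum to exactly 1.
print(sum(exact.values()))
```

Because the fractions are exact, their sum is exactly 1, whereas rounded decimals can drift slightly, as in the 100.01% total of the two-decimal percentages.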
- [1] Pew Research Center. (2019). Core trends survey - Mobile technology and home broadband 2019. https://www.pewresearch.org/internet/dataset/core-trends-survey/