- Complete a chi-square test of homogeneity
Expected Counts
As with the goodness of fit test, we need to find the expected count for each group. In this case, our null hypothesis is that the distribution of flight status is the same for each airline. We estimate the shared proportions by looking at the overall proportion for each flight status among all three airlines combined.
| | On-Time Flights | Delayed Flights | Canceled Flights | Diverted Flights | Total |
| Total | 163,604 | 17,967 | 2,228 | 279 | 184,078 |
There were [latex]163,604[/latex] on-time flights out of the [latex]184,078[/latex] total flights for the three airlines we’re considering. So, the estimated on-time proportion for all three airlines is:
[latex]\dfrac{163,604}{184,078}=0.88877541 \approx 88.9\%[/latex]
If the distributions are the same, or homogeneous (i.e., if the null hypothesis is true), we would expect each airline to have about [latex]88.9\%[/latex] of its flights be on time. American Airlines, for example, would have an expected count of on-time flights that is about [latex]88.9\%[/latex] of the total [latex]47,648[/latex] flights that American Airlines flew in March 2021.
In this case, we will use more decimal places in our calculation to avoid rounding errors:
[latex]0.88877541 \times 47,648 \approx 42,348.4[/latex]
Note that since the expected counts are theoretical values, they do not need to be whole numbers.
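The calculation above can be reproduced with a short Python sketch. The variable names are illustrative, and the figures are the totals given in the text.

```python
# Totals taken from the text (March 2021, three airlines combined).
on_time_total = 163_604      # on-time flights across all three airlines
all_flights_total = 184_078  # all flights across all three airlines
american_total = 47_648      # all flights flown by American Airlines

# Pooled on-time proportion, used as the estimate under the null hypothesis.
on_time_proportion = on_time_total / all_flights_total

# Expected on-time count for American Airlines, using the unrounded proportion.
american_expected_on_time = on_time_proportion * american_total

print(round(on_time_proportion, 8))         # 0.88877541
print(round(american_expected_on_time, 1))  # 42348.4
```

Keeping the proportion unrounded until the final step is what avoids the rounding error mentioned above.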
expected counts
The expected count for any cell in a two-way frequency table is:
[latex]\frac{\text{row total}\times \text{column total}}{\text{grand total}}[/latex]
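As a minimal sketch of this formula, the snippet below computes the expected counts for every flight status in the American Airlines row. The function name `expected_count` and the dictionary layout are assumptions made for illustration; the totals come from the table and text above.

```python
def expected_count(row_total, column_total, overall_total):
    """Expected cell count: (row total x column total) / overall total."""
    return row_total * column_total / overall_total

# Column totals (all three airlines combined), from the table above.
column_totals = {
    "On-Time": 163_604,
    "Delayed": 17_967,
    "Canceled": 2_228,
    "Diverted": 279,
}
overall_total = 184_078   # all flights, all three airlines
american_total = 47_648   # row total for American Airlines

# Expected count for each flight status in the American Airlines row.
american_expected = {
    status: expected_count(american_total, column_total, overall_total)
    for status, column_total in column_totals.items()
}

for status, count in american_expected.items():
    print(f"{status}: {count:.1f}")
```

The expected counts in a row always add up to that row's total, which is a quick way to check the arithmetic.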
Notice that once we had the expected on-time flight counts for American Airlines and Delta Airlines, we could have just subtracted those from the total number of on-time flights to find the expected count for Southwest Airlines. (Try this out to check your answer!) The same goes for each column: once we have two of the expected counts, we can find the third by subtracting. Similarly, in each row, once we have three of the expected counts, we can find the fourth by subtracting. This gives us [latex]2 \times 3=6[/latex] degrees of freedom for our chi-square test of homogeneity. (In other words, once we have filled in six cells in the table of expected counts, we can fill in the others by subtracting.)
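The subtraction idea can be sketched in Python. Because only some of the airline totals appear in this section, the row and column totals below are small made-up numbers chosen purely for illustration; they are not the flight data. The function names are likewise hypothetical.

```python
def expected_table(row_totals, col_totals):
    """Expected counts for every cell, computed directly from the totals."""
    overall_total = sum(row_totals)
    return [[r * c / overall_total for c in col_totals] for r in row_totals]

def fill_by_subtraction(free_cells, row_totals, col_totals):
    """Rebuild the full table from the top-left (R-1) x (C-1) block of cells."""
    n_rows, n_cols = len(row_totals), len(col_totals)
    table = [[None] * n_cols for _ in range(n_rows)]
    for i in range(n_rows - 1):
        for j in range(n_cols - 1):
            table[i][j] = free_cells[i][j]
        # Last cell in the row: row total minus the cells already known.
        table[i][n_cols - 1] = row_totals[i] - sum(free_cells[i])
    for j in range(n_cols):
        # Last row: column total minus the cells above it.
        table[n_rows - 1][j] = col_totals[j] - sum(
            table[i][j] for i in range(n_rows - 1)
        )
    return table

# Hypothetical totals for a table with 3 rows and 4 columns (NOT the airline data).
row_totals = [10, 20, 30]
col_totals = [24, 18, 12, 6]

full = expected_table(row_totals, col_totals)
free_cells = [row[:3] for row in full[:2]]  # the 2 x 3 = 6 freely chosen cells
rebuilt = fill_by_subtraction(free_cells, row_totals, col_totals)
print(rebuilt)
```

With six cells known, the remaining six are forced by the row and column totals, which is why the table has six degrees of freedom.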
degrees of freedom ([latex]df[/latex])
In general, if the two-way table for a homogeneity test has [latex]C[/latex] columns and [latex]R[/latex] rows, then there are [latex](R-1)(C-1)[/latex] degrees of freedom.
Notice that in our example, there are three rows (representing airlines) and four columns (representing flight status), which gives [latex](3-1)(4-1)=2 \times 3=6[/latex] degrees of freedom, as we saw previously.
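For reference, the degrees-of-freedom rule is a one-line computation. The function name below is an illustrative assumption, not part of any library.

```python
def degrees_of_freedom(n_rows, n_cols):
    """Degrees of freedom for a homogeneity test on an R x C two-way table."""
    return (n_rows - 1) * (n_cols - 1)

# Three airlines (rows) and four flight statuses (columns).
print(degrees_of_freedom(3, 4))  # 6
```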