Chi-Square Test of Homogeneity – Learn It 1

  • Complete a chi-square test of homogeneity

[latex]\chi^2[/latex] test of homogeneity

A chi-square test of homogeneity determines if two or more populations (or subgroups of a population) have the same distribution of a single categorical variable.

We use the test of homogeneity if the response variable has two or more categories and we wish to compare two or more populations (or subgroups).

We can answer the following research question with a chi-square test of homogeneity:

  • Do different top commercial airlines have the same distribution of flight status (whether the flight is on-time, delayed, canceled, or diverted)?

We could compare as many airlines as we like, but let’s look at the top three airlines (by number of passengers)[1] and compare their flight status distributions. We can look at this information in a contingency table (i.e., a two-way table), where each row represents the flight status distribution of an airline. The following table gives the data for the flights of each airline in March 2021.[2] Notice that in this table, we are displaying the counts for the categories of one categorical variable (flight status) for three different populations (each population is all the flights for a single airline).

| Airline | On-Time Flights | Delayed Flights | Canceled Flights | Diverted Flights | Total |
|---|---|---|---|---|---|
| American Airlines | 42,600 | 4,657 | 296 | 95 | 47,648 |
| Delta Airlines | 51,620 | 4,030 | 150 | 56 | 55,856 |
| Southwest Airlines | 69,384 | 9,280 | 1,782 | 128 | 80,574 |
| Total | 163,604 | 17,967 | 2,228 | 279 | 184,078 |

Notice that the different airlines have different numbers of flights, so it can be useful to look at the relative frequency distribution for each airline as well (i.e., the proportions of flights that have each status for each airline).

Relative Frequencies

Since there were [latex]42,600[/latex] American Airlines flights that were on time and [latex]47,648[/latex] American Airlines flights total, the relative frequency (or proportion) of American Airlines flights that were on time is:

[latex]\dfrac{42,600}{47,648}\approx 0.894=89.4\%[/latex]

In finding a similar proportion for each flight status, we find that the relative frequency distribution for flight status for all three airlines is as displayed in the following table.

| Airline | Percentage On-Time Flights | Percentage Delayed Flights | Percentage Canceled Flights | Percentage Diverted Flights | Total |
|---|---|---|---|---|---|
| American Airlines | [latex]89.4\%[/latex] | [latex]9.8\%[/latex] | [latex]0.6\%[/latex] | [latex]0.2\%[/latex] | [latex]100\%[/latex] |
| Delta Airlines | [latex]92.4\%[/latex] | [latex]7.2\%[/latex] | [latex]0.3\%[/latex] | [latex]0.1\%[/latex] | [latex]100\%[/latex] |
| Southwest Airlines | [latex]86.1\%[/latex] | [latex]11.5\%[/latex] | [latex]2.2\%[/latex] | [latex]0.2\%[/latex] | [latex]100\%[/latex] |
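
The row-by-row division described above can be reproduced with a few lines of Python. This is a minimal sketch, not part of the original lesson: the names `counts`, `statuses`, and `relative_frequencies` are illustrative choices, and the numbers are the March 2021 counts from the table of flight counts.

```python
# Counts from the flight status table, in the order:
# on-time, delayed, canceled, diverted.
counts = {
    "American Airlines": [42600, 4657, 296, 95],
    "Delta Airlines": [51620, 4030, 150, 56],
    "Southwest Airlines": [69384, 9280, 1782, 128],
}
statuses = ["On-Time", "Delayed", "Canceled", "Diverted"]


def relative_frequencies(row):
    """Divide each count in a row by that row's total."""
    total = sum(row)
    return [count / total for count in row]


# One relative frequency distribution per airline.
rel_freq = {airline: relative_frequencies(row) for airline, row in counts.items()}

for airline, proportions in rel_freq.items():
    formatted = ", ".join(
        f"{status}: {p:.1%}" for status, p in zip(statuses, proportions)
    )
    print(f"{airline}: {formatted}")
```

Because each row is divided by its own total, every airline's proportions sum to 1, which is what makes the three distributions comparable despite the different numbers of flights.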

Using only the relative frequencies, do these distributions look significantly different? We’ll use a chi-square test of homogeneity to find out.

In comparing the flight status distributions for these airlines, we’ll build on two ideas we’ve seen before. First, we’ve already seen a test for determining whether two population proportions are equal: the two-proportion [latex]z[/latex]-test. For example, we could think of the March flights as a sample of flights for each airline and consider whether the proportion of on-time flights for all American Airlines flights is the same as the proportion of on-time flights for all Delta Airlines flights.
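
As a reminder of what that earlier comparison involves, here is a rough sketch of the test statistic for the American versus Delta on-time proportions. It assumes the usual pooled form of the two-proportion [latex]z[/latex] statistic; the function name `two_proportion_z` is a hypothetical helper, not something defined in this lesson.

```python
import math

# March 2021 counts from the table: on-time flights and total flights.
american_on_time, american_total = 42600, 47648
delta_on_time, delta_total = 51620, 55856


def two_proportion_z(x1, n1, x2, n2):
    """Pooled two-proportion z statistic for the null hypothesis p1 = p2."""
    p1, p2 = x1 / n1, x2 / n2
    # Under the null hypothesis, both samples estimate one common proportion.
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error


z = two_proportion_z(american_on_time, american_total, delta_on_time, delta_total)
print(f"z = {z:.2f}")
```

This compares only one category (on-time) for only two airlines, which is exactly the limitation the chi-square test of homogeneity removes.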

However, in this case, we’re generalizing that idea by considering more than two populations and looking at the entire distribution of flight status across all values of the categorical variable. Second, we’ll build on the previous activity by using a chi-square test, but instead of comparing a distribution of counts to a theoretical model, we’re comparing distributions of a categorical variable (in this case, flight status) among different populations (in this case, three populations: all flights for each of three airlines).

The word “homogeneous” means the same or similar, so the chi-square test of homogeneity asks whether two or more distributions of a categorical variable are the same. In short, a chi-square test of homogeneity compares distributions of one categorical variable for multiple populations.
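
To make the comparison concrete, the following sketch computes a chi-square statistic for the airline table. It is an illustration under stated assumptions rather than the lesson's own procedure: it assumes the standard expected count for each cell (row total times column total, divided by the grand total) and the standard degrees of freedom, (rows - 1) times (columns - 1). The variable names are illustrative.

```python
# Observed counts: rows are American, Delta, Southwest;
# columns are on-time, delayed, canceled, diverted.
observed = [
    [42600, 4657, 296, 95],
    [51620, 4030, 150, 56],
    [69384, 9280, 1782, 128],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected counts if all airlines shared one flight status distribution.
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

# Sum of (observed - expected)^2 / expected over every cell.
chi_square = sum(
    (obs - exp) ** 2 / exp
    for obs_row, exp_row in zip(observed, expected)
    for obs, exp in zip(obs_row, exp_row)
)
degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)

print(f"chi-square = {chi_square:.1f}, df = {degrees_of_freedom}")
```

A large statistic relative to the chi-square distribution with these degrees of freedom would suggest that the airlines do not share the same flight status distribution.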

  1. List of largest airlines in North America. (2007, June 22). In Wikipedia. https://en.wikipedia.org/wiki/List_of_largest_airlines_in_North_America
  2. U.S. Department of Transportation, Bureau of Transportation Statistics. (n.d.). On-time performance - Reporting operating carrier flight delays at a glance. https://www.transtats.bts.gov/HomeDrillChart_Month.asp?5ry_lrn4=FDFD&N44_Qry=E&5ry_Pn44vr4=DDD&5ry_Nv42146=DDD&heY_fryrp6lrn4=FDFE&heY_fryrp6Z106u=F