{"id":1412,"date":"2023-06-22T02:22:54","date_gmt":"2023-06-22T02:22:54","guid":{"rendered":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/chi-square-test-of-homogeneity-learn-it-1\/"},"modified":"2024-05-07T23:12:37","modified_gmt":"2024-05-07T23:12:37","slug":"chi-square-test-of-homogeneity-learn-it-1","status":"publish","type":"chapter","link":"https:\/\/content.one.lumenlearning.com\/introstatstest\/chapter\/chi-square-test-of-homogeneity-learn-it-1\/","title":{"raw":"Chi-Square Test of Homogeneity \u2013 Learn It 1","rendered":"Chi-Square Test of Homogeneity \u2013 Learn It 1"},"content":{"raw":"<section class=\"textbox learningGoals\">\r\n<ul>\r\n\t<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Complete a chi-square test of homogeneity&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:12801,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;arial&quot;,&quot;16&quot;:9}\">Complete a chi-square test of homogeneity<\/span><\/li>\r\n<\/ul>\r\n<\/section>\r\n<section class=\"textbox keyTakeaway\">\r\n<h3>[latex]\\chi^2[\/latex] test of homogeneity<\/h3>\r\n<p>A <strong>chi-square test of homogeneity<\/strong> determines whether two or more populations (or subgroups of a population) have the same distribution of a single categorical variable.<\/p>\r\n<p>We use the test of homogeneity if the response variable has two or more categories and we wish to compare two or more populations (or subgroups).<\/p>\r\n<\/section>\r\n<p>We can answer the following research question with a chi-square test of homogeneity:<\/p>\r\n<ul>\r\n\t<li>Do different top commercial airlines have the same distribution of flight status (whether the flight is on-time, delayed, canceled, or diverted)?<\/li>\r\n<\/ul>\r\n<p>We could compare as many airlines as we like, but let\u2019s look at the top three airlines (by number of passengers)[footnote]List of largest airlines in North America. (2007, June 22). In Wikipedia. 
https:\/\/en.wikipedia.org\/wiki\/List_of_largest_airlines_in_North_America[\/footnote] and compare their flight status distributions. We can look at this information in a <strong>contingency table <\/strong>(i.e., a <strong>two-way table<\/strong>), where each row represents the flight status distribution of an airline. The following table gives the data for the flights of each airline in March 2021.[footnote]U.S. Department of Transportation, Bureau of Transportation Statistics. (n.d.). On-time performance - Reporting operating carrier flight delays at a glance. https:\/\/www.transtats.bts.gov\/HomeDrillChart_Month.asp?5ry_lrn4=FDFD&amp;N44_Qry=E&amp;5ry_Pn44vr4=DDD&amp;5ry_Nv42146=DDD&amp;heY_fryrp6lrn4=FDFE&amp;heY_fryrp6Z106u=F[\/footnote] Notice that in this table, we are displaying the counts for the categories of one categorical variable (<em>flight status<\/em>) for three different populations (each population is all the flights for a single airline).<\/p>\r\n<table>\r\n<tbody>\r\n<tr>\r\n<td>&nbsp;<\/td>\r\n<td><strong>On-Time Flights<\/strong><\/td>\r\n<td><strong>Delayed Flights<\/strong><\/td>\r\n<td><strong>Canceled Flights<\/strong><\/td>\r\n<td><strong>Diverted Flights<\/strong><\/td>\r\n<td><strong>Total<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>American Airlines<\/strong><\/td>\r\n<td>42,600<\/td>\r\n<td>4,657<\/td>\r\n<td>296<\/td>\r\n<td>95<\/td>\r\n<td>47,648<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Delta Airlines<\/strong><\/td>\r\n<td>51,620<\/td>\r\n<td>4,030<\/td>\r\n<td>150<\/td>\r\n<td>56<\/td>\r\n<td>55,856<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Southwest Airlines<\/strong><\/td>\r\n<td>69,384<\/td>\r\n<td>9,280<\/td>\r\n<td>1,782<\/td>\r\n<td>128<\/td>\r\n<td>80,574<\/td>\r\n<\/tr>\r\n<tr>\r\n<td><strong>Total <\/strong><\/td>\r\n<td>163,604<\/td>\r\n<td>17,967<\/td>\r\n<td>2,228<\/td>\r\n<td>279<\/td>\r\n<td>184,078<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p>Notice that the different airlines have different numbers of 
flights, so it can be useful to look at the relative frequency distribution for each airline as well (i.e., the proportions of flights that have each status for each airline).<\/p>\r\n<section class=\"textbox example\">\r\n<p class=\"para\"><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\"><b>Relative Frequencies<\/b><\/span><\/p>\r\n<p class=\"para\"><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">Since there were [latex]42,600[\/latex] American Airlines flights that were on time and [latex]47,648[\/latex] American Airlines flights total, the relative frequency (or proportion) of American Airlines flights that were on time is:<\/span><\/p>\r\n<p style=\"text-align: center;\">[latex]\\dfrac{42,600}{47,648}=0.894=89.4\\%[\/latex]<\/p>\r\n<p>In finding a similar proportion for each flight status, we find that the relative frequency distribution for flight status for all three airlines is as displayed in the following table.<\/p>\r\n<div align=\"center\">\r\n<table style=\"width: 99.294%;\">\r\n<tbody>\r\n<tr>\r\n<td style=\"width: 9.17587%;\">\u00a0<\/td>\r\n<td style=\"width: 21.0705%;\"><strong>Percentage On-Time Flights<\/strong><\/td>\r\n<td style=\"width: 19.966%;\"><strong>Percentage Delayed Flights<\/strong><\/td>\r\n<td style=\"width: 19.5412%;\"><strong>Percentage Canceled Flights<\/strong><\/td>\r\n<td style=\"width: 19.2014%;\"><strong>Percentage Diverted Flights<\/strong><\/td>\r\n<td style=\"width: 9.7706%;\"><strong>Total<\/strong><\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 9.17587%;\"><strong>American Airlines<\/strong><\/td>\r\n<td style=\"width: 21.0705%;\">[latex]89.4\\%[\/latex]<\/td>\r\n<td style=\"width: 19.966%;\">[latex]9.8\\%[\/latex]<\/td>\r\n<td style=\"width: 19.5412%;\">[latex]0.6\\%[\/latex]<\/td>\r\n<td 
style=\"width: 19.2014%;\">[latex]0.2\\%[\/latex]<\/td>\r\n<td style=\"width: 9.7706%;\">\r\n<p>[latex]100\\%[\/latex]<\/p>\r\n<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 9.17587%;\"><strong>Delta Airlines<\/strong><\/td>\r\n<td style=\"width: 21.0705%;\">[latex]92.4\\%[\/latex]<\/td>\r\n<td style=\"width: 19.966%;\">[latex]7.2\\%[\/latex]<\/td>\r\n<td style=\"width: 19.5412%;\">[latex]0.3\\%[\/latex]<\/td>\r\n<td style=\"width: 19.2014%;\">[latex]0.1\\%[\/latex]<\/td>\r\n<td style=\"width: 9.7706%;\">[latex]100\\%[\/latex]<\/td>\r\n<\/tr>\r\n<tr>\r\n<td style=\"width: 9.17587%;\"><strong>Southwest Airlines<\/strong><\/td>\r\n<td style=\"width: 21.0705%;\">[latex]86.1\\%[\/latex]<\/td>\r\n<td style=\"width: 19.966%;\">[latex]11.5\\%[\/latex]<\/td>\r\n<td style=\"width: 19.5412%;\">[latex]2.2\\%[\/latex]<\/td>\r\n<td style=\"width: 19.2014%;\">[latex]0.2\\%[\/latex]<\/td>\r\n<td style=\"width: 9.7706%;\">\r\n<p>[latex]100\\%[\/latex]<\/p>\r\n<\/td>\r\n<\/tr>\r\n<\/tbody>\r\n<\/table>\r\n<p style=\"text-align: left;\">Using only the relative frequencies, do these distributions look significantly different? We'll use a chi-square test of homogeneity to find out.<\/p>\r\n<\/div>\r\n<\/section>\r\n<p>In comparing the flight status distributions for these airlines, we\u2019ll build on two ideas we\u2019ve seen before. We\u2019ve already seen a test for determining whether two population proportions are equal: the two-proportion [latex]z[\/latex]-test. For example, we could think of the March flights as a sample of flights for each airline and consider whether the proportion of on-time flights for all American Airlines flights is the same as the proportion of on-time flights for all Delta Airlines flights.<\/p>\r\n<p>However, in this case, we\u2019re generalizing that idea by considering more than two populations and looking at the entire distribution of flight status for all values of the categorical variable. 
Secondly, we\u2019ll be building on the previous activity by using a chi-square test, but instead of comparing a distribution of counts to a theoretical model, we\u2019re comparing distributions of a categorical variable (in this case, <em>flight status<\/em>) among different populations (in this case, there are three populations: all flights for three different airlines).<\/p>\r\n<section class=\"textbox proTip\">The word \u201chomogeneous\u201d means the same or similar, so the <strong>chi-square test of homogeneity<\/strong> is asking whether or not two or more distributions of a categorical variable are the same. In short, a chi-square test of homogeneity compares distributions of one categorical variable for multiple populations.<\/section>\r\n<section class=\"textbox tryIt\">[ohm2_question hide_question_numbers=1 ]2399[\/ohm2_question]<\/section>","rendered":"<section class=\"textbox learningGoals\">\n<ul>\n<li><span data-sheets-value=\"{&quot;1&quot;:2,&quot;2&quot;:&quot;Complete a chi-square test of homogeneity&quot;}\" data-sheets-userformat=\"{&quot;2&quot;:12801,&quot;3&quot;:{&quot;1&quot;:0},&quot;12&quot;:0,&quot;15&quot;:&quot;arial&quot;,&quot;16&quot;:9}\">Complete a chi-square test of homogeneity<\/span><\/li>\n<\/ul>\n<\/section>\n<section class=\"textbox keyTakeaway\">\n<h3>[latex]\\chi^2[\/latex] test of homogeneity<\/h3>\n<p>A <strong>chi-square test of homogeneity<\/strong> determines whether two or more populations (or subgroups of a population) have the same distribution of a single categorical variable.<\/p>\n<p>We use the test of homogeneity if the response variable has two or more categories and we wish to compare two or more populations (or subgroups).<\/p>\n<\/section>\n<p>We can answer the following research question with a chi-square test of homogeneity:<\/p>\n<ul>\n<li>Do different top commercial airlines have the same distribution of flight status (whether the flight is on-time, delayed, canceled, or diverted)?<\/li>\n<\/ul>\n<p>We 
could compare as many airlines as we like, but let\u2019s look at the top three airlines (by number of passengers)<a class=\"footnote\" title=\"List of largest airlines in North America. (2007, June 22). In Wikipedia. https:\/\/en.wikipedia.org\/wiki\/List_of_largest_airlines_in_North_America\" id=\"return-footnote-1412-1\" href=\"#footnote-1412-1\" aria-label=\"Footnote 1\"><sup class=\"footnote\">[1]<\/sup><\/a> and compare their flight status distributions. We can look at this information in a <strong>contingency table <\/strong>(i.e., a <strong>two-way table<\/strong>), where each row represents the flight status distribution of an airline. The following table gives the data for the flights of each airline in March 2021.<a class=\"footnote\" title=\"U.S. Department of Transportation, Bureau of Transportation Statistics. (n.d.). On-time performance - Reporting operating carrier flight delays at a glance. https:\/\/www.transtats.bts.gov\/HomeDrillChart_Month.asp?5ry_lrn4=FDFD&amp;N44_Qry=E&amp;5ry_Pn44vr4=DDD&amp;5ry_Nv42146=DDD&amp;heY_fryrp6lrn4=FDFE&amp;heY_fryrp6Z106u=F\" id=\"return-footnote-1412-2\" href=\"#footnote-1412-2\" aria-label=\"Footnote 2\"><sup class=\"footnote\">[2]<\/sup><\/a> Notice that in this table, we are displaying the counts for the categories of one categorical variable (<em>flight status<\/em>) for three different populations (each population is all the flights for a single airline).<\/p>\n<table>\n<tbody>\n<tr>\n<td>&nbsp;<\/td>\n<td><strong>On-Time Flights<\/strong><\/td>\n<td><strong>Delayed Flights<\/strong><\/td>\n<td><strong>Canceled Flights<\/strong><\/td>\n<td><strong>Diverted Flights<\/strong><\/td>\n<td><strong>Total<\/strong><\/td>\n<\/tr>\n<tr>\n<td><strong>American Airlines<\/strong><\/td>\n<td>42,600<\/td>\n<td>4,657<\/td>\n<td>296<\/td>\n<td>95<\/td>\n<td>47,648<\/td>\n<\/tr>\n<tr>\n<td><strong>Delta 
Airlines<\/strong><\/td>\n<td>51,620<\/td>\n<td>4,030<\/td>\n<td>150<\/td>\n<td>56<\/td>\n<td>55,856<\/td>\n<\/tr>\n<tr>\n<td><strong>Southwest Airlines<\/strong><\/td>\n<td>69,384<\/td>\n<td>9,280<\/td>\n<td>1,782<\/td>\n<td>128<\/td>\n<td>80,574<\/td>\n<\/tr>\n<tr>\n<td><strong>Total <\/strong><\/td>\n<td>163,604<\/td>\n<td>17,967<\/td>\n<td>2,228<\/td>\n<td>279<\/td>\n<td>184,078<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p>Notice that the different airlines have different numbers of flights, so it can be useful to look at the relative frequency distribution for each airline as well (i.e., the proportions of flights that have each status for each airline).<\/p>\n<section class=\"textbox example\">\n<p class=\"para\"><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\"><b>Relative Frequencies<\/b><\/span><\/p>\n<p class=\"para\"><span style=\"font-family: 'Public Sans', -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, Oxygen-Sans, Ubuntu, Cantarell, 'Helvetica Neue', sans-serif;\">Since there were [latex]42,600[\/latex] American Airlines flights that were on time and [latex]47,648[\/latex] American Airlines flights total, the relative frequency (or proportion) of American Airlines flights that were on time is:<\/span><\/p>\n<p style=\"text-align: center;\">[latex]\\dfrac{42,600}{47,648}=0.894=89.4\\%[\/latex]<\/p>\n<p>In finding a similar proportion for each flight status, we find that the relative frequency distribution for flight status for all three airlines is as displayed in the following table.<\/p>\n<div style=\"margin: auto;\">\n<table style=\"width: 99.294%;\">\n<tbody>\n<tr>\n<td style=\"width: 9.17587%;\">\u00a0<\/td>\n<td style=\"width: 21.0705%;\"><strong>Percentage On-Time Flights<\/strong><\/td>\n<td style=\"width: 19.966%;\"><strong>Percentage Delayed Flights<\/strong><\/td>\n<td style=\"width: 19.5412%;\"><strong>Percentage Canceled 
Flights<\/strong><\/td>\n<td style=\"width: 19.2014%;\"><strong>Percentage Diverted Flights<\/strong><\/td>\n<td style=\"width: 9.7706%;\"><strong>Total<\/strong><\/td>\n<\/tr>\n<tr>\n<td style=\"width: 9.17587%;\"><strong>American Airlines<\/strong><\/td>\n<td style=\"width: 21.0705%;\">[latex]89.4\\%[\/latex]<\/td>\n<td style=\"width: 19.966%;\">[latex]9.8\\%[\/latex]<\/td>\n<td style=\"width: 19.5412%;\">[latex]0.6\\%[\/latex]<\/td>\n<td style=\"width: 19.2014%;\">[latex]0.2\\%[\/latex]<\/td>\n<td style=\"width: 9.7706%;\">\n[latex]100\\%[\/latex]\n<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 9.17587%;\"><strong>Delta Airlines<\/strong><\/td>\n<td style=\"width: 21.0705%;\">[latex]92.4\\%[\/latex]<\/td>\n<td style=\"width: 19.966%;\">[latex]7.2\\%[\/latex]<\/td>\n<td style=\"width: 19.5412%;\">[latex]0.3\\%[\/latex]<\/td>\n<td style=\"width: 19.2014%;\">[latex]0.1\\%[\/latex]<\/td>\n<td style=\"width: 9.7706%;\">[latex]100\\%[\/latex]<\/td>\n<\/tr>\n<tr>\n<td style=\"width: 9.17587%;\"><strong>Southwest Airlines<\/strong><\/td>\n<td style=\"width: 21.0705%;\">[latex]86.1\\%[\/latex]<\/td>\n<td style=\"width: 19.966%;\">[latex]11.5\\%[\/latex]<\/td>\n<td style=\"width: 19.5412%;\">[latex]2.2\\%[\/latex]<\/td>\n<td style=\"width: 19.2014%;\">[latex]0.2\\%[\/latex]<\/td>\n<td style=\"width: 9.7706%;\">\n[latex]100\\%[\/latex]\n<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<p style=\"text-align: left;\">Using only the relative frequencies, do these distributions look significantly different? We&#8217;ll use a chi-square test of homogeneity to find out.<\/p>\n<\/div>\n<\/section>\n<p>In comparing the flight status distributions for these airlines, we\u2019ll build on two ideas we\u2019ve seen before. We\u2019ve already seen a test for determining whether two population proportions are equal: the two-proportion [latex]z[\/latex]-test. 
For example, we could think of the March flights as a sample of flights for each airline and consider whether the proportion of on-time flights for all American Airlines flights is the same as the proportion of on-time flights for all Delta Airlines flights.<\/p>\n<p>However, in this case, we\u2019re generalizing that idea by considering more than two populations and looking at the entire distribution of flight status for all values of the categorical variable. Secondly, we\u2019ll be building on the previous activity by using a chi-square test, but instead of comparing a distribution of counts to a theoretical model, we\u2019re comparing distributions of a categorical variable (in this case, <em>flight status<\/em>) among different populations (in this case, there are three populations: all flights for three different airlines).<\/p>\n<section class=\"textbox proTip\">The word \u201chomogeneous\u201d means the same or similar, so the <strong>chi-square test of homogeneity<\/strong> is asking whether or not two or more distributions of a categorical variable are the same. In short, a chi-square test of homogeneity compares distributions of one categorical variable for multiple populations.<\/section>\n<section class=\"textbox tryIt\"><iframe loading=\"lazy\" id=\"ohm2399\" class=\"resizable\" src=\"https:\/\/ohm.one.lumenlearning.com\/multiembedq.php?id=2399&theme=lumen&iframe_resize_id=ohm2399&source=tnh\" width=\"100%\" height=\"150\"><\/iframe><\/section>\n<hr class=\"before-footnotes clear\" \/><div class=\"footnotes\"><ol><li id=\"footnote-1412-1\">List of largest airlines in North America. (2007, June 22). In Wikipedia. https:\/\/en.wikipedia.org\/wiki\/List_of_largest_airlines_in_North_America <a href=\"#return-footnote-1412-1\" class=\"return-footnote\" aria-label=\"Return to footnote 1\">&crarr;<\/a><\/li><li id=\"footnote-1412-2\">U.S. Department of Transportation, Bureau of Transportation Statistics. (n.d.). 
On-time performance - Reporting operating carrier flight delays at a glance. https:\/\/www.transtats.bts.gov\/HomeDrillChart_Month.asp?5ry_lrn4=FDFD&amp;N44_Qry=E&amp;5ry_Pn44vr4=DDD&amp;5ry_Nv42146=DDD&amp;heY_fryrp6lrn4=FDFE&amp;heY_fryrp6Z106u=F <a href=\"#return-footnote-1412-2\" class=\"return-footnote\" aria-label=\"Return to footnote 2\">&crarr;<\/a><\/li><\/ol><\/div>","protected":false},"author":8,"menu_order":19,"template":"","meta":{"_candela_citation":"[]","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"part":1388,"module-header":"learn_it","content_attributions":[],"internal_book_links":[],"video_content":null,"cc_video_embed_content":{"cc_scripts":"","media_targets":[]},"try_it_collection":null,"_links":{"self":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/1412"}],"collection":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/users\/8"}],"version-history":[{"count":12,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/1412\/revisions"}],"predecessor-version":[{"id":6025,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/1412\/revisions\/6025"}],"part":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/parts\/1388"}],"metadata":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapters\/1412\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/media?parent=1412"}],"wp:term":[{"taxonomy":"chap
ter-type","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/pressbooks\/v2\/chapter-type?post=1412"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/contributor?post=1412"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/introstatstest\/wp-json\/wp\/v2\/license?post=1412"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}