{"id":1676,"date":"2023-04-12T19:54:03","date_gmt":"2023-04-12T19:54:03","guid":{"rendered":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/?post_type=chapter&#038;p=1676"},"modified":"2024-10-18T20:54:15","modified_gmt":"2024-10-18T20:54:15","slug":"numerical-summaries-of-data-learn-it-4","status":"web-only","type":"chapter","link":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/chapter\/numerical-summaries-of-data-learn-it-4\/","title":{"raw":"Numerical Summaries of Data: Learn It 4","rendered":"Numerical Summaries of Data: Learn It 4"},"content":{"raw":"<h2>Boxplots<\/h2>\r\n<p>For visualizing data, there is a graphical representation of a [latex]5[\/latex]-number summary called a <strong>box plot<\/strong>, or box and whisker graph.<\/p>\r\n<section class=\"textbox keyTakeaway\">\r\n<div>\r\n<h3>boxplot<\/h3>\r\n<p>A\u00a0<b>boxplot<\/b> is a graphical visualization of a quantitative variable that shows median, spread, skew, and outliers by illustrating the set of numbers of the five-number summary (minimum, [latex]Q1[\/latex], median, [latex]Q3[\/latex], and maximum).<\/p>\r\n<p>&nbsp;<\/p>\r\n<p>A boxplot clearly shows the center of the data set and provides a summary at a glance of the bulk of the data and the presence of outliers.<\/p>\r\n<p>&nbsp;<\/p>\r\n<center><img class=\"aligncenter wp-image-5531 size-full\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/10\/2022\/10\/28231653\/Screen-Shot-2022-02-11-at-1.09.31-PM.png\" alt=\"Characteristics of a boxplot, showing the interquartile range (IQR), with Q1 at the left end, Q3 at the right end, and the median in the middle. Further to the right of Q3 is the maximum and further to the left of Q1 is the minimum. Beyond each of those are outliers.\" width=\"575\" height=\"319\" \/><\/center>\r\n<p>&nbsp;<\/p>\r\n<p>To create a box plot, a number line is first drawn. 
A box is drawn from the first quartile to the third quartile, and a line is drawn through the box at the median. \u201cWhiskers\u201d are extended out to the minimum and maximum values.<\/p>\r\n<\/div>\r\n<\/section>\r\n<p>Box plots are particularly useful for comparing data from two populations.<\/p>\r\n<section class=\"textbox example\">The box plot of service times for two fast-food restaurants is shown below.<br \/>\r\n<center><img class=\"alignnone wp-image-12882\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/18\/2023\/04\/12175728\/large-LI4.png\" alt=\"Number line titled Service Time (minutes), in increments of 1 from 0-10. Two box plots are above it. The top one is labeled Store 1. A vertical line indicates 0.7. A horizontal line connects this to the next vertical line, 1.8. This line forms the left side of a rectangle; a line at 2.3 is its right side. The line at 2.3 also serves as the left side of another rectangle, with a line at 2.9 as its right side. This line at 2.9 connects with a horizontal line to a final vertical line at 6.3. The bottom box plot is labeled Store 2. A vertical line indicates 0.5. A horizontal line connects this to the next vertical line, 1.1. This line forms the left side of a rectangle; a line at 2.1 is its right side. The line at 2.1 also serves as the left side of another rectangle, with a line at 5.7 as its right side. This line at 5.7 connects with a horizontal line to a final vertical line at 9.6.\" width=\"500\" height=\"266\" \/><\/center>\r\n<p>&nbsp;<\/p>\r\n<br \/>\r\nWhich store should you go to in a hurry?[reveal-answer q=\"770439\"]Show Solution[\/reveal-answer]<br \/>\r\n[hidden-answer a=\"770439\"]<br \/>\r\nWhile store [latex]2[\/latex] had a slightly shorter median service time ([latex]2.1[\/latex] minutes vs. 
[latex]2.3[\/latex] minutes), store [latex]2[\/latex] is less consistent, with a wider spread of the data. At store [latex]1[\/latex], [latex]75\\%[\/latex] of customers were served within [latex]2.9[\/latex] minutes, while at store [latex]2[\/latex], [latex]75\\%[\/latex] of customers were served within [latex]5.7[\/latex] minutes.<br \/>\r\n[\/hidden-answer]<\/section>\r\n<h3>Interquartile Range (IQR)<\/h3>\r\n<p>The interquartile range (IQR) is a statistical measure used to describe the middle spread of a data set. It represents the range within which the central [latex]50\\%[\/latex] of data lies, by taking the difference between the third quartile ([latex]Q3[\/latex]), which marks the top of the middle [latex]50\\%[\/latex], and the first quartile ([latex]Q1[\/latex]), which marks the bottom of the middle [latex]50\\%[\/latex]. This measurement helps to understand the dispersion of the middle bulk of a data set, providing a clearer picture of its distribution by reducing the influence of outliers.<\/p>\r\n<section class=\"textbox keyTakeaway\">\r\n<div>\r\n<h3>IQR<\/h3>\r\n<p>The\u00a0<strong>interquartile range<\/strong>\u00a0(sometimes denoted as IQR) is the difference between the quartiles calculated as [latex]Q3 \u2013 Q1[\/latex].<\/p>\r\n<\/div>\r\n<\/section>\r\n<p>The IQR represents the range of the middle half of the values in the data set and is often used to describe the typical spread.<\/p>\r\n<section class=\"textbox tryIt\">[ohm2_question hide_question_numbers=1]2088[\/ohm2_question]<\/section>\r\n<p>The IQR can be used to find the limits of the upper and lower outliers.\u00a0<\/p>\r\n<section class=\"textbox questionHelp\">\r\n<p><strong>How To: Calculate the Lower Outlier Limit<\/strong><\/p>\r\n<ol>\r\n\t<li>Lower Limit = [latex]Q1 \u2212 (1.5 \u00d7 IQR)[\/latex]<\/li>\r\n\t<li>Any data point below this limit is considered a lower outlier.<\/li>\r\n<\/ol>\r\n<p><strong>How To: Calculate the Upper Outlier Limit<\/strong><\/p>\r\n<ol>\r\n\t<li>Upper 
Limit = [latex]Q3 + (1.5 \u00d7 IQR)[\/latex]<\/li>\r\n\t<li>Any data point above this limit is considered an upper outlier.<\/li>\r\n<\/ol>\r\n<\/section>\r\n<p>Boxplots can tell us about the shape of a distribution. The shape of a distribution refers to how data is spread out across the range of values, encompassing characteristics like symmetry, skewness, and the presence of outliers. Skewness specifically describes the degree of asymmetry in the distribution; it's a measure of how much the distribution leans to one side.<\/p>\r\n<section class=\"textbox keyTakeaway\">\r\n<div>\r\n<h3>skew<\/h3>\r\n<ul>\r\n\t<li><strong>Left skewed<\/strong>:\u00a0A cluster of data on the right with a tail of data tapering off to the left.<\/li>\r\n\t<li><strong>Symmetric<\/strong>: A cluster of data where the left and right sides of the distribution <em>closely<\/em>\u00a0<em>mirror<\/em>\u00a0each other.<\/li>\r\n\t<li><strong>Right skewed<\/strong>: A cluster of data on the left with a tail of data tapering off to the right.<\/li>\r\n<\/ul>\r\n<\/div>\r\n<\/section>\r\n<p>For boxplots, how can we describe the center of the distribution? With mean and median, of course! Recall the effect that skew has on the relationship between the mean and median in a data set. A right-skewed data set will pull the mean to the right of the median while a left-skewed data set will pull the mean to the left. 
We can use visual clues to observe the skew in a boxplot.<\/p>\r\n<section class=\"textbox seeExample\">The descriptive statistics and graphs below describe the [latex]184[\/latex] observations of the ages of the best actress\/actor winners from movies from the Oscars awards ceremonies.<br \/>\r\n<center><img class=\"aligncenter size-full wp-image-1057\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5772\/2022\/02\/12234214\/Skew_OscarsAge_smaller.png\" alt=\"Descriptive statistics (mean 40, median 38), and a histogram with a tail to the right, and a boxplot with three outliers to the right.\" width=\"700\" height=\"551\" \/><\/center>\r\n<p>&nbsp;<\/p>\r\n<ol>\r\n\t<li>Do you notice any skew in the dotplot of this data set?<\/li>\r\n\t<li>Can you point out the corresponding outliers in the boxplot of the data?<\/li>\r\n\t<li>What is the relationship between the mean and median of the data? Is the mean less than, greater than, or roughly similar to the median?<\/li>\r\n\t<li>What can you conclude about the shape of the data?<\/li>\r\n\t<li>What visual clue in the boxplot led to your conclusion?<\/li>\r\n<\/ol>\r\n<p>[reveal-answer q=\"321107\"]Show Solution[\/reveal-answer]<br \/>\r\n[hidden-answer a=\"321107\"]<\/p>\r\n<ol>\r\n\t<li>The dotplot appears to have a pronounced right tail, which would indicate a right skew.<\/li>\r\n\t<li>There are three distinct outliers to the right of the bulk of the data.<\/li>\r\n\t<li>The descriptive statistics give the median as [latex]38[\/latex] and the mean as [latex]40[\/latex]. The mean is greater than the median.<\/li>\r\n\t<li>The data is right skewed. 
The extreme values greater than the bulk of the data have pulled the mean to the right of the median.<\/li>\r\n\t<li>The skew can be seen in the boxplot by observing outliers only to the right of the bulk of the data, with no corresponding, symmetrical outliers to the left.<\/li>\r\n<\/ol>\r\n<p>[\/hidden-answer]<\/p>\r\n<\/section>\r\n<section class=\"textbox tryIt\">[ohm2_question hide_question_numbers=1]2101[\/ohm2_question]<\/section>","rendered":"<h2>Boxplots<\/h2>\n<p>For visualizing data, there is a graphical representation of a [latex]5[\/latex]-number summary called a <strong>box plot<\/strong>, or box and whisker graph.<\/p>\n<section class=\"textbox keyTakeaway\">\n<div>\n<h3>boxplot<\/h3>\n<p>A\u00a0<b>boxplot<\/b> is a graphical visualization of a quantitative variable that shows median, spread, skew, and outliers by illustrating the set of numbers of the five-number summary (minimum, [latex]Q1[\/latex], median, [latex]Q3[\/latex], and maximum).<\/p>\n<p>&nbsp;<\/p>\n<p>A boxplot clearly shows the center of the data set and provides a summary at a glance of the bulk of the data and the presence of outliers.<\/p>\n<p>&nbsp;<\/p>\n<div style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter wp-image-5531 size-full\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/10\/2022\/10\/28231653\/Screen-Shot-2022-02-11-at-1.09.31-PM.png\" alt=\"Characteristics of a boxplot, showing the interquartile range (IQR), with Q1 at the left end, Q3 at the right end, and the median in the middle. Further to the right of Q3 is the maximum and further to the left of Q1 is the minimum. Beyond each of those are outliers.\" width=\"575\" height=\"319\" \/><\/div>\n<p>&nbsp;<\/p>\n<p>To create a box plot, a number line is first drawn. A box is drawn from the first quartile to the third quartile, and a line is drawn through the box at the median. 
\u201cWhiskers\u201d are extended out to the minimum and maximum values.<\/p>\n<\/div>\n<\/section>\n<p>Box plots are particularly useful for comparing data from two populations.<\/p>\n<section class=\"textbox example\">The box plot of service times for two fast-food restaurants is shown below.<\/p>\n<div style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"alignnone wp-image-12882\" src=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/18\/2023\/04\/12175728\/large-LI4.png\" alt=\"Number line titled Service Time (minutes), in increments of 1 from 0-10. Two box plots are above it. The top one is labeled Store 1. A vertical line indicates 0.7. A horizontal line connects this to the next vertical line, 1.8. This line forms the left side of a rectangle; a line at 2.3 is its right side. The line at 2.3 also serves as the left side of another rectangle, with a line at 2.9 as its right side. This line at 2.9 connects with a horizontal line to a final vertical line at 6.3. The bottom box plot is labeled Store 2. A vertical line indicates 0.5. A horizontal line connects this to the next vertical line, 1.1. This line forms the left side of a rectangle; a line at 2.1 is its right side. The line at 2.1 also serves as the left side of another rectangle, with a line at 5.7 as its right side. 
This line at 5.7 connects with a horizontal line to a final vertical line at 9.6.\" width=\"500\" height=\"266\" srcset=\"https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/18\/2023\/04\/12175728\/large-LI4.png 698w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/18\/2023\/04\/12175728\/large-LI4-300x160.png 300w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/18\/2023\/04\/12175728\/large-LI4-65x35.png 65w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/18\/2023\/04\/12175728\/large-LI4-225x120.png 225w, https:\/\/content-cdn.one.lumenlearning.com\/wp-content\/uploads\/sites\/18\/2023\/04\/12175728\/large-LI4-350x187.png 350w\" sizes=\"(max-width: 500px) 100vw, 500px\" \/><\/div>\n<p>&nbsp;<\/p>\n<p>\nWhich store should you go to in a hurry?<\/p>\n<div class=\"qa-wrapper\" style=\"display: block\"><button class=\"show-answer show-answer-button collapsed\" data-target=\"q770439\">Show Solution<\/button><\/p>\n<div id=\"q770439\" class=\"hidden-answer\" style=\"display: none\">\nWhile store [latex]2[\/latex] had a slightly shorter median service time ([latex]2.1[\/latex] minutes vs. [latex]2.3[\/latex] minutes), store [latex]2[\/latex] is less consistent, with a wider spread of the data. At store [latex]1[\/latex], [latex]75\\%[\/latex] of customers were served within [latex]2.9[\/latex] minutes, while at store [latex]2[\/latex], [latex]75\\%[\/latex] of customers were served within [latex]5.7[\/latex] minutes.\n<\/div>\n<\/div>\n<\/section>\n<h3>Interquartile Range (IQR)<\/h3>\n<p>The interquartile range (IQR) is a statistical measure used to describe the middle spread of a data set. 
It represents the range within which the central [latex]50\\%[\/latex] of data lies, by taking the difference between the third quartile ([latex]Q3[\/latex]), which marks the top of the middle [latex]50\\%[\/latex], and the first quartile ([latex]Q1[\/latex]), which marks the bottom of the middle [latex]50\\%[\/latex]. This measurement helps to understand the dispersion of the middle bulk of a data set, providing a clearer picture of its distribution by reducing the influence of outliers.<\/p>\n<section class=\"textbox keyTakeaway\">\n<div>\n<h3>IQR<\/h3>\n<p>The\u00a0<strong>interquartile range<\/strong>\u00a0(sometimes denoted as IQR) is the difference between the quartiles calculated as [latex]Q3 \u2013 Q1[\/latex].<\/p>\n<\/div>\n<\/section>\n<p>The IQR represents the range of the middle half of the values in the data set and is often used to describe the typical spread.<\/p>\n<section class=\"textbox tryIt\"><iframe loading=\"lazy\" id=\"ohm2088\" class=\"resizable\" src=\"https:\/\/ohm.one.lumenlearning.com\/multiembedq.php?id=2088&theme=lumen&iframe_resize_id=ohm2088&source=tnh\" width=\"100%\" height=\"150\"><\/iframe><\/section>\n<p>The IQR can be used to find the limits of the upper and lower outliers.\u00a0<\/p>\n<section class=\"textbox questionHelp\">\n<p><strong>How To: Calculate the Lower Outlier Limit<\/strong><\/p>\n<ol>\n<li>Lower Limit = [latex]Q1 \u2212 (1.5 \u00d7 IQR)[\/latex]<\/li>\n<li>Any data point below this limit is considered a lower outlier.<\/li>\n<\/ol>\n<p><strong>How To: Calculate the Upper Outlier Limit<\/strong><\/p>\n<ol>\n<li>Upper Limit = [latex]Q3 + (1.5 \u00d7 IQR)[\/latex]<\/li>\n<li>Any data point above this limit is considered an upper outlier.<\/li>\n<\/ol>\n<\/section>\n<p>Boxplots can tell us about the shape of a distribution. The shape of a distribution refers to how data is spread out across the range of values, encompassing characteristics like symmetry, skewness, and the presence of outliers. 
Skewness specifically describes the degree of asymmetry in the distribution; it&#8217;s a measure of how much the distribution leans to one side.<\/p>\n<section class=\"textbox keyTakeaway\">\n<div>\n<h3>skew<\/h3>\n<ul>\n<li><strong>Left skewed<\/strong>:\u00a0A cluster of data on the right with a tail of data tapering off to the left.<\/li>\n<li><strong>Symmetric<\/strong>: A cluster of data where the left and right sides of the distribution <em>closely<\/em>\u00a0<em>mirror<\/em>\u00a0each other.<\/li>\n<li><strong>Right skewed<\/strong>: A cluster of data on the left with a tail of data tapering off to the right.<\/li>\n<\/ul>\n<\/div>\n<\/section>\n<p>For boxplots, how can we describe the center of the distribution? With mean and median, of course! Recall the effect that skew has on the relationship between the mean and median in a data set. A right-skewed data set will pull the mean to the right of the median while a left-skewed data set will pull the mean to the left. We can use visual clues to observe the skew in a boxplot.<\/p>\n<section class=\"textbox seeExample\">The descriptive statistics and graphs below describe the [latex]184[\/latex] observations of the ages of the best actress\/actor winners from movies from the Oscars awards ceremonies.<\/p>\n<div style=\"text-align: center;\"><img loading=\"lazy\" decoding=\"async\" class=\"aligncenter size-full wp-image-1057\" src=\"https:\/\/s3-us-west-2.amazonaws.com\/courses-images\/wp-content\/uploads\/sites\/5772\/2022\/02\/12234214\/Skew_OscarsAge_smaller.png\" alt=\"Descriptive statistics (mean 40, median 38), and a histogram with a tail to the right, and a boxplot with three outliers to the right.\" width=\"700\" height=\"551\" \/><\/div>\n<p>&nbsp;<\/p>\n<ol>\n<li>Do you notice any skew in the dotplot of this data set?<\/li>\n<li>Can you point out the corresponding outliers in the boxplot of the data?<\/li>\n<li>What is the relationship between the mean and median of the data? 
Is the mean less than, greater than, or roughly similar to the median?<\/li>\n<li>What can you conclude about the shape of the data?<\/li>\n<li>What visual clue in the boxplot led to your conclusion?<\/li>\n<\/ol>\n<p><div class=\"qa-wrapper\" style=\"display: block\"><button class=\"show-answer show-answer-button collapsed\" data-target=\"q321107\">Show Solution<\/button><\/p>\n<div id=\"q321107\" class=\"hidden-answer\" style=\"display: none\">\n<ol>\n<li>The dotplot appears to have a pronounced right tail, which would indicate a right skew.<\/li>\n<li>There are three distinct outliers to the right of the bulk of the data.<\/li>\n<li>The descriptive statistics give the median as [latex]38[\/latex] and the mean as [latex]40[\/latex]. The mean is greater than the median.<\/li>\n<li>The data is right skewed. The extreme values greater than the bulk of the data have pulled the mean to the right of the median.<\/li>\n<li>The skew can be seen in the boxplot by observing outliers only to the right of the bulk of the data, with no corresponding, symmetrical outliers to the left.<\/li>\n<\/ol>\n<\/div>\n<\/div>\n<\/section>\n<section class=\"textbox tryIt\"><iframe loading=\"lazy\" id=\"ohm2101\" class=\"resizable\" src=\"https:\/\/ohm.one.lumenlearning.com\/multiembedq.php?id=2101&theme=lumen&iframe_resize_id=ohm2101&source=tnh\" width=\"100%\" 
height=\"150\"><\/iframe><\/section>\n","protected":false},"author":15,"menu_order":14,"template":"","meta":{"_candela_citation":"[]","pb_show_title":"on","pb_short_title":"","pb_subtitle":"","pb_authors":[],"pb_section_license":""},"chapter-type":[],"contributor":[],"license":[],"part":1572,"module-header":"learn_it","content_attributions":[],"internal_book_links":[],"video_content":null,"cc_video_embed_content":{"cc_scripts":"","media_targets":[]},"try_it_collection":null,"_links":{"self":[{"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/pressbooks\/v2\/chapters\/1676"}],"collection":[{"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/pressbooks\/v2\/chapters"}],"about":[{"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/wp\/v2\/types\/chapter"}],"author":[{"embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/wp\/v2\/users\/15"}],"version-history":[{"count":25,"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/pressbooks\/v2\/chapters\/1676\/revisions"}],"predecessor-version":[{"id":13544,"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/pressbooks\/v2\/chapters\/1676\/revisions\/13544"}],"part":[{"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/pressbooks\/v2\/parts\/1572"}],"metadata":[{"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/pressbooks\/v2\/chapters\/1676\/metadata\/"}],"wp:attachment":[{"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/wp\/v2\/media?parent=1676"}],"wp:term":[{"taxonomy":"chapter-type","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/pressbooks\/v2\/chapter-type?post=1676"},{"taxonomy":"contributor","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json
\/wp\/v2\/contributor?post=1676"},{"taxonomy":"license","embeddable":true,"href":"https:\/\/content.one.lumenlearning.com\/quantitativereasoning\/wp-json\/wp\/v2\/license?post=1676"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}