Boxplot Data and Displays: Fresh Take

  • Read information from a boxplot and make conclusions
  • Compare boxplots

Additional Example of Five-Number Summary & Outliers

Recall that when we describe the distribution of a quantitative variable, we describe the overall pattern (shape, center, and spread) in the data and deviations from the pattern (outliers).

A flow chart beginning with “Graph the distribution of a quantitative variable. Describe the following:”, with one arrow pointing to “Overall pattern” and another pointing to “Deviations from the pattern.” The “Overall pattern” box points to shape, center, and spread, with spread highlighted. The “Deviations from the pattern” box points to outliers.
Figure 1. When analyzing a graph, describe the overall pattern (shape, center, spread) and look for deviations, or outliers, that don’t follow the pattern.
Two sets of exam scores

Consider the following two distributions of exam scores:

A boxplot graphic comparing exam scores for two classes, A and B. Each boxplot shows a five-number summary. For class A, the minimum score is approximately 40, the first quartile is just above 70, the median is near 75, the third quartile is just below 80, and the maximum is around 95. There are two low outliers of 40 and 55 and two upper outliers of 90 and 95. For class B, the minimum is around 40, the first quartile is just above 60, the median is about 75, the third quartile is just below 90, and the maximum score is close to 95.
Figure 2. Two boxplots of exam scores.

Both distributions have a median of approximately [latex]74.5[/latex].

[latex]Q1[/latex], [latex]Q3[/latex], and IQR

Next, we develop a way to measure the variability about the median. To do so, we use quartiles. The quartiles divide the ordered data set into four groups with approximately equal counts, so each group holds about a quarter of the values.

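To find the first and third quartiles ([latex]Q1[/latex] and [latex]Q3[/latex], respectively), first split the ordered data into the list of values below the median and the list of values above the median. Then take the median of each list: [latex]Q1[/latex] is the median of the lower list, and [latex]Q3[/latex] is the median of the upper list. The interquartile range, [latex]\text{IQR} = Q3 - Q1[/latex], measures the spread of the middle half of the data.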
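As a worked illustration, take a small hypothetical set of 11 exam scores (made up for this example, not read from Figure 2) and apply the median-of-halves method. Because the count is odd, the median itself is left out of both halves here; this is one common textbook convention, and software may use others that give slightly different quartiles.

```latex
\begin{aligned}
&\text{Sorted scores: } 40,\ 55,\ 68,\ 72,\ 74,\ \mathbf{75},\ 77,\ 78,\ 79,\ 90,\ 95 \\
&\text{Median} = 75 \quad (\text{the 6th of 11 values}) \\
&\text{Lower half: } 40,\ 55,\ \mathbf{68},\ 72,\ 74 \;\Rightarrow\; Q1 = 68 \\
&\text{Upper half: } 77,\ 78,\ \mathbf{79},\ 90,\ 95 \;\Rightarrow\; Q3 = 79 \\
&\text{IQR} = Q3 - Q1 = 79 - 68 = 11
\end{aligned}
```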

A boxplot is constructed from five values: the minimum value, the first quartile, the median, the third quartile, and the maximum value.
This set of five numbers is called the five-number summary. 
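The five-number summary can be computed in a few lines. Below is a minimal Python sketch that uses the median-of-halves method from the text, assuming the middle value is left out of both halves when the count is odd. The function name `five_number_summary` and the scores are hypothetical, chosen only for illustration. Library routines such as `numpy.percentile` interpolate by default and may report slightly different quartiles.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers.

    Quartiles are the medians of the lower and upper halves of the sorted
    data; with an odd count, the middle value is excluded from both halves
    (an assumed convention; other conventions exist).
    """
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    half = n // 2
    lower = data[:half]               # values below the median
    upper = data[half + (n % 2):]     # values above the median
    return (data[0], median(lower), median(data), median(upper), data[-1])

# Hypothetical exam scores, in no particular order.
scores = [72, 95, 40, 78, 68, 90, 74, 55, 79, 77, 75]
mn, q1, med, q3, mx = five_number_summary(scores)
print(mn, q1, med, q3, mx)   # 40 68 75 79 95
print("IQR =", q3 - q1)      # IQR = 11
```

Sorting inside the function means the input can arrive in any order, as raw exam scores usually do.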

A general horizontal boxplot displaying the following features from left to right: lower outliers, minimum, Q1, median, Q3, maximum, and upper outliers. The Interquartile Range (IQR) is shown at the top of the boxplot.
Figure 3. A boxplot creates a visual summary of a data set using five important values: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It also shows the outliers in the data.

Using the IQR to Identify Outliers

A value is an outlier when:

  • Upper outlier: The value is greater than [latex]Q3 + 1.5 \times \text{IQR}[/latex]
  • Lower outlier: The value is less than [latex]Q1 - 1.5 \times \text{IQR}[/latex]
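The outlier rule reduces to computing two fences and comparing each value against them. This minimal Python sketch takes [latex]Q1[/latex] and [latex]Q3[/latex] as inputs; the function names `outlier_fences` and `find_outliers` and the sample scores are hypothetical. Values exactly on a fence are not flagged, matching the strict “greater than” and “less than” in the rule.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) using the k x IQR rule (k = 1.5)."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def find_outliers(values, q1, q3):
    """Return (lower outliers, upper outliers), each sorted ascending."""
    low, high = outlier_fences(q1, q3)
    lower = sorted(v for v in values if v < low)    # strictly below the lower fence
    upper = sorted(v for v in values if v > high)   # strictly above the upper fence
    return lower, upper

# Hypothetical scores whose quartiles are Q1 = 68 and Q3 = 79 (IQR = 11).
scores = [40, 55, 68, 72, 74, 75, 77, 78, 79, 90, 95]
print(outlier_fences(68, 79))         # (51.5, 95.5)
print(find_outliers(scores, 68, 79))  # ([40], [])
```

With these made-up scores, 55 sits above the lower fence of 51.5 and is not flagged, even though it is well below [latex]Q1[/latex]. Whether a point counts as an outlier depends on the spread of the middle half of the data, not on its distance from the median alone.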

To make more sense of this rule, let’s look at a visual example: the two boxplots of exam scores in Figure 2.