Boxplot Data and Displays: Learn It 1

  • Read information from a boxplot and make conclusions
  • Compare boxplots

When we describe the distribution of a quantitative variable, we describe the overall pattern (shape, center, and spread) in the data and deviations from the pattern (outliers). A graph of a data set gives us a quick glimpse into its distribution. In this section, we focus on boxplots, a graphical representation of a quantitative variable.

Boxplots

boxplot

A boxplot is a graphical visualization of a quantitative variable that shows the median, spread, skew, and outliers by illustrating a set of five numbers (minimum, [latex]Q1[/latex], median, [latex]Q3[/latex], and maximum) called the five-number summary.

A boxplot clearly shows the center of the data set and provides a summary at a glance of the bulk of the data and the presence of outliers.

Image describing the characteristics of a boxplot. From left to right, the image first shows two outliers outside of the boxplot, and then the line marking the minimum. Then it shows Q1, the median, and Q3 with a textbox explaining that the Interquartile Range, or the IQR, is Q3-Q1. Moving to the right there is a line indicating the maximum value, with two outliers outside of that.
Figure 1. A boxplot creates a visual summary of a data set using five important values: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum. It also shows the outliers in the data.
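As a rough illustration of the five values in Figure 1, the sketch below computes a five-number summary and the IQR for a small made-up data set. The function name `five_number_summary` and the data are hypothetical. Quartiles can be computed under several conventions; this sketch assumes the "inclusive" method from Python's standard library, so other software may report slightly different values for Q1 and Q3.

```python
import statistics

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) for a list of numbers."""
    ordered = sorted(values)
    # With n=4, statistics.quantiles returns the three quartile cut points.
    # method="inclusive" is one of several quartile conventions (an assumption
    # made here); other conventions can give slightly different Q1 and Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical data, not taken from the text.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]
minimum, q1, median, q3, maximum = five_number_summary(data)

# The interquartile range (IQR) is the width of the box: Q3 - Q1.
iqr = q3 - q1

print(minimum, q1, median, q3, maximum)  # 1 3.0 5.0 7.0 9
print(iqr)                               # 4.0
```

The box in a boxplot would span from 3 to 7 for this data, with the median line at 5.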

Shape and Center

Boxplots, like histograms and dotplots, can also tell us about the shape of a distribution.

  • Left-skewed: A cluster of data on the right with a tail of data tapering off to the left.
  • Symmetric: A cluster of data where the left and right sides of the distribution closely mirror each other.
  • Right-skewed: A cluster of data on the left with a tail of data tapering off to the right.
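One way to turn these shape descriptions into a number is to compare how far the median sits from each edge of the box. The sketch below uses a quartile-based skew measure, [latex](Q3 + Q1 - 2 \cdot \text{median}) / (Q3 - Q1)[/latex], which is positive when the median sits closer to Q1 (suggesting a right skew), negative when it sits closer to Q3 (suggesting a left skew), and zero when the box is balanced. This is only one possible heuristic, offered as an assumption for illustration; the function name and data sets are hypothetical, and the measure ignores the whiskers and outliers.

```python
import statistics

def quartile_skew(values):
    """Quartile-based skew: (Q3 + Q1 - 2*median) / (Q3 - Q1).

    The result lies between -1 and 1. It is undefined when Q3 equals Q1
    (the box has zero width), which this sketch does not handle.
    """
    # Assumes the "inclusive" quartile convention; others may differ slightly.
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return (q3 + q1 - 2 * median) / (q3 - q1)

# Hypothetical data sets, not taken from the text.
right_skewed = [1, 2, 3, 4, 5, 7, 10, 14, 20]   # tail tapers off to the right
left_skewed = [21 - x for x in right_skewed]    # mirror image: tail to the left
symmetric = [1, 2, 3, 4, 5, 6, 7, 8, 9]

for name, data in [("right", right_skewed), ("left", left_skewed), ("symmetric", symmetric)]:
    print(name, round(quartile_skew(data), 3))
```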

We can describe the center of a distribution with the mean and the median; a boxplot marks the median directly. Recall the effect that skew has on the relationship between the mean and median in a data set. In a right-skewed data set, the tail pulls the mean to the right of the median, while in a left-skewed data set, the tail pulls the mean to the left of the median. We can use visual clues to observe the skew in a boxplot in the same way that we can in a histogram or a dotplot.
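The relationship between skew, mean, and median can be checked numerically. The sketch below uses small made-up data sets (not the data from this section) and a hypothetical helper, `mean_minus_median`, to show the sign of the gap between the two measures of center.

```python
import statistics

def mean_minus_median(values):
    """Positive when the mean lies right of the median, negative when left."""
    return statistics.mean(values) - statistics.median(values)

# Hypothetical right-skewed data: most values cluster low, a few trail off high.
right_skewed = [21, 22, 23, 24, 25, 26, 27, 45, 60]
# Mirror image of the same data, so the tail points left instead.
left_skewed = [81 - x for x in right_skewed]
# Evenly spaced values, so the two sides mirror each other.
symmetric = [1, 2, 3, 4, 5]

print(mean_minus_median(right_skewed) > 0)  # True: mean is right of the median
print(mean_minus_median(left_skewed) < 0)   # True: mean is left of the median
print(mean_minus_median(symmetric) == 0)    # True: mean equals the median
```

In real data the gap is rarely exactly zero, so a mean and median that are roughly similar are usually read as a sign of an approximately symmetric distribution.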

The descriptive statistics and graphs below describe 184 observations of the ages of Best Actress and Best Actor winners at the Oscars award ceremonies.

Descriptive statistics (mean 40, median 38), and a histogram with a tail to the right, and a boxplot with three outliers to the right.
Figure 2. A histogram and boxplot displaying the ages of 184 Oscar-winning actors and actresses.
  1. Do you notice any skew in the histogram of this dataset?
  2. Can you point out the corresponding outliers in the boxplot of the data?
  3. What is the relationship between the mean and median of the data? Is the mean less than, greater than, or roughly similar to the median?
  4. What can you conclude about the shape of the data?
  5. What visual clue in the boxplot led to your conclusion?

Note that the boxplots we have seen so far are drawn along a horizontal axis, with values increasing from left to right. It is also common to see boxplots displayed along a vertical axis, with values increasing from bottom to top.