Boxplot Data and Displays: Learn It 3

  • Read information from a boxplot and make conclusions
  • Compare boxplots

Interquartile Range and Outliers

Interquartile Range (IQR)

IQR

The interquartile range (often abbreviated as IQR) is the difference between the third and first quartiles, calculated as [latex]Q3 - Q1[/latex].

The IQR represents the range of the middle half of the values in the data set and is often used to describe the typical spread.
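As a rough illustration of the definition, the sketch below computes the IQR for a small, made-up list of values using Python's standard `statistics.quantiles` function. The data are hypothetical and are not taken from the GDP data set. Software packages use slightly different conventions for locating quartiles, so the quartiles here may not match values found by hand with another method.

```python
from statistics import quantiles

# Hypothetical data values, for illustration only
data = [2, 4, 4, 5, 7, 9, 10, 12]

# quantiles(..., n=4) returns the three cut points Q1, Q2 (median), Q3
q1, q2, q3 = quantiles(data, n=4)

# The IQR is the distance between the third and first quartiles
iqr = q3 - q1

print("Q1 =", q1, "Q3 =", q3, "IQR =", iqr)
```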

Outliers

Some outliers seem obvious to spot (such as the GDP per capita of the United States), but others are harder to identify (such as Japan’s GDP per capita). In statistics, there is a rule for testing whether a value is “unusual enough” to be called an outlier.

The IQR Method for Outliers

A value is a potential outlier if it is more than [latex]1.5\times[/latex]IQR below the first quartile or more than [latex]1.5\times[/latex]IQR above the third quartile.

That is:

  • an upper outlier is an observation that is greater than [latex]Q3 +[/latex]([latex]1.5\times[/latex]IQR); and
  • a lower outlier is an observation that is less than [latex]Q1 -[/latex]([latex]1.5\times[/latex]IQR).
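The two cutoffs above can be sketched as a pair of small Python functions. The function names `iqr_fences` and `is_potential_outlier`, and the quartile values used in the example, are assumptions made for illustration and do not come from the lesson's data set.

```python
def iqr_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_potential_outlier(value, q1, q3, k=1.5):
    """True if value falls strictly below the lower cutoff or above the upper one."""
    lower, upper = iqr_fences(q1, q3, k)
    return value < lower or value > upper


# Hypothetical quartiles: Q1 = 10 and Q3 = 20, so IQR = 10
lower, upper = iqr_fences(10.0, 20.0)
print(lower, upper)                           # cutoffs are -5.0 and 35.0
print(is_potential_outlier(36, 10.0, 20.0))   # 36 is above the upper cutoff
```

A value exactly equal to a cutoff is not flagged, because the rule uses "greater than" and "less than."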

It’s important to note that while this method can be used to identify unusual observations in skewed distributions, other methods are well-suited for symmetrical distributions. You’ll learn about these other methods in an upcoming section.

In certain applications, it may be desirable to distinguish between “mild outliers” (using [latex]1.5 \times[/latex]IQR) and “extreme outliers” (using [latex]3 \times[/latex]IQR). We can really set the threshold for “unusual” values as far away as we’d like, depending on the application.
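A minimal sketch of the mild/extreme distinction follows. It assumes that a value beyond the [latex]3 \times[/latex]IQR cutoff is labeled "extreme" and a value beyond only the [latex]1.5 \times[/latex]IQR cutoff is labeled "mild". The function name `classify` and the sample quartiles are hypothetical.

```python
def classify(value, q1, q3):
    """Label a value using the 1.5*IQR (mild) and 3*IQR (extreme) cutoffs."""
    iqr = q3 - q1
    # Check the wider cutoff first, since every extreme outlier
    # also lies beyond the 1.5*IQR cutoff.
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "extreme outlier"
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "mild outlier"
    return "not an outlier"


# Hypothetical quartiles: Q1 = 10 and Q3 = 20, so IQR = 10.
# Upper cutoffs are 35 (mild) and 50 (extreme).
for value in (30, 40, 51):
    print(value, classify(value, 10, 20))
```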

Recall that Japan’s GDP per capita from the data set is $[latex]39,287[/latex]. We would like to know how unusual this value really is in comparison to the rest of the data values.
We’ll use the IQR method to make the determination. Under this method, a data value is considered an outlier if it lies more than ([latex]1.5[/latex][latex]\times[/latex]IQR) above [latex]Q3[/latex] or below [latex]Q1[/latex].
Since [latex]39,287[/latex] is greater than the median, we’ll test it to see if it exceeds [latex]Q3 +[/latex]([latex]1.5[/latex] [latex]\times[/latex]IQR).
(If it were a very small number, we’d test to see if it were lower than [latex]Q1 -[/latex]([latex]1.5[/latex] [latex]\times[/latex] IQR).) Recall that, for this data set, [latex]Q3 =[/latex] [latex]11,289[/latex] and IQR [latex]= 9,273[/latex].

Step 1) Calculate ([latex]1.5[/latex][latex]\times[/latex]IQR).

Step 2) Calculate [latex]Q3 +[/latex]([latex]1.5[/latex] [latex]\times[/latex]IQR).

Step 3) Compare Japan’s GDP per capita. If it exceeds [latex]Q3 +[/latex]([latex]1.5[/latex][latex]\times[/latex]IQR), then it is an outlier.

What did you discover? Is Japan’s GDP per capita an outlier in the data set?
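One way to check your answer is to carry out the three steps in a few lines of Python. This sketch uses only the values stated above ([latex]Q3 = 11,289[/latex], IQR [latex]= 9,273[/latex], and Japan’s value of [latex]39,287[/latex]). The variable names are chosen for illustration.

```python
# Values given in the lesson
q3 = 11_289
iqr = 9_273
japan_gdp_per_capita = 39_287

# Step 1) Calculate 1.5 x IQR
step = 1.5 * iqr

# Step 2) Calculate Q3 + (1.5 x IQR)
upper_cutoff = q3 + step

# Step 3) Compare Japan's value to the upper cutoff
is_outlier = japan_gdp_per_capita > upper_cutoff

print("1.5 x IQR =", step)
print("Upper cutoff =", upper_cutoff)
print("Japan is an outlier:", is_outlier)
```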