- Read information from a boxplot and make conclusions
- Compare boxplots
Interquartile Range and Outliers
Interquartile Range (IQR)
IQR
The interquartile range (often abbreviated IQR) is the difference between the third and first quartiles, calculated as [latex]Q3 - Q1[/latex].
The IQR represents the range of the middle half of the values in the data set and is often used to describe the typical spread.
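As a rough illustration, the following Python sketch computes the IQR for a small made-up data set. The data values and the choice of `method="inclusive"` are assumptions made for this example only. Textbooks and software use slightly different quartile conventions, so quartiles found by hand may differ a little from what a program reports.

```python
from statistics import quantiles

# Hypothetical data set, invented purely for illustration.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9]

# quantiles(..., n=4) returns the three cut points: Q1, the median, and Q3.
# The "inclusive" method is only one of several quartile conventions.
q1, median, q3 = quantiles(data, n=4, method="inclusive")

# The IQR is the width of the middle half of the data.
iqr = q3 - q1
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```

For this data, the middle half of the values runs from 3 to 7, so the IQR is 4.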
Outliers
Some outliers seem obvious to spot (such as the GDP per capita of the United States), but others are harder to identify (such as Japan’s GDP per capita). In statistics, there is a rule for testing whether a value is “unusual enough” to be called an outlier.
The IQR Method for Outliers
A value is a potential outlier if it is more than [latex]1.5\times[/latex]IQR below the first quartile or more than [latex]1.5\times[/latex]IQR above the third quartile.
That is:
- an upper outlier is an observation that is greater than [latex]Q3 +[/latex] ([latex]1.5 \times[/latex] IQR); and
- a lower outlier is an observation that is less than [latex]Q1 -[/latex] ([latex]1.5 \times[/latex] IQR).
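The two cutoffs in this rule can be sketched in a few lines of Python. The function names `outlier_fences` and `is_potential_outlier` and the example quartiles are hypothetical, chosen only to illustrate the rule. They are not part of any library, and the quartiles are not taken from the GDP data.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return the (lower, upper) cutoffs Q1 - k*IQR and Q3 + k*IQR."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def is_potential_outlier(value, q1, q3, k=1.5):
    """Return True if value lies strictly beyond either cutoff."""
    lower, upper = outlier_fences(q1, q3, k)
    return value < lower or value > upper


# Hypothetical quartiles (Q1 = 3, Q3 = 7), so IQR = 4.
lower, upper = outlier_fences(q1=3, q3=7)
print(lower, upper)                    # -3.0 13.0
print(is_potential_outlier(14, 3, 7))  # True: above the upper cutoff
print(is_potential_outlier(13, 3, 7))  # False: exactly on the cutoff
```

Note that a value sitting exactly on a cutoff is not flagged, because the rule uses "greater than" and "less than".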
It’s important to note that while this method can be used to identify unusual observations in skewed distributions, other methods are well-suited for symmetrical distributions. You’ll learn about these other methods in an upcoming section.
In certain applications, it may be desirable to distinguish between “mild outliers” (using [latex]1.5 \times[/latex] IQR) and “extreme outliers” (using [latex]3 \times[/latex] IQR). We can really set the threshold for “unusual” values as far away as we’d like, depending on the application.
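One possible way to express the mild/extreme distinction in Python is shown below. The function name `classify`, its return labels, and the example quartiles are assumptions for illustration. The multipliers 1.5 and 3 come from the text, but an application could substitute others.

```python
def classify(value, q1, q3):
    """Label a value as 'extreme', 'mild', or 'not an outlier'."""
    iqr = q3 - q1
    # Distance beyond the nearer quartile; 0 if the value is between Q1 and Q3.
    distance = max(q1 - value, value - q3, 0)
    if distance > 3 * iqr:
        return "extreme"
    if distance > 1.5 * iqr:
        return "mild"
    return "not an outlier"


# Hypothetical quartiles (Q1 = 3, Q3 = 7), so IQR = 4.
for value in (10, 14, 20, -10):
    print(value, classify(value, q1=3, q3=7))
```

With these quartiles, the mild cutoffs are -3 and 13 and the extreme cutoffs are -9 and 19, so 14 is labeled mild while 20 and -10 are labeled extreme.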
We’ll use the IQR method to make the determination. Under this method, a data value is considered an outlier if it lies more than ([latex]1.5 \times[/latex] IQR) above [latex]Q3[/latex] or below [latex]Q1[/latex].
Since [latex]39,287[/latex] is greater than the median, we’ll test it to see if it exceeds [latex]Q3 +[/latex] ([latex]1.5 \times[/latex] IQR).
(If it were a very small number, we’d test to see if it were lower than [latex]Q1 -[/latex] ([latex]1.5 \times[/latex] IQR).) Recall that for this data set, [latex]Q3 = 11,289[/latex] and IQR [latex]= 9,273[/latex].
Step 1) Calculate ([latex]1.5 \times[/latex] IQR).
Step 2) Calculate [latex]Q3 +[/latex] ([latex]1.5 \times[/latex] IQR).
Step 3) Compare Japan’s GDP per capita. If it exceeds [latex]Q3 +[/latex] ([latex]1.5 \times[/latex] IQR), then it is an outlier.
What did you discover? Is Japan’s GDP per capita an outlier in the data set?
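After working through the steps by hand, you could check your arithmetic with a short Python sketch like the one below. It uses only the values quoted above ([latex]Q3 = 11,289[/latex], IQR [latex]= 9,273[/latex], and Japan’s value of [latex]39,287[/latex]). The variable names are arbitrary choices for this example.

```python
# Values quoted in the text for the GDP per capita data set.
q3 = 11_289
iqr = 9_273
japan = 39_287

# Step 1: calculate 1.5 x IQR.
step1 = 1.5 * iqr

# Step 2: calculate the upper cutoff, Q3 + (1.5 x IQR).
upper_cutoff = q3 + step1

# Step 3: compare Japan's GDP per capita with the cutoff.
is_outlier = japan > upper_cutoff

print(f"1.5 x IQR          = {step1:,.1f}")
print(f"Q3 + (1.5 x IQR)   = {upper_cutoff:,.1f}")
print(f"Japan is an outlier: {is_outlier}")
```

Run the code and compare each printed value with your own result for the matching step.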