Measures of Variability: Learn It 6

  • Describe the differences in variability in histograms and dotplots.
  • Calculate and describe standard deviation.

Deciding Which Measurements to Use

We now have a choice between two pairs of measurements for center and spread: We can use the median with the interquartile range, or we can use the mean with the standard deviation. How do we decide which pair to use?

Our next examples show that the shape of the distribution and the presence of outliers help us answer this question.

This boxplot is a summary of homework scores earned by a student. Notice that the distribution of scores has an outlier. This student has mostly high homework scores with one score of [latex]0[/latex].

Figure 1. Boxplot showing a student’s homework scores, with most scores clustered high and one outlier at 0, indicating a much lower value than the rest.

Here are some observations about the homework data:

  • Five-number summary: minimum = [latex]0[/latex], Q1 = [latex]82[/latex], median = [latex]84.5[/latex], Q3 = [latex]89[/latex], maximum = [latex]100[/latex]
  • Median = [latex]84.5[/latex]
  • Mean = [latex]81.8[/latex]
  • IQR = [latex]7[/latex]
  • Range = [latex]100[/latex]
  • Standard deviation = [latex]17.6[/latex]
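
Notice that the mean ([latex]81.8[/latex]) falls below Q1 ([latex]82[/latex]), so the mean is lower than at least three-quarters of the scores. The sketch below shows how a single score of 0 can produce this kind of effect. The scores in it are hypothetical, made up for illustration, because the student's actual scores are not listed here. The helper name summarize and the choice of the "inclusive" quartile method are also assumptions, and other quartile conventions give slightly different values.

```python
import statistics

# Hypothetical homework scores, NOT the student's actual data.
scores = [80, 82, 83, 84, 85, 86, 88, 90, 95, 100]
scores_with_zero = [0] + scores  # same scores plus one outlier at 0


def summarize(data):
    """Return the mean, median, sample standard deviation, and IQR of data."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),
        "iqr": q3 - q1,
    }


without_zero = summarize(scores)
with_zero = summarize(scores_with_zero)

for name in ("mean", "median", "stdev", "iqr"):
    print(f"{name:>6}: without 0 = {without_zero[name]:5.1f}   with 0 = {with_zero[name]:5.1f}")
```

In this made-up data set, adding the score of 0 pulls the mean down by several points and makes the standard deviation several times larger, while the median and the IQR each change by less than one point.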

These examples illustrate some general guidelines for choosing numerical summaries:

  • Like the mean, the standard deviation is strongly affected by outliers and skew in the data. Therefore, use the mean and the standard deviation as measures of center and spread only for distributions that are reasonably symmetric with a central peak. When outliers are present, the mean and standard deviation are not a good choice.
  • Use the five-number summary (which gives the median directly, and the IQR and range by subtraction) for all other cases.
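
The guidelines above can be sketched as a rough first check in code. This is only an illustration under stated assumptions: it flags outliers with the common 1.5 × IQR fence rule, which this section does not itself state, and it does not test for symmetry or a central peak, which still has to be judged from a graph. The function names has_outliers and suggest_summary and both data sets are hypothetical.

```python
import statistics


def has_outliers(data, k=1.5):
    """Return True if any value lies more than k * IQR beyond Q1 or Q3.

    The 1.5 * IQR fence is an assumed convention, not taken from this section.
    """
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return any(x < lower_fence or x > upper_fence for x in data)


def suggest_summary(data):
    """Suggest measures of center and spread based only on an outlier check."""
    if has_outliers(data):
        return "median and IQR"
    # No outliers flagged: the shape still has to be checked on a graph.
    return "mean and standard deviation (if the plot looks symmetric)"


# Hypothetical data sets, made up for illustration.
symmetric_scores = [72, 75, 78, 80, 80, 82, 85, 88]
scores_with_zero = [0, 80, 82, 83, 84, 85, 86, 88, 90, 92]

print(suggest_summary(symmetric_scores))
print(suggest_summary(scores_with_zero))
```

A check like this can only rule the mean and standard deviation out. It cannot confirm that they are appropriate, because a skewed distribution with no flagged outliers would still pass it.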

Both of these examples also highlight another important principle: Always plot the data.

We need to use a graph to determine the shape of the distribution. By looking at the shape, we can determine which measures of center and spread best describe the data.
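
Even a rough text plot is enough to reveal shape and outliers. The sketch below builds a simple text histogram with bins of width 10. The function name text_histogram, the bin width, and the scores are all hypothetical choices for illustration, and the code assumes whole-number data.

```python
from collections import Counter


def text_histogram(data, bin_width=10):
    """Return a text histogram with one row per bin and one '*' per observation."""
    # Map each value to the lower edge of its bin, for example 84 -> 80.
    counts = Counter((x // bin_width) * bin_width for x in data)
    rows = []
    for start in range(min(counts), max(counts) + bin_width, bin_width):
        label = f"{start:>3}-{start + bin_width - 1:<3}"
        rows.append(f"{label} | {'*' * counts[start]}")
    return "\n".join(rows)


# Hypothetical homework scores with one outlier at 0.
scores = [0, 80, 82, 83, 84, 85, 86, 88, 90, 92]
plot = text_histogram(scores)
print(plot)
```

In the printed plot, the single mark in the lowest bin sits far from the cluster of marks in the 80s and 90s, which is the kind of pattern that signals an outlier before any summary is computed.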

Evaluate Distributions of Quantitative Variables