Measures of Variability: Learn It 2

  • Describe the differences in variability in histograms and dotplots.
  • Calculate and describe standard deviation.

As stated previously, the range uses only two observations in the entire data set (the smallest and the largest) to measure variability. Used alone, it is not an ideal measure of spread.

In constructing a measure of spread about the center (i.e., the average or mean), we want to compute how far a “typical” value lies from the mean.

Calculating Deviation from the Mean

Let’s consider the sample data set [latex]2, 2, 4, 5, 6, 7, 9[/latex].
The mean of this data set is [latex]\bar{x}=\frac{2+2+4+5+6+7+9}{7}=\frac{35}{7}=5[/latex].
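As a quick check of the arithmetic above, here is a minimal Python sketch that computes the same mean. The variable names `data` and `mean` are illustrative choices, not part of the lesson.

```python
# Sample data set from the text.
data = [2, 2, 4, 5, 6, 7, 9]

# The mean is the sum of the observations divided by the number of observations.
total = sum(data)   # 2 + 2 + 4 + 5 + 6 + 7 + 9
count = len(data)   # 7 observations
mean = total / count

print(f"sum = {total}, n = {count}, mean = {mean:g}")
```

The result should agree with the hand calculation: a sum of 35 over 7 observations gives a mean of 5.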

Dotplot of data set with the mean marked by vertical blue line
Figure 1. A dotplot with a mean of 5.

The dotplot in Figure 1 shows this data set with the mean marked by the vertical blue line.

We can see that some data points are close to the mean, while others are farther from it.

To see how the data points deviate from the mean, we compute the difference between each value and the mean. These differences are called deviations.

deviation

In statistics, the deviation is the difference between an observed value [latex]x[/latex] of a variable and the mean [latex]\bar{x}[/latex].

Deviation = [latex]x-\bar{x}[/latex]

[latex]x[/latex] Deviation [latex](x-\bar{x})[/latex]
[latex]2[/latex] [latex]2-5=-3[/latex]
[latex]2[/latex] [latex]2-5=-3[/latex]
[latex]4[/latex] [latex]4-5=-1[/latex]
[latex]5[/latex] [latex]5-5=0[/latex]
[latex]6[/latex] [latex]6-5=1[/latex]
[latex]7[/latex] [latex]7-5=2[/latex]
[latex]9[/latex] [latex]9-5=4[/latex]
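The deviation table can be reproduced with a short Python sketch. The helper name `deviation` is a hypothetical choice made for illustration; it simply applies the definition [latex]x-\bar{x}[/latex] to each observation.

```python
# Sample data set from the text.
data = [2, 2, 4, 5, 6, 7, 9]
mean = sum(data) / len(data)


def deviation(x, mean):
    """Return the deviation of one observation x from the mean (x - mean)."""
    return x - mean


# One deviation per observation, in the same order as the data.
deviations = [deviation(x, mean) for x in data]

# Print each row in the same "x - mean = deviation" form as the table.
for x, d in zip(data, deviations):
    print(f"{x} - {mean:g} = {d:g}")
```

Each printed row should match the corresponding row of the table, with negative results for values below the mean and positive results for values above it.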

When visualized on a dotplot, these differences are viewed as distances between each point and the mean. A negative difference indicates that the data point is to the left of the mean (shown in blue on the graph below). A positive difference indicates that the data point is to the right of the mean (shown in green on the graph below).

Dotplot where negative differences are shown as data points to the left of the mean; positive differences are shown as data points to the right
Figure 2. Dotplot showing distances from the mean. Negative differences (left of the mean) are shown in blue, and positive differences (right of the mean) are shown in green.
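The link between the sign of a deviation and the position of a point on the dotplot can be sketched in Python as follows. The function name `side_of_mean` and its return strings are assumptions made for this illustration.

```python
def side_of_mean(x, mean):
    """Describe where observation x sits on the dotplot relative to the mean."""
    deviation = x - mean
    if deviation < 0:
        return "left of the mean"   # negative deviation
    if deviation > 0:
        return "right of the mean"  # positive deviation
    return "at the mean"            # deviation of zero


# Sample data set from the text.
data = [2, 2, 4, 5, 6, 7, 9]
mean = sum(data) / len(data)

for x in data:
    print(f"x = {x}: deviation {x - mean:+g}, {side_of_mean(x, mean)}")
```

The value 5 equals the mean, so its deviation is zero and it sits on the line itself rather than on either side.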

Our goal is to develop a single measurement that summarizes a typical distance from the mean.

Now, let’s practice determining the distance of a single data point from the mean, also known as the deviation from the mean: [latex](x-\bar{x})[/latex].

Hurricane aftermath from an aerial camera showing a damaged neighborhood with destroyed houses.
Figure 3. By analyzing the cost of hurricanes like this one, we can measure how far each event deviates from the average, revealing just how severe some storms really are.

Hurricanes cause extensive damage. Let’s consider the damage, in dollars, caused by the [latex]30[/latex] most expensive hurricanes to hit the U.S. mainland between 1990 and 2010.

Representations of large numbers

Take a moment to consider the units within our data set. In the table presented in the question above, we see hurricane damage in millions of dollars. Look at the last number in the table: [latex]11,227[/latex]. Presumably, that means [latex]11,227[/latex] million dollars.

[latex]\$11,227[/latex] million [latex]=\$11,227,000,000=\$11.227[/latex] billion
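The unit conversion above can be checked with a small Python sketch. The variable names are illustrative; the only data value used is the [latex]11,227[/latex] quoted from the table.

```python
# Table value quoted in the text, recorded in millions of dollars.
damage_millions = 11_227

# One million is 1,000,000, so multiply to get plain dollars.
damage_dollars = damage_millions * 1_000_000

# One billion is 1,000 million, so divide to get billions of dollars.
damage_billions = damage_millions / 1_000

print(f"${damage_millions:,} million")
print(f"${damage_dollars:,}")
print(f"${damage_billions:.3f} billion")
```

Keeping track of the unit matters when computing deviations: a deviation calculated from the table values is also in millions of dollars.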

The hurricanes contributing to this data were catastrophic, causing billions of dollars of damage. 

The sign (+ or -) of deviation from the mean

Also, consider the meaning of the sign (+ or -) of the deviation values you found in the question above. The sign matters when making inferences about the data values: a positive deviation means the value is above the mean, and a negative deviation means it is below the mean.