Measures of Variability: Learn It 3

  • Describe the differences in variability in histograms and dotplots.
  • Calculate and describe standard deviation.

Standard Deviation

In statistics, we are particularly interested in understanding how data are distributed and where each observation is in reference to the mean. Previously, we have learned to use the deviations of observed values from the mean. What if we want a single value that represents all of the deviations?

This measurement of variability is called standard deviation, which tells us how spread out observations are from the mean. The notation we use to denote standard deviation differs depending on whether we are discussing a sample or a population. We use the Greek letter [latex]\sigma[/latex] (sigma) to denote the standard deviation of a population of observations. We use the Latin letter [latex]s[/latex] to denote the standard deviation of a sample of observations.

standard deviation

Standard deviation is the “typical” distance of the data points from the mean of the data set.

The following formulas are used to calculate the standard deviation of a population and a sample:

Standard deviation of a population: [latex]\sigma = \sqrt{\dfrac{\sum \left(x-\mu\right)^2}{n}}[/latex], where [latex]\mu[/latex] represents the population mean and [latex]n[/latex] is the number of observations in the population.

Standard deviation of a sample: [latex]s=\sqrt{\dfrac{\sum \left(x-\bar{x}\right)^2}{n-1}}[/latex], where [latex]\bar{x}[/latex] represents the sample mean and [latex]n[/latex] is the number of observations in the sample.
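As a minimal sketch, the two formulas can be followed step by step in Python. The data values below are hypothetical and were chosen only to keep the arithmetic simple; the variable names are illustrative as well.

```python
import math

# Hypothetical data, chosen only so the arithmetic is easy to follow.
data = [2, 4, 4, 4, 5, 5, 7, 9]

n = len(data)
mean = sum(data) / n  # plays the role of mu (population) or x-bar (sample)

# Sum of squared deviations from the mean: the numerator in both formulas.
sum_sq_dev = sum((x - mean) ** 2 for x in data)

# The population formula divides by n; the sample formula divides by n - 1.
population_sd = math.sqrt(sum_sq_dev / n)
sample_sd = math.sqrt(sum_sq_dev / (n - 1))

print(f"mean = {mean}")
print(f"population SD = {population_sd:.3f}")
print(f"sample SD = {sample_sd:.3f}")
```

Because the sample formula divides by a smaller number, the sample standard deviation is slightly larger than the population standard deviation computed from the same values.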

Standard Deviation

  • The standard deviation is a measure of spread.
  • The value of standard deviation is always positive or zero.
  • The standard deviation is approximately the average distance of the data from the mean.
  • Mean ± SD gives a range of “typical” values.
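The properties listed above can be checked on small examples. The sketch below uses Python's built-in `statistics` module and two hypothetical data sets: one with no variation at all, and one with some spread.

```python
import statistics

# Hypothetical data sets used only for illustration.
constant = [5, 5, 5, 5]            # every value equals the mean
varied = [2, 4, 4, 4, 5, 5, 7, 9]  # values spread around the mean

# With no variation, the standard deviation is zero.
sd_constant = statistics.stdev(constant)

# With variation, the standard deviation is positive.
mean_varied = statistics.mean(varied)
sd_varied = statistics.stdev(varied)

# Mean plus or minus one SD gives a range of "typical" values.
low = mean_varied - sd_varied
high = mean_varied + sd_varied
typical = [x for x in varied if low <= x <= high]

print(f"SD of constant data: {sd_constant}")
print(f"SD of varied data: {sd_varied:.3f}")
print(f"typical range: {low:.3f} to {high:.3f}")
print(f"values inside the range: {typical}")
```

In this hypothetical data set, most of the values fall inside the mean ± SD range, and only the smallest and largest values fall outside it.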

Note: The [latex]n-1[/latex] value in the formula for the sample standard deviation is called the degrees of freedom ([latex]df[/latex]). Degrees of freedom refer to the maximum number of independent values, that is, the maximum number of data values that are free to vary in the sample.

Within a data set, some data values can be chosen at random. However, if the data set has a set requirement, such as adding up to a specific sum or having a specific mean, one of the values within the data set is restricted so that the requirement is met.

Thus, [latex]df = n-1[/latex].
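One way to see this restriction is that deviations from the mean always add up to zero, so once [latex]n-1[/latex] of them are known, the last one is already determined. The sketch below illustrates the idea with a hypothetical data set of four values.

```python
# Hypothetical data set used only for illustration.
data = [3, 7, 8, 10]

n = len(data)
mean = sum(data) / n

# Deviations of each value from the mean.
deviations = [x - mean for x in data]

# The deviations always sum to zero (up to rounding).
total = sum(deviations)

# So the last deviation can be recovered from the first n - 1 of them.
last_from_others = -sum(deviations[:-1])

print(f"deviations: {deviations}")
print(f"sum of deviations: {total}")
print(f"last deviation recovered from the others: {last_from_others}")
```

Only three of the four deviations are free to vary here, which matches [latex]df = 4 - 1 = 3[/latex].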

Recall the example about the amount of damage (in millions of dollars) done by the [latex]30[/latex] most expensive hurricanes to hit the U.S. mainland between 1990 and 2010.

The mean cost for the [latex]30[/latex] hurricanes is [latex]13,620[/latex] million dollars.

The standard deviation would be:

[latex]s=\sqrt{\dfrac{(105,840-13,620)^2 + (45,561-13,620)^2 + \cdots + (11,227-13,620)^2}{30-1}}[/latex]

[latex]s= \sqrt{\dfrac{(92,220)^2 + (31,941)^2 + \cdots + (-2,393)^2}{29}}[/latex]

[latex]s= 19,491[/latex] million dollars.

Interpretation of the standard deviation: The standard deviation of [latex]19,491[/latex] million dollars tells us that the cost of a typical hurricane in this group differs from the mean cost of [latex]13,620[/latex] million dollars by roughly this amount.

This is a substantial standard deviation, larger than the mean itself, indicating that there is wide variation in the costs of the hurricanes. Some hurricanes caused much more damage than the average, and some caused less: one storm cost [latex]105,840[/latex] million dollars, far above the mean, while another cost [latex]11,227[/latex] million dollars, below it.

When it comes to calculating the standard deviation, especially for large data sets or when dealing with significant figures, leveraging technology is key. Tools such as statistical software or a calculator can perform these calculations quickly and reduce the possibility of human error. Once we obtain the standard deviation using these technological aids, our focus should shift to interpreting what this value tells us about the variability of our data.
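As one illustration of using technology, Python's built-in `statistics` module provides `stdev` for a sample and `pstdev` for a population. The data values below are hypothetical; any software or calculator with equivalent functions would serve the same purpose.

```python
import statistics

# Hypothetical measurements used only for illustration.
data = [12, 15, 9, 20, 14]

# stdev divides by n - 1 (sample standard deviation, s).
s = statistics.stdev(data)

# pstdev divides by n (population standard deviation, sigma).
sigma = statistics.pstdev(data)

print(f"sample SD (s) = {s:.4f}")
print(f"population SD (sigma) = {sigma:.4f}")
```

When using any tool, it is worth checking which formula it applies by default. For example, NumPy's `numpy.std` divides by [latex]n[/latex] unless `ddof=1` is passed, so the same data can produce two slightly different answers depending on the setting.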