Numerical Summaries of Data: Fresh Take

  • Find the average, middle value, and most common value in a set of data
  • Calculate how spread out the data is using the range and standard deviation
  • Identify the parts of a five-number summary for a set of data and create a box plot

Mean, Median, and Mode

The Main Idea

Mean, median, and mode are three types of statistical measures used to analyze a set of data.

The mean, often referred to as the “average,” is calculated by adding all the numbers in a data set and then dividing by the count of those numbers.

[latex]\text{mean}={\Large\frac{\text{sum of values in data set}}{n}}[/latex]

The median is the middle number when the data set is arranged in ascending or descending order; if the data set has an even number of observations, the median is the average of the two middle numbers.

The mode, on the other hand, is the number that occurs most frequently in a data set.
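
The three definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than a standard implementation: the function names and the sample data are made up for this example, and `modes` returns a list because a data set can have more than one most frequent value.

```python
from collections import Counter


def mean(values):
    """Sum of the values divided by how many values there are."""
    return sum(values) / len(values)


def median(values):
    """Middle value of the sorted data; mean of the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def modes(values):
    """Every value that occurs with the highest frequency."""
    counts = Counter(values)
    highest = max(counts.values())
    return [value for value, count in counts.items() if count == highest]


# Hypothetical data, chosen only for illustration.
data = [4, 7, 7, 10, 12]
print(mean(data))    # 8.0
print(median(data))  # 7
print(modes(data))   # [7]
```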

Find the mean of the numbers [latex]8,12,15,9,\text{ and }6[/latex].

For the past four months, Daisy’s cell phone bills were [latex]\text{\$42.75},\text{\$50.12},\text{\$41.54},\text{\$48.15}[/latex]. Find the mean cost of Daisy’s cell phone bills.

Find the median of [latex]12,13,19,9,11,15,\text{and }18[/latex].

Kristen received the following scores on her weekly math quizzes:

[latex]83,79,85,86,92,100,76,90,88,\text{and }64[/latex].

Find her median score.

The ages of the students in a statistics class are listed here:

[latex]19[/latex] , [latex]20[/latex] , [latex]23[/latex] , [latex]23[/latex] , [latex]38[/latex] , [latex]21[/latex] , [latex]19[/latex] , [latex]21[/latex] , [latex]19[/latex] , [latex]21[/latex] , [latex]20[/latex] , [latex]43[/latex] , [latex]20[/latex] , [latex]23[/latex] , [latex]17[/latex] , [latex]21[/latex] , [latex]21[/latex] , [latex]20[/latex] , [latex]29[/latex] , [latex]18[/latex] , [latex]28[/latex].

What is the mode?

Students listed the number of members in their household as follows:

[latex]6[/latex] , [latex]2[/latex] , [latex]5[/latex] , [latex]6[/latex] , [latex]3[/latex] , [latex]7[/latex] , [latex]5[/latex] , [latex]6[/latex] , [latex]5[/latex] , [latex]3[/latex] , [latex]4[/latex] , [latex]4[/latex] , [latex]5[/latex] , [latex]7[/latex] , [latex]6[/latex] , [latex]4[/latex] , [latex]5[/latex] , [latex]2[/latex] , [latex]1[/latex] , [latex]5[/latex].

What is the mode?
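
One way to check answers to the practice questions above is Python's built-in `statistics` module. The sketch below copies the data from the questions; the variable names are made up for illustration.

```python
import statistics

# Data copied from the practice questions above.
numbers = [8, 12, 15, 9, 6]
phone_bills = [42.75, 50.12, 41.54, 48.15]
unsorted_values = [12, 13, 19, 9, 11, 15, 18]
quiz_scores = [83, 79, 85, 86, 92, 100, 76, 90, 88, 64]
ages = [19, 20, 23, 23, 38, 21, 19, 21, 19, 21, 20,
        43, 20, 23, 17, 21, 21, 20, 29, 18, 28]
household_sizes = [6, 2, 5, 6, 3, 7, 5, 6, 5, 3,
                   4, 4, 5, 7, 6, 4, 5, 2, 1, 5]

print(statistics.mean(numbers))                 # 10
print(round(statistics.mean(phone_bills), 2))   # 45.64
print(statistics.median(unsorted_values))       # 13
print(statistics.median(quiz_scores))           # 85.5
print(statistics.mode(ages))                    # 21
print(statistics.mode(household_sizes))         # 5
```

Note that `statistics.median` sorts the data internally, so the values do not need to be ordered first.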

Range, Standard Deviation, and Variance

The Main Idea

Range, standard deviation, and variance are three key measures of dispersion in a dataset.

The range of a dataset is the difference between the highest and lowest values, giving a simple measure of total spread.

[latex]\text{Range } = \text{ maximum value } - \text{ minimum value } = \text{ largest value } - \text{ smallest value}[/latex]

Standard deviation, a more complex measure, describes how far the values in a data set typically lie from the mean. A low standard deviation indicates that the values are close to the mean, while a high standard deviation indicates that they are more spread out.

A few important characteristics:

  • Standard deviation is never negative. It is zero only if all the data values are equal, and it gets larger as the data spread out.
  • Standard deviation has the same units as the original data.
  • Standard deviation, like the mean, can be highly influenced by outliers.

The following formulas are used to calculate the standard deviation of a population and a sample:

Standard deviation of a population: [latex]\sigma = \sqrt{\dfrac{\sum \left(x-\mu\right)^2}{n}}[/latex], where [latex]\mu[/latex] represents the population mean.

Standard deviation of a sample: [latex]s=\sqrt{\dfrac{\sum \left(x-\bar{x}\right)^2}{n-1}}[/latex], where [latex]\bar{x}[/latex] represents the sample mean.
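
The range and both standard deviation formulas above translate almost directly into Python. The following is a sketch under stated assumptions: the function names and the data set are hypothetical, chosen so that the arithmetic is easy to follow by hand (the mean is 5 and the squared deviations sum to 32).

```python
import math


def data_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)


def population_std(values):
    """Square root of the sum of squared deviations divided by n."""
    n = len(values)
    mu = sum(values) / n
    return math.sqrt(sum((x - mu) ** 2 for x in values) / n)


def sample_std(values):
    """Same as the population formula, but dividing by n - 1."""
    n = len(values)
    x_bar = sum(values) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in values) / (n - 1))


# Hypothetical data, chosen only for illustration.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(data_range(data))            # 7
print(population_std(data))        # 2.0
print(round(sample_std(data), 3))  # 2.138

# When every value is equal, there is no spread at all.
print(population_std([3, 3, 3]))   # 0.0
```

Dividing by [latex]n-1[/latex] instead of [latex]n[/latex] makes the sample standard deviation slightly larger than the population value computed from the same numbers.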

Variance is the average of the squared differences from the mean (for a sample, the sum is divided by [latex]n-1[/latex] rather than [latex]n[/latex]). It is expressed in the squared units of the original data, and its square root is the standard deviation.

Variance of a population:  [latex]\sigma^{2}=\dfrac{\sum\left(x-\mu\right)^{2}}{n}[/latex]

Variance of a sample:  [latex]s^{2}=\dfrac{\sum\left(x-\bar{x}\right)^{2}}{n-1}[/latex]
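
As a quick check of how variance and standard deviation relate, the sketch below uses Python's built-in `statistics` module, where `pvariance` and `pstdev` use the population formulas and `variance` and `stdev` use the sample formulas. The data set is hypothetical.

```python
import math
import statistics

# Hypothetical data, chosen only for illustration (mean is 5).
data = [2, 4, 4, 4, 5, 5, 7, 9]

population_variance = statistics.pvariance(data)  # divides by n
sample_variance = statistics.variance(data)       # divides by n - 1

print(f"{population_variance:.3f}")  # 4.000
print(f"{sample_variance:.3f}")      # 4.571

# The standard deviation is the square root of the matching variance.
print(math.isclose(statistics.pstdev(data), math.sqrt(population_variance)))  # True
print(math.isclose(statistics.stdev(data), math.sqrt(sample_variance)))       # True
```

If the data were heights in inches, these variances would be in square inches, while the standard deviations would be back in inches.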

Five-Number Summary

The Main Idea

The five-number summary is a set of descriptive statistics that outlines how a dataset is distributed. It consists of five values: the minimum, the first quartile (Q1, or 25th percentile), the median (Q2, or 50th percentile), the third quartile (Q3, or 75th percentile), and the maximum. The minimum and maximum are the smallest and largest values in the dataset. The median is the middle value of the entire dataset. The first quartile is approximately the median of the lower half of the data, and the third quartile is approximately the median of the upper half; the procedure below gives the exact rule used here.

To find the first quartile, [latex]Q1[/latex]:

  1. Begin by ordering the data from smallest to largest.
  2. Compute the locator: [latex]L = 0.25n[/latex], where [latex]n[/latex] is the number of data values.
  3. If [latex]L[/latex] is not a whole number:
    • Round up to the next whole number, [latex]L+[/latex].
    • Use the data value in the [latex]L+[/latex]th position.
  4. If [latex]L[/latex] is a whole number:
    • Find the mean of the data values in the [latex]L[/latex]th and [latex](L+1)[/latex]th positions.

To find the third quartile, [latex]Q3[/latex]:

Use the same procedure as for [latex]Q1[/latex], but with the locator [latex]L = 0.75n[/latex].
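
The locator procedure above can be sketched in Python as follows. The function names and both data sets are hypothetical; the first data set has 9 values, so the locator is not a whole number, and the second has 8 values, so it is.

```python
import math


def quartile(values, fraction):
    """Quartile by the locator rule: fraction is 0.25 for Q1, 0.75 for Q3."""
    ordered = sorted(values)
    locator = fraction * len(ordered)
    if locator != int(locator):
        # Not a whole number: round up and take the value in that position.
        position = math.ceil(locator)
        return ordered[position - 1]  # positions count from 1, indexes from 0
    # Whole number: mean of the values in positions L and L + 1.
    position = int(locator)
    return (ordered[position - 1] + ordered[position]) / 2


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        median = ordered[mid]
    else:
        median = (ordered[mid - 1] + ordered[mid]) / 2
    return (ordered[0], quartile(ordered, 0.25), median,
            quartile(ordered, 0.75), ordered[-1])


# Hypothetical data: n = 9, so L = 2.25 for Q1 and L = 6.75 for Q3.
print(five_number_summary([59, 60, 62, 64, 66, 67, 69, 70, 72]))
# (59, 62, 66, 69, 72)

# Hypothetical data: n = 8, so L = 2 for Q1 and L = 6 for Q3.
print(five_number_summary([1, 2, 3, 4, 5, 6, 7, 8]))
# (1, 2.5, 4.5, 6.5, 8)
```

Software packages use several different quartile conventions, so a library function may return slightly different quartiles than this rule for the same data.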

Boxplots

The Main Idea

Box plots, also known as box-and-whisker plots, are graphical representations used to depict the spread and skewness of a data set.

They are constructed using the five-number summary: the minimum, first quartile (Q1), median, third quartile (Q3), and maximum.

The interquartile range (IQR) is the difference between the third and first quartiles: [latex]Q3 - Q1[/latex].

The ‘box’ of the plot represents the interquartile range (IQR), stretching from Q1 to Q3, and the line inside the box marks the median. ‘Whiskers’ extend from the box to the minimum and maximum values, showing the full spread of the data. When outliers are plotted as individual points, the whiskers instead stop at the most extreme values that are not outliers.

The box plot below is based on height data for [latex]9[/latex] females, with five-number summary:

[latex]59[/latex], [latex]62[/latex], [latex]66[/latex], [latex]69[/latex], [latex]72[/latex].

 

Number line titled Heights (inches), in increments of 1 from 55-75. Above this, a vertical line indicates 59. A horizontal line connects this to the next vertical line, 62. This line forms the left side of a rectangle; a line at 66 is its right side. The line at 66 also serves as the left side of another rectangle, with a line at 69 as its right side. This line at 69 connects with a horizontal line to a final vertical line at 72.

 


The box plot below is based on the household income data with five-number summary:

[latex]15[/latex], [latex]27.5[/latex], [latex]35[/latex], [latex]40[/latex], [latex]50[/latex].

Number line titled Thousands of Dollars, in increments of 5 from 0-55. Above this, a vertical line indicates 15. A horizontal line connects this to the next vertical line, 27.5. This line forms the left side of a rectangle; a line at 35 is its right side. The line at 35 also serves as the left side of another rectangle, with a line at 40 as its right side. This line at 40 connects with a horizontal line to a final vertical line at 50.
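
To connect the two five-number summaries above to the parts of a box plot, the sketch below computes the width of the box (the IQR) and the length of each whisker. The function name and dictionary keys are hypothetical, and the sketch assumes the whiskers run all the way to the minimum and maximum, as in the plots described above.

```python
def box_plot_parts(summary):
    """Measure the box and whiskers from (minimum, Q1, median, Q3, maximum)."""
    minimum, q1, median, q3, maximum = summary
    return {
        "iqr": q3 - q1,                 # width of the box
        "left_whisker": q1 - minimum,   # from the minimum to Q1
        "right_whisker": maximum - q3,  # from Q3 to the maximum
        "range": maximum - minimum,     # full spread of the data
    }


heights = (59, 62, 66, 69, 72)     # inches
incomes = (15, 27.5, 35, 40, 50)   # thousands of dollars

print(box_plot_parts(heights))
# {'iqr': 7, 'left_whisker': 3, 'right_whisker': 3, 'range': 13}

print(box_plot_parts(incomes))
# {'iqr': 12.5, 'left_whisker': 12.5, 'right_whisker': 10, 'range': 35}
```

For the income data, the left whisker is longer than the right one, which shows that the lowest quarter of incomes is more spread out than the highest quarter.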