- Name the features of the distribution of a data set using statistical language
- Describe the connection between the distribution of a data set and its mean and median
Appropriate Measures of Center
In the previous example, we saw that the mean did not accurately represent the typical salary of a Texas NBA player because outliers pulled it away from most of the data. Now, let’s look at other situations to decide whether the mean or the median better describes a typical observation.
Consider the distribution of three different sets of data:
- Income in New York City
- GPA at a local college
- Body temperature
Situation 1: Data are collected on the income of residents in New York City.
Situation 2: Data are collected on the GPAs of students enrolled at a local college.
Situation 3: Data are collected on people’s body temperatures.
These examples illustrate some general guidelines for choosing numerical summaries:
- Use the mean as a measure of center only for distributions that are reasonably symmetric with a central peak. When outliers or skew are present, the mean is not a good choice.
- Use the median as a measure of center for all other cases.
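To see why these guidelines matter, the short Python sketch below compares the mean and median for a roughly symmetric data set and a skewed one. All of the numbers are made up for illustration (they are not the data behind the situations above), and `summarize` is a hypothetical helper name.

```python
from statistics import mean, median

# Hypothetical values invented for illustration only.
# Roughly symmetric: body temperatures in degrees Fahrenheit.
body_temps = [97.9, 98.1, 98.2, 98.4, 98.6, 98.6, 98.8, 99.0, 99.1, 99.3]
# Skewed with an outlier: incomes in thousands of dollars.
incomes = [28, 32, 35, 38, 41, 45, 52, 60, 75, 950]

def summarize(values):
    """Return the (mean, median) pair for a list of numbers."""
    return mean(values), median(values)

temp_mean, temp_median = summarize(body_temps)
income_mean, income_median = summarize(incomes)

print("Body temperature: mean =", round(temp_mean, 1), "median =", round(temp_median, 1))
print("Income:           mean =", round(income_mean, 1), "median =", round(income_median, 1))

# Count how many incomes fall below the mean, to check how "typical" the mean is.
below_mean = sum(1 for x in incomes if x < income_mean)
print("Incomes below the mean:", below_mean, "of", len(incomes))
```

In the symmetric set the mean and median agree closely, so either describes a typical value. In the skewed set a single large income pulls the mean above nine of the ten observations, while the median stays near the middle of the data.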
These examples also highlight another important principle: Always plot the data.
We need to see a plot of the data to determine the shape of its distribution. Once we know the shape (symmetric or skewed, with or without outliers), we can decide which measure of center best describes the data.
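As a rough sketch of what "plot the data" can look like without any graphing software, the Python example below prints a simple text histogram. The function name `text_histogram`, the bin width, and the income values are all assumptions chosen for illustration; in practice a dotplot or histogram from a statistics tool serves the same purpose.

```python
from collections import Counter

def text_histogram(values, bin_width):
    """Return one text line per bin, with one '*' for each observation in it."""
    # Each value goes in the bin whose left edge is the largest
    # multiple of bin_width that does not exceed the value.
    counts = Counter(int(v // bin_width) for v in values)
    lines = []
    for b in range(min(counts), max(counts) + 1):
        left_edge = b * bin_width
        lines.append(f"{left_edge:>6} | " + "*" * counts[b])
    return lines

# Hypothetical incomes in thousands of dollars (invented for illustration).
incomes = [28, 32, 35, 38, 41, 45, 52, 60, 75, 950]

for line in text_histogram(incomes, 100):
    print(line)
```

Nine of the ten observations pile up in the first bin and one sits far to the right, with empty bins in between. That shape (strong right skew with an outlier) is the signal to prefer the median over the mean.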