Distribution of Quantitative Variables: Fresh Take

  • Describe the graph of a data set using its shape, center, spread, and outliers

Shape, Center, and Spread

When the observations of a quantitative variable are displayed in a graph, we call the display the distribution of a quantitative variable. We display the distribution so that we can read information about the dataset from its graph. The three main characteristics of a distribution's overall pattern are its shape, center, and spread.

The Main Idea

Shape: Is the data symmetrically distributed, or does it “bunch up” on one side or the other?

  • If the majority of the data lies to the left with a long tail of higher values to the right, we say it is right-skewed.
  • If the majority of the data lies to the right with a long tail of lower values to the left, we say it is left-skewed.
  • If the data is centered and falls off evenly on both sides, we say it is symmetric.
  • Unimodal data has one mound (cluster) of data. Bimodal has two mounds of data. Multimodal has more than two mounds of data.
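As a rough numerical companion to the shape descriptions above, the sketch below computes the sample skewness (the Fisher-Pearson moment coefficient, which this section does not itself introduce). Positive values point to a right skew, negative values to a left skew, and values near zero to rough symmetry. The function names, the made-up data, and the 0.5 cutoff for "roughly symmetric" are hypothetical choices for illustration, not a standard rule. Skewness also says nothing about how many mounds the data have, so the graph is still needed to judge modality.

```python
from statistics import fmean


def skewness(data: list[float]) -> float:
    """Sample skewness: third central moment divided by (second central moment) ** 1.5."""
    n = len(data)
    mean = fmean(data)
    m2 = sum((x - mean) ** 2 for x in data) / n  # average squared deviation
    m3 = sum((x - mean) ** 3 for x in data) / n  # average cubed deviation (keeps its sign)
    if m2 == 0:
        raise ValueError("all values are identical; skewness is undefined")
    return m3 / m2 ** 1.5


def describe_shape(data: list[float], cutoff: float = 0.5) -> str:
    """Label the shape using a hypothetical cutoff on the skewness value."""
    g = skewness(data)
    if g > cutoff:
        return "right-skewed"  # long tail of higher values
    if g < -cutoff:
        return "left-skewed"  # long tail of lower values
    return "roughly symmetric"


right_tail = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15]
left_tail = [20 - x for x in right_tail]  # mirror image of right_tail
balanced = [1, 2, 3, 4, 5]

for name, values in [("right_tail", right_tail), ("left_tail", left_tail), ("balanced", balanced)]:
    print(name, round(skewness(values), 2), describe_shape(values))
```

Mirroring the data (replacing each value x with 20 - x) flips the sign of the skewness, which matches the idea that a left-skewed distribution is a right-skewed one turned around.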

Center: Where does the center of the data appear to be? Center can be measured as the median or the mean of the dataset. When looking at the distribution, you should consider where the heaviest “weight” of the data lies.
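To see how the two measures of center can differ, here is a minimal sketch using Python's standard `statistics` module on a small made-up dataset with one unusually large value. The data are invented for illustration. The mean is pulled toward the long right tail, while the median stays with the bulk of the data.

```python
import statistics

# Hypothetical data: most values fall between 1 and 5, with one large value (15).
data = [1, 2, 2, 3, 3, 3, 4, 4, 5, 15]

mean = statistics.mean(data)  # arithmetic average: sum / count
median = statistics.median(data)  # middle value (average of the two middle values here)

print(f"mean = {mean}, median = {median}")
```

This prints `mean = 4.2, median = 3.0`. In a right-skewed distribution like this one, the mean typically lies to the right of the median; treat that as a rule of thumb rather than a guarantee.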

Spread: The spread of a data distribution describes how widely the data vary. A simple measure of spread is the range: the distance from the least to the greatest value, found by subtracting the smallest value from the largest. Spread also takes into account gaps in the data and outliers, which are rare values far to the left (lower outliers) or far to the right (upper outliers) of the bulk of the data. Outliers stretch the range beyond what the bulk of the data alone would give. Extreme outliers can also affect the mean of the dataset, pulling it in the direction of the outlier.
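The sketch below computes the range as described above and shows how a single extreme value stretches the range and drags the mean toward it. To flag outliers it uses the 1.5 × IQR rule (values more than 1.5 interquartile ranges below the first quartile or above the third quartile). That rule is a widely used convention, not something this section defines, and the data and helper names are hypothetical.

```python
import statistics


def value_range(data: list[float]) -> float:
    """Range: largest value minus smallest value."""
    return max(data) - min(data)


def iqr_outliers(data: list[float]) -> list[float]:
    """Flag values outside the fences Q1 - 1.5*IQR and Q3 + 1.5*IQR (a common convention)."""
    q1, _, q3 = statistics.quantiles(data, n=4)  # quartiles, default "exclusive" method
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


base = [10, 12, 13, 13, 14, 15, 16, 17]
with_outlier = base + [40]  # the same data plus one upper outlier

for name, values in [("base", base), ("with_outlier", with_outlier)]:
    print(
        name,
        "range:", value_range(values),
        "mean:", round(statistics.mean(values), 2),
        "outliers:", iqr_outliers(values),
    )
```

Adding the single value 40 raises the range from 7 to 30 and moves the mean from 13.75 to about 16.67, even though the bulk of the data is unchanged.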

The following video will help you visually examine a quantitative data distribution for shape, center, spread, and the presence of outliers.

A histogram looks somewhat like a bar graph. But while a bar graph displays categorical data and shows counts of observations within categories, a histogram displays quantitative data by showing frequencies of a quantitative variable. The bars of a histogram all have the same width and touch one another, with no gaps, along the horizontal axis. The width of each bar covers a range of values along the axis called a bin. A bin is a range of values that the quantitative variable can take. A bin can be defined by its end points, the smallest and largest values of the quantitative variable represented in the bin. The width of the bin, called the binwidth, is the difference between the values of its end points.
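The bin idea above can be sketched in a few lines: choose a starting value and a binwidth, assign each observation to a bin, and count. This is a minimal sketch with hypothetical names and made-up data. It assumes each bin includes its lower end point and excludes its upper one, except the last bin, which also includes its upper end point; other tools may handle end points differently. The `#` bars give a rough text version of a histogram.

```python
def bin_counts(data: list[int], start: int, binwidth: int, nbins: int) -> list[tuple[int, int, int]]:
    """Return (low, high, count) for nbins equal-width bins beginning at start."""
    counts = [0] * nbins
    stop = start + nbins * binwidth
    for x in data:
        if x < start or x > stop:
            raise ValueError(f"{x} lies outside [{start}, {stop}]")
        index = (x - start) // binwidth  # which bin x falls into
        if index == nbins:  # x equals stop: place it in the last bin
            index = nbins - 1
        counts[index] += 1
    return [(start + i * binwidth, start + (i + 1) * binwidth, c) for i, c in enumerate(counts)]


data = [2, 3, 3, 5, 6, 6, 7, 9, 11, 12]
bins = bin_counts(data, start=0, binwidth=4, nbins=3)

for low, high, count in bins:
    # binwidth = high - low, the difference between the bin's end points
    print(f"{low:>2} to {high:>2} (width {high - low}): {'#' * count}")
```

Changing `binwidth` (and `nbins` to cover the same span) changes how many bars appear, which can make the shape of the distribution look smoother or more jagged.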


Distribution of a Quantitative Variable

When we describe patterns in data, we use descriptions of shape, center, and spread. We also describe exceptions to the pattern. We call these exceptions outliers.

Flow chart with three levels. The first level, "Graph the distribution of a quantitative variable," points to two boxes on the second level: "Overall pattern" and "Deviations from the pattern." "Overall pattern" points to three options: "Shape," "Center," and "Spread." "Deviations from the pattern" points to one option: "Outliers."
Figure 1. When analyzing a graph, describe the overall pattern (shape, center, spread) and look for deviations, or outliers, that don’t follow the pattern.