Compare data sets by describing their shapes, centers, spreads, and outliers
Comparing Histograms
Earlier, we learned how to describe the distribution of one variable at a time (shape, center, spread, and the presence or absence of outliers). But what if we would like to compare the distributions of multiple data sets or groups?
To compare distributions across groups, we will use technology to create and interpret histograms and dotplots of the quantitative variable. Comparisons will address the center, shape, and spread of each distribution and the presence or absence of outliers. The criteria for comparing distributions are the same as those for describing a single distribution.
Shape, center, spread, and the presence of outliers are used to describe the distribution of a quantitative variable. Can you define those terms in your own words?
Shape: The overall pattern (left-skewed, right-skewed, symmetric) and the number of peaks (unimodal, bimodal, multimodal, uniform).
Center: A measure that describes where the middle of the distribution is. The center is a number that describes a typical value. For example, one way to think about the center is as the point in the distribution where about half of the observations are below it and half are above it.
Spread: A measure of how far apart the data are. In the previous lesson and the upcoming one, the range is used to measure spread. The range is the difference between the maximum value and the minimum value.
Outliers: Unusual observations that are outside the general pattern of the distribution.
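To make the definitions of center and spread concrete, here is a minimal Python sketch that computes a center (the median, matching the "half below, half above" description) and the range for two groups. The prices are invented for illustration and are not taken from the Airbnb data set, and the names `prices_by_room_type` and `summarize` are hypothetical.

```python
from statistics import median

# Hypothetical nightly prices ($), invented for illustration only.
# These are NOT values from the lesson's Airbnb data set.
prices_by_room_type = {
    "Entire home": [120, 150, 175, 200, 210, 250, 600],
    "Private room": [45, 60, 70, 75, 80, 95, 110],
}


def summarize(values):
    """Return the center (median) and the spread (range) of a list of numbers."""
    return {
        "center_median": median(values),            # about half below, half above
        "spread_range": max(values) - min(values),  # maximum minus minimum
    }


# Compute the same two summaries for every group so they can be compared.
summaries = {
    group: summarize(values) for group, values in prices_by_room_type.items()
}

for group, summary in summaries.items():
    print(group, summary)
```

In this made-up example, the single $600 listing stretches the range of the first group far more than it moves the median, which is one reason to look for outliers before comparing spreads.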
Create the histograms of Airbnb rental prices ($) in New York City to answer more questions.
STEP 1: Select the “Several Groups” tab at the top of the page.
STEP 2: Choose the data set “Airbnb Price by Type of Room”.
STEP 3: Under “Choose Type of Plot”, select “Histogram”.
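The steps above use the lesson's app. For readers who want to see what a grouped histogram does behind the scenes, the following is a rough Python sketch that sorts each group's values into bins of equal width and prints a text histogram per group. It is not how the app works internally. The prices are invented for illustration, and `histogram_counts`, `print_histograms`, and the bin width of 100 are all assumptions.

```python
# Hypothetical nightly prices ($), invented for illustration only.
# These are NOT values from the lesson's Airbnb data set.
prices_by_room_type = {
    "Entire home": [120, 150, 175, 200, 210, 250, 600],
    "Private room": [45, 60, 70, 75, 80, 95, 110],
}


def histogram_counts(values, bin_width):
    """Count the values in each bin [start, start + bin_width).

    Empty bins between the smallest and largest value are kept,
    so gaps (and possible outliers) stay visible.
    """
    first = (min(values) // bin_width) * bin_width
    last = (max(values) // bin_width) * bin_width
    counts = {start: 0 for start in range(first, last + bin_width, bin_width)}
    for value in values:
        counts[(value // bin_width) * bin_width] += 1
    return counts


def print_histograms(groups, bin_width=100):
    """Print one text histogram per group, using the same bin width for all."""
    for name, values in groups.items():
        print(name)
        for start, count in histogram_counts(values, bin_width).items():
            print(f"  {start:>4} to {start + bin_width - 1:<4} {'*' * count}")


print_histograms(prices_by_room_type)
```

Using the same bin width for every group keeps the histograms comparable, and keeping the empty bins makes a gap before an unusual value easy to spot.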