Scatterplots & Correlation Coefficients: Learn It 1

  • Create scatterplots for bivariate data and answer questions from the graph.
  • Describe the trend of bivariate data.
  • Calculate the correlation coefficient and explain what it means.

Greenhouse Gases

Carbon dioxide is a greenhouse gas. This means it absorbs and radiates heat. Warmed by sunlight, Earth’s land and ocean surfaces continuously radiate thermal infrared energy (heat). Unlike oxygen or nitrogen (which make up most of our atmosphere), greenhouse gases absorb that heat and release it gradually over time, like bricks in a fireplace after the fire goes out. Without this natural greenhouse effect, Earth’s average annual temperature would be below freezing instead of close to [latex]60[/latex]°F. But increases in greenhouse gases have tipped the Earth’s energy budget out of balance, trapping additional heat and raising the Earth’s average temperature.[1]

Scatterplots

scatterplot

Scatterplots are used to illustrate the relationship between two quantitative variables. Data consisting of two (usually related) quantitative variables measured on the same individuals or items are called bivariate data.

When investigating relationships between two quantitative variables, scatterplots are a simple way to visually represent the spread, direction, strength of relationship, and potential outliers of the data.

With larger data sets, a scatterplot displays the overall pattern more succinctly than a table does. The visualization can also hint at the general shape of the relationship (for example, increasing linear, decreasing linear, or non-linear), and it helps us identify any deviations from that pattern.

When working with a bivariate data set, there are two variables to consider:

  • The explanatory variable ([latex]x[/latex]) is the variable that is thought to explain or predict the response variable of a study.
  • The response variable ([latex]y[/latex]) measures the outcome of interest in the study. This variable is thought to depend in some way on the explanatory variable. It is often referred to as the “variable of interest” for the researcher. (And in previous math courses, this variable may have been referred to as the dependent variable.)

Sometimes the variables do not have a clear explanatory/response relationship. In that case, there is no rule to follow, and you may plot either variable on either axis.

Carbon Footprint

A carbon footprint is the total amount of greenhouse gas (GHG) emissions caused directly and indirectly by an individual, organization, event, or product. It is calculated by summing the emissions resulting from every stage of a product or service’s lifetime (material production, manufacturing, use, and end of life). A typical U.S. household has a carbon footprint of [latex]48[/latex] metric tons of carbon dioxide equivalent per year (CO2e/yr).[2]

The food you eat has a carbon footprint. Energy is involved in producing the food, transporting the food, preparing the food, eating the food, and disposing of any waste from the food. We can analyze the energy content and carbon footprint of your food using a scatterplot. Because the purpose of this study is to explore the effect of energy content on the carbon footprint of your food:

  • The explanatory variable is energy content, and
  • The response variable is the carbon footprint of your food.

Both variables are quantitative.

Here is what the raw bivariate data look like:

| Sandwich | Energy Content (kcal) | Carbon Footprint (g CO2) |
|---|---|---|
| Chicken salad | [latex]351[/latex] | [latex]963[/latex] |
| Prawn, mayo | [latex]339[/latex] | [latex]1255[/latex] |
| Egg, mayo, cress | [latex]319[/latex] | [latex]739[/latex] |
| Egg, rocket | [latex]363[/latex] | [latex]854[/latex] |

Note that for each sandwich, we have two values: energy content and carbon footprint.

To explore the relationship between the two quantitative variables, we create a graph called a scatterplot. To create a scatterplot, we use an ordered pair [latex](x,y)[/latex] to represent the data for each sandwich. The [latex]x[/latex]-coordinate is the explanatory variable, energy content. The [latex]y[/latex]-coordinate is the response variable, carbon footprint.

For example, the point [latex](351,963)[/latex] represents chicken salad.
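As a minimal sketch of this step, the Python snippet below turns each row of the sandwich data into an ordered pair [latex](x,y)[/latex]. The variable names (`rows`, `points`, `point_by_sandwich`) are illustrative choices, not part of the lesson's statistical tool; the numbers are copied from the table above.

```python
# Each row: (sandwich, energy content in kcal, carbon footprint in g CO2).
# Values are copied from the lesson's table; the names are illustrative.
rows = [
    ("Chicken salad", 351, 963),
    ("Prawn, mayo", 339, 1255),
    ("Egg, mayo, cress", 319, 739),
    ("Egg, rocket", 363, 854),
]

# x = explanatory variable (energy content), y = response variable (carbon footprint)
points = [(energy, footprint) for _, energy, footprint in rows]

# Look up the ordered pair for a given sandwich by name.
point_by_sandwich = {name: (energy, footprint) for name, energy, footprint in rows}

print(point_by_sandwich["Chicken salad"])  # (351, 963)
```

Each tuple in `points` is one dot on the scatterplot, with energy content on the horizontal axis and carbon footprint on the vertical axis.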

Let’s use the statistical tool below to create the scatterplot! Select the “Carbon Footprint” data set from the drop-down menu under “Choose Dataset.”
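If you would rather draw the scatterplot in code, the sketch below shows one possible approach in Python. It assumes the third-party matplotlib library is installed; the function name `draw_scatterplot` and the output file name `carbon_footprint.png` are hypothetical choices for illustration and are unrelated to the lesson's statistical tool.

```python
# Data copied from the sandwich table in this lesson.
sandwiches = ["Chicken salad", "Prawn, mayo", "Egg, mayo, cress", "Egg, rocket"]
energy_kcal = [351, 339, 319, 363]    # explanatory variable (x)
footprint_g = [963, 1255, 739, 854]   # response variable (y)


def draw_scatterplot(x, y, labels, path="carbon_footprint.png"):
    """Save a labeled scatterplot to `path` (a hypothetical file name)."""
    # Assumption: matplotlib is installed. It is imported here so the data
    # above can still be used on its own if the library is missing.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(x, y)
    # Label each point with its sandwich name, offset slightly from the dot.
    for name, xi, yi in zip(labels, x, y):
        ax.annotate(name, (xi, yi), textcoords="offset points", xytext=(5, 5))
    ax.set_xlabel("Energy content (kcal)")
    ax.set_ylabel("Carbon footprint (g CO2)")
    ax.set_title("Carbon footprint vs. energy content of sandwiches")
    fig.savefig(path)
    plt.close(fig)
    return path
```

Calling `draw_scatterplot(energy_kcal, footprint_g, sandwiches)` would write the image file, with the explanatory variable on the horizontal axis and the response variable on the vertical axis, matching the convention described above.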



  1. Lindsey, R. (2020, August 14). Climate change: Atmospheric carbon dioxide. NOAA Climate.gov. https://www.climate.gov/news-features/understanding-climate/climate-change-atmospheric-carbon-dioxide
  2. Center for Sustainable Systems, University of Michigan. (2020). Carbon footprint factsheet. http://css.umich.edu/factsheets/carbon-footprint-factsheet