Line of Best Fit: Fresh Take

Lumen Learning

Recognize when a linear regression model will fit a given data set.
Use technology to create scatterplots, find the line of best fit, and find the correlation coefficient.
Find the estimated slope and [latex]y[/latex]-intercept for a linear regression model.
Use the line of best fit to predict values.

Bivariate data is data that contains two variables.

An explanatory variable is an independent variable. It may explain or cause a change in another variable.

A response variable is a dependent variable. It changes in response to the explanatory variable.

The main idea

Least Squares Regression (LSR) analysis is a statistical tool that models the linear relationship between an independent (explanatory) variable and a dependent (response) variable.

A scatterplot is used to display the relationship. Each data point is a pair of quantitative data values, one independent and one dependent. See the image below, which plots the quarterly percent change in GDP against the quarterly percent change in the unemployment rate. Each data point tells us that, in a quarter when the percent change in unemployment was some particular amount, the percent change in GDP was a particular corresponding amount.

A scatterplot of the quarterly change in unemployment rate and GDP, with the line of best fit in black.

If we think the data on the scatterplot looks even roughly linear, as it does in the graph above, we can try to find a line of best fit using Least Squares Regression (LSR).

While LSR can be performed by hand, we’ll use technology.

Two Conditions Before Using Least Squares Regression

1. Both variables must be quantitative.

2. The data must appear at least roughly linear when graphed on a scatterplot.

Least Squares Regression analysis produces the line through the data set that best approximates the linear trend present in the data. It does so by minimizing the sum of the squared vertical distances between each data point and the line, which is where the name "least squares" comes from.
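To make the minimization concrete, the sketch below computes a line from the standard closed-form least squares formulas (slope equals the sum of products of deviations divided by the sum of squared x-deviations, and the line passes through the point of means). It then checks that nudging the line in any direction makes the sum of squared vertical distances larger. The function names and the data are assumptions made up for this illustration.

```python
def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least squares line for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sum of squared x-deviations.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x  # line passes through (mean_x, mean_y)
    return intercept, slope


def sum_squared_errors(xs, ys, intercept, slope):
    """Sum of squared vertical distances between the points and the line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

a, b = least_squares_line(xs, ys)
best = sum_squared_errors(xs, ys, a, b)
print(f"least squares line: y-hat = {a:.2f} + {b:.2f}x, sum of squares = {best:.2f}")

# Any other line gives a larger sum of squares.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    other = sum_squared_errors(xs, ys, a + da, b + db)
    print(f"intercept {a + da:.2f}, slope {b + db:.2f}: sum of squares = {other:.2f}")
```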

Vocabulary

Least Squares Regression, Linear Regression, and Linear Modeling are all terms for the same thing: Finding a line of best fit for a data set.
The line of best fit is also called the Least Squares Regression line or the regression line.
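The vertical distance between any data point and the line of best fit is called the residual, or the vertical error of the data point. It is the observed [latex]y[/latex]-value minus the [latex]y[/latex]-value the line predicts, so it is positive for points above the line and negative for points below it.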
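As a small illustration of residuals, the sketch below computes the observed value minus the predicted value for each point. The data are made up, and the intercept 2.2 and slope 0.6 are the least squares values for this particular made-up data set; the function name is an assumption for this example.

```python
def residuals(xs, ys, intercept, slope):
    """Return observed y minus predicted y-hat for each data point."""
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Made-up illustrative data and its least squares line y-hat = 2.2 + 0.6x.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]
intercept, slope = 2.2, 0.6

res = residuals(xs, ys, intercept, slope)
for x, y, e in zip(xs, ys, res):
    if e > 1e-12:
        position = "above"
    elif e < -1e-12:
        position = "below"
    else:
        position = "on"
    print(f"({x}, {y}): residual {e:+.2f}, point is {position} the line")
```

For a least squares line, the positive and negative residuals balance out, so they sum to zero up to rounding.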

The line of best fit has the equation [latex]\hat{y}=a+bx[/latex], where [latex]a[/latex] is the [latex]y[/latex]-intercept and [latex]b[/latex] is the slope. The notation [latex]\hat{y}[/latex] (read "y-hat") indicates that the output of the equation is the value of the response variable predicted by the linear model, rather than an observed data value.
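Once the intercept and slope are known, predicting is a matter of substituting a value of the explanatory variable into the equation. The sketch below assumes a hypothetical line with intercept 2.2 and slope 0.6; these numbers and the function name are made up for illustration.

```python
def predict(intercept, slope, x):
    """Return the predicted response y-hat = intercept + slope * x."""
    return intercept + slope * x


# Hypothetical regression line: y-hat = 2.2 + 0.6x.
intercept, slope = 2.2, 0.6

for x in [2, 3.5, 6]:
    y_hat = predict(intercept, slope, x)
    print(f"x = {x}: predicted y-hat = {y_hat:.2f}")
```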

The correlation coefficient, [latex]r[/latex], tells us how strong the linear relationship is. It always lies between [latex]-1[/latex] and [latex]1[/latex]. Values of [latex]r[/latex] very close to [latex]-1[/latex] or [latex]1[/latex] indicate a strong linear relationship, with most of the data points very close to the line of best fit. The closer [latex]r[/latex] is to [latex]0[/latex], the weaker the linear relationship is between the two variables.

If [latex]r[/latex] is close to [latex]-1[/latex] (negative 1), we say the linear relationship is strongly decreasing.
If [latex]r[/latex] is close to [latex]1[/latex] (positive 1), we say the linear relationship is strongly increasing.
If [latex]r[/latex] is close to [latex]0[/latex], or equal to [latex]0[/latex], we say there is little or no linear relationship. The variables may still be related in a nonlinear way.
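The three cases above can be checked numerically. The sketch below computes [latex]r[/latex] with the standard Pearson formula for three made-up data sets: one perfectly increasing, one perfectly decreasing, and one that follows a curve. The function name and the data are assumptions for this illustration.

```python
from math import sqrt


def correlation_coefficient(xs, ys):
    """Return Pearson's correlation coefficient r for paired data."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Made-up illustrative data sets.
increasing = ([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])  # points lie on a rising line
decreasing = ([1, 2, 3, 4, 5], [10, 8, 6, 4, 2])  # points lie on a falling line
curved = ([-2, -1, 0, 1, 2], [4, 1, 0, 1, 4])     # points lie on a parabola

r_increasing = correlation_coefficient(*increasing)
r_decreasing = correlation_coefficient(*decreasing)
r_curved = correlation_coefficient(*curved)

print(f"increasing line: r = {r_increasing:.2f}")
print(f"decreasing line: r = {r_decreasing:.2f}")
print(f"curved pattern:  r = {r_curved:.2f}")
```

The curved data set has a clear pattern, yet its [latex]r[/latex] is [latex]0[/latex], because [latex]r[/latex] measures only how well a straight line describes the data.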

See the image below, which labels each scatterplot shape with its [latex]r[/latex]-value.

An image showing many different scatterplot shapes that can occur with the correlation coefficient listed above it.

The videos below will introduce you to the ideas of correlation and Least Squares Regression.