Assessing the Fit of a Line: Learn It 4

  • Describe the connection between the residual and the position of a data point relative to the line of best fit.
  • Create and use a residual plot to identify influential points and determine the most appropriate regression model.
  • Determine the reliability of predictions from the line of best fit using the residuals and standard error of the residuals.

Residual Standard Error

Using [latex]r[/latex] and [latex]R^2[/latex], we are able to determine whether the line of best fit is a useful model and how well the line fits the data. However, we’ve also seen how the line of best fit can be used to calculate predicted values. So, how can we make a general assessment of the accuracy of predictions from the line? To do so, we’ll look at the distribution of residuals, specifically focusing on the variability.

residual standard error

The residual standard error, [latex]s_e[/latex], is a measure of the variability in the residuals. It is also known as the residual standard deviation. It is the typical error we expect in predictions using the line of best fit. It is a way to quantify the spread of the points around the line of best fit on the scatterplot.

The formula for the residual standard error is: [latex]s_e = \sqrt{\dfrac{1}{n-2}\left(y_i-\hat{y}_i\right)^{2}}[/latex]

A large residual standard error indicates there is a lot of spread in the scatter of the points around the line of best fit and thus more variability in the residuals. If all the data points fit perfectly on the line, the line is a perfect fit for the data and the residual standard error will be zero. This scenario almost never occurs in practice, since there is rarely data with observations that fall in a perfect line.

One thing to keep in mind is that the regression standard error has the same units as the response variable. Therefore, you want to keep the response variable, units, and context of the data in mind as you use the residual standard error to evaluate how well the line fits the data.

Note: Most statistical software computes [latex]r[/latex], [latex]R^2[/latex], and [latex]s_e[/latex]. Therefore, our focus is not on calculating but on understanding and interpreting.

Select the Animal Longevity data set and let’s investigate this data set again.

[Trouble viewing? Click to open in a new tab.]