Assessing the Fit of a Line: Learn It 2

  • Describe the connection between the residual and the position of a data point relative to the line of best fit.
  • Create and use a residual plot to identify influential points and determine the most appropriate regression model.
  • Determine the reliability of predictions from the line of best fit using the residuals and standard error of the residuals.

Appropriateness of a Linear Model with Residuals

Examining the residuals can give us useful information about whether a line of best fit is an appropriate choice for modeling the data in question.

When a linear regression is “appropriate,” the residuals will be randomly scattered around [latex]0[/latex]. That is, some residuals will be positive (observed value above the line) and some will be negative (observed value below the line), but we do not want to see a systematic pattern (e.g., a run of points all above the line followed by a run all below it).
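To make the idea concrete, here is a minimal sketch that fits a least-squares line and lists the sign of each residual. The data values and the helper names `fit_line` and `residuals` are made up for illustration; they are not part of the lesson.

```python
from statistics import mean

def fit_line(x, y):
    """Return (intercept, slope) of the least-squares line for paired data."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

def residuals(x, y):
    """Residual = observed value minus the value predicted by the line."""
    intercept, slope = fit_line(x, y)
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Made-up data that roughly follow a straight line (illustration only).
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.3, 3.8, 5.6, 8.5, 10.1, 11.7, 14.2, 15.8]

res = residuals(x, y)

# "+" means the observed value is above the line, "-" means below it.
signs = "".join("+" if r > 0 else "-" for r in res)
print(signs)
print([round(r, 2) for r in res])
```

Residuals from a least-squares line with an intercept always sum to [latex]0[/latex], so a mix of positive and negative values is guaranteed. What matters for judging the model is whether the signs and sizes appear in a systematic order.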

In particular, we might worry about the appropriateness of the model if we notice the following:

  • The trend in the scatterplot is non-linear, indicating that the relationship between the explanatory variable and the response variable is not modeled well by a line. In that case, the residuals show a pattern: the observations fall above and below the line systematically rather than at random.
  • The observed values fall farther and farther from the line of best fit over a portion of the data. That is, the size of the errors is not consistent across values of the explanatory variable; we will often see the size of the residuals increase or decrease as the explanatory variable increases. When this happens, it is hard to judge the accuracy of the model, because the standard deviation of the residuals is not constant over the values of the explanatory variable.
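The two warning signs above can be sketched with made-up data. The first data set follows a curve, so a fitted line leaves residuals that are positive at both ends and negative in the middle. The second has scatter that grows with the explanatory variable, so the residuals are larger on the right than on the left. The helper `line_residuals` and every data value here are assumptions chosen for illustration, and comparing the two halves of the data is only a rough check, not a formal test.

```python
from statistics import mean

def line_residuals(x, y):
    """Fit the least-squares line and return observed minus predicted values."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

# Concern 1: a curved trend (made-up data, y = x^2) fitted with a line.
x_curve = list(range(1, 10))
y_curve = [xi ** 2 for xi in x_curve]
curve_res = line_residuals(x_curve, y_curve)
curve_signs = "".join("+" if r > 0 else "-" for r in curve_res)
print(curve_signs)  # prints ++-----++ (above, then below, then above the line)

# Concern 2: made-up data whose scatter grows as x increases.
x_fan = list(range(1, 11))
y_fan = [3 * xi + 0.3 * xi * (-1) ** xi for xi in x_fan]
fan_res = line_residuals(x_fan, y_fan)

# Compare the typical residual size in the lower and upper halves of x.
half = len(fan_res) // 2
lower_spread = mean([abs(r) for r in fan_res[:half]])
upper_spread = mean([abs(r) for r in fan_res[half:]])
print(round(lower_spread, 2), round(upper_spread, 2))
```

In practice, both problems are usually easier to spot by plotting the residuals against the explanatory variable than by reading numbers.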