- Describe the connection between the residual and the position of a data point relative to the line of best fit.
- Create and use a residual plot to identify influential points and determine the most appropriate regression model.
- Determine the reliability of predictions from the line of best fit using the residuals and standard error of the residuals.
We have used scatterplots of data and constructed lines of best fit to describe the relationship in bivariate data. We have also learned about the correlation coefficient [latex]r[/latex] and the coefficient of determination [latex]R^2[/latex]. Let’s review some of these ideas before looking into residuals.
Residuals
The following are different ways of expressing the same idea:
- The residual is the difference between the observed value and the predicted value.
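- The residual is the signed vertical distance between the observed value and the predicted value on the line of best fit.
In every case, you calculate the residual by subtracting the predicted value from the observed value: [latex]\text{residual} = y - \hat{y}[/latex], where [latex]y[/latex] is the observed value and [latex]\hat{y}[/latex] is the predicted value.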
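As a small illustration of "observed minus predicted", the Python sketch below computes one residual. The line [latex]\hat{y} = 2 + 0.5x[/latex] and the data point are made up for this example and are not taken from any data set in this section.

```python
def residual(observed: float, predicted: float) -> float:
    """Residual = observed value minus predicted value (y - y_hat)."""
    return observed - predicted

def predict(x: float) -> float:
    """Hypothetical line of best fit y_hat = 2 + 0.5x (illustration only)."""
    return 2.0 + 0.5 * x

x, y = 4.0, 5.0            # a made-up observed data point
y_hat = predict(x)         # predicted value on the line: 2 + 0.5 * 4 = 4.0
print(residual(y, y_hat))  # 5.0 - 4.0 = 1.0
```

Note the order of subtraction: reversing it would flip the sign of every residual.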
Residual Plots
The graph below shows a scatterplot and the regression line for a set of ten points. The blue points represent our original data set (our observed values). The red points, lying directly on the regression line, are the predicted values.

The vertical arrows, drawn from each predicted value to the corresponding observed value, represent the residuals. Upward arrows correspond to positive residuals, and downward arrows correspond to negative residuals.
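The sketch below shows one way the predicted values (the red points) and residuals (the arrows) could be computed for a set of ten points. It assumes the line of best fit is the least-squares regression line, and the ten (x, y) pairs are invented for illustration; they are not the data behind the graph.

```python
# Invented data set of ten (x, y) points, for illustration only.
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.3, 13.8, 16.2, 18.1, 19.9]

def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope

a, b = least_squares_line(xs, ys)
predicted = [a + b * x for x in xs]                  # points lying on the line
residuals = [y - p for y, p in zip(ys, predicted)]   # observed minus predicted

for x, y, p, e in zip(xs, ys, predicted, residuals):
    print(f"x={x:2d}  observed={y:5.2f}  predicted={p:6.3f}  residual={e:+.3f}")
```

Each residual here is the signed length of one vertical arrow: positive when the observed point is above its predicted point, negative when it is below.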
A residual can be positive, negative, or zero. A data point has a positive residual when it is located above the line of best fit, a negative residual when it is located below the line of best fit, and a residual of zero when it lies exactly on the line.
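The link between the sign of a residual and a point's position relative to the line can be expressed as a short rule. In this Python sketch, the line [latex]\hat{y} = 1 + 2x[/latex] and the three observations are hypothetical, and the function name `position_relative_to_line` is introduced only for illustration.

```python
def position_relative_to_line(observed: float, predicted: float) -> str:
    """Describe where a data point sits relative to the line of best fit,
    based on the sign of its residual (observed - predicted)."""
    e = observed - predicted
    if e > 0:
        return "above the line (positive residual)"
    if e < 0:
        return "below the line (negative residual)"
    return "on the line (zero residual)"

# Hypothetical line y_hat = 1 + 2x and three made-up observations.
for x, y in [(1, 4.0), (2, 3.5), (3, 7.0)]:
    y_hat = 1 + 2 * x
    print(x, y, position_relative_to_line(y, y_hat))
```

With these values, the first point lies above the line (4.0 > 3), the second below it (3.5 < 5), and the third exactly on it (7.0 = 7).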