- Describe the connection between the residual and the position of a data point relative to the line of best fit.
- Create and use a residual plot to identify influential points and determine the most appropriate regression model.
- Determine the reliability of predictions from the line of best fit using the residuals and standard error of the residuals.
Analyzing Residuals
We have used scatterplots of data and constructed lines of best fit to describe the relationship in bivariate data. You have learned about the correlation coefficient [latex]r[/latex] and the coefficient of determination [latex]R^{2}[/latex]. These tools help us determine whether the line of best fit is a useful model and how well it fits the data.
Another tool we have is the analysis of residuals. When we fit a line to the data, we want to know how close the linear model’s predictions are to the observed data. In other words, we want to know how closely the model matches the data.
residuals
The residual for a data point is the difference between the observed value of the response variable and the linear model’s prediction.
Residual = observed value – predicted value
Residual = [latex]y-\hat y[/latex]
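The definition translates directly into code. The sketch below is a minimal illustration, not part of the original text. It assumes the line of best fit has the form [latex]\hat y=a+bx[/latex], and the function names `residual` and `residuals` are hypothetical. A positive residual means the observed point lies above the line (the model underpredicts). A negative residual means the point lies below the line (the model overpredicts).

```python
def residual(observed: float, predicted: float) -> float:
    """Residual = observed value - predicted value, i.e. y - y_hat."""
    return observed - predicted


def residuals(xs, ys, intercept, slope):
    """Residual of each data point for the (assumed) line y_hat = intercept + slope * x.

    Positive values: point lies above the line.
    Negative values: point lies below the line.
    """
    return [y - (intercept + slope * x) for x, y in zip(xs, ys)]


# Hypothetical values, chosen only to show the sign convention:
print(residual(10, 8))  # 2  (observed above predicted)
print(residual(8, 10))  # -2 (observed below predicted)
```

Computing all residuals at once with `residuals` is what you would feed into a residual plot, with the explanatory variable on the horizontal axis and the residuals on the vertical axis.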
Vocabulary: The word “residual” means “left over” or “remaining.” One way to relate the term “residual” to the concept above is to think of the residual as the quantity left over that can’t be explained by the linear relationship between the response variable and the explanatory variable.
For example, suppose the line of best fit for a data set is [latex]\hat y=5+3.4x[/latex].
To calculate the predicted value of the response variable when the explanatory variable is [latex]x=6[/latex], substitute [latex]6[/latex] for [latex]x[/latex]:
[latex]\hat y=5+3.4(6)=5+20.4=25.4[/latex]
Thus, when [latex]x=6[/latex], the predicted value is [latex]\hat y=25.4[/latex].
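The same substitution can be checked with a few lines of code. This is a minimal sketch using the line [latex]\hat y=5+3.4x[/latex]. The function name `predict` and its default parameters are illustrative choices, not something defined in the text. The result is rounded because floating-point arithmetic can leave a tiny error in the last digit.

```python
def predict(x: float, intercept: float = 5.0, slope: float = 3.4) -> float:
    """Predicted response y_hat = intercept + slope * x (defaults give y_hat = 5 + 3.4x)."""
    return intercept + slope * x


y_hat = predict(6)   # 5 + 3.4 * 6 = 5 + 20.4
print(round(y_hat, 1))  # 25.4
```

Once an observed value [latex]y[/latex] at [latex]x=6[/latex] is known, its residual is simply `y - predict(6)`.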