Assessing the Fit of a Line: Learn It 3

  • Describe the connection between the residual and the position of a data point relative to the line of best fit.
  • Create and use a residual plot to identify influential points and determine the most appropriate regression model.
  • Determine the reliability of predictions from the line of best fit using the residuals and standard error of the residuals.

Appropriateness of a Linear Model with [latex]R^{2}[/latex]

We might also question whether a linear model is appropriate when an extreme observation affects the equation of the line. Recall from earlier activities that an outlier is an extreme observation that lies far away from the rest of the data. In fitting a regression line, an outlier can also be an observation that does not follow the trend of the rest of the data. When such an observation drastically changes the equation of the line, we call it an influential point. An influential point appears to “pull” the line toward itself, which typically increases the residuals of the other data points. We will also study how these points affect [latex]R^{2}[/latex]. It is important to note that not all outliers affect the equation of the line.

Select the Animal Longevity data set and investigate it. Don’t forget to check the Regression Line box to see the line of best fit on the scatterplot. Make sure that gestation is the explanatory variable and longevity is the response variable.

An influential point in statistical analysis is an observation that has a notable impact on the results of a model. This impact can arise when a data point has extreme values or significantly deviates from the overall pattern of the data. 

Influential points can have a substantial impact on the estimated parameters of the model, such as the regression coefficients, and may also affect measures of model fit, like [latex]R^2[/latex].
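
To make this concrete, the sketch below fits a least-squares line to a small made-up data set three times: once as is, once with an added point far from the others in the explanatory variable, and once with an added point that is extreme only in the response variable. The data values and the helper name `fit_line` are assumptions chosen for illustration; they are not the Animal Longevity data, and the code is only one possible way to check for influence, assuming NumPy is available.

```python
import numpy as np

def fit_line(x, y):
    """Return (slope, intercept, R^2) for the least-squares line of y on x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)      # degree-1 least-squares fit
    residuals = y - (slope * x + intercept)     # observed minus predicted
    ss_res = np.sum(residuals ** 2)             # variation left unexplained
    ss_tot = np.sum((y - y.mean()) ** 2)        # total variation in y
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data with a strong, roughly linear trend (illustration only).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.0, 13.8, 16.1])

base = fit_line(x, y)

# Added point far to the right in x and well below the trend.
with_influential = fit_line(np.append(x, 20), np.append(y, 5.0))

# Added point at the mean of x (4.5) but far above the trend.
with_outlier = fit_line(np.append(x, 4.5), np.append(y, 20.0))

for label, (slope, intercept, r2) in [
    ("original data", base),
    ("with influential point", with_influential),
    ("with non-influential outlier", with_outlier),
]:
    print(f"{label}: slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.3f}")
```

With these made-up values, the point at x = 20 pulls the slope from about 2 down to about 0.1 and drives [latex]R^{2}[/latex] close to 0. The point at x = 4.5 leaves the slope unchanged and only shifts the intercept, although it still lowers [latex]R^{2}[/latex]. Comparing the fit with and without a suspicious point in this way is a simple check of whether that point is influential.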