- Understand what is measured by SSRegression, SSResiduals, and SSTotal in a regression context
- Discuss the factors that affect the value of the F-statistic in a regression context
ANOVA for Regression
An ANOVA is a way to “partition” the variation in the data. In other words, it divides the total variation into two parts: the part that is explained by the regression model (SSRegression) and the part that remains unexplained (SSResiduals).
[latex]\text{SSTotal} = \text{SSRegression} + \text{SSResiduals}[/latex]
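The following minimal sketch checks the partition numerically. It makes several assumptions: a simple linear regression fit by least squares, a small made-up data set (not from any data set in this lesson), and a hypothetical helper named `sums_of_squares`. SSTotal compares each observed [latex]y[/latex] to the mean. SSRegression compares each fitted value to the mean. SSResiduals compares each observed [latex]y[/latex] to its fitted value.

```python
import math

def sums_of_squares(x, y):
    """Fit y = b0 + b1*x by least squares; return (SSTotal, SSRegression, SSResiduals)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Least-squares slope and intercept for simple linear regression
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum((xi - x_bar) ** 2 for xi in x)
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * xi for xi in x]  # fitted values
    ss_total = sum((yi - y_bar) ** 2 for yi in y)                    # observed vs. mean
    ss_regression = sum((yh - y_bar) ** 2 for yh in y_hat)           # fitted vs. mean
    ss_residuals = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))   # observed vs. fitted
    return ss_total, ss_regression, ss_residuals

# Hypothetical toy data, for illustration only
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
ss_total, ss_regression, ss_residuals = sums_of_squares(x, y)
print(ss_total, ss_regression, ss_residuals)  # approximately 6.0, 3.6, 2.4
print(math.isclose(ss_total, ss_regression + ss_residuals))  # True
```

Up to floating-point rounding, the two parts add back to the total, which is what the partition states.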
Previously, you learned that the coefficient of determination, [latex]R^2[/latex], is interpreted as the percentage of variation in the response variable that can be explained by the linear relationship with an explanatory variable. This quantity can be expressed using the sums of squares. Note that [latex]R^2[/latex] can be expressed as a percentage or as a proportion.
[latex]R^2 = \dfrac{\text{variation explained}}{\text{total variation}} = \dfrac{\text{SSRegression}}{\text{SSTotal}} = 1-\dfrac{\text{SSResiduals}}{\text{SSTotal}}[/latex]
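The two forms of [latex]R^2[/latex] give the same number because SSTotal = SSRegression + SSResiduals. The sketch below confirms this on made-up toy data. It assumes a least-squares simple linear regression, and `r_squared_two_ways` is a hypothetical helper name, not a library function.

```python
def r_squared_two_ways(x, y):
    """Return R^2 as SSRegression/SSTotal and as 1 - SSResiduals/SSTotal."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    # Least-squares slope and intercept
    b1 = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y)) / sum((a - x_bar) ** 2 for a in x)
    b0 = y_bar - b1 * x_bar
    y_hat = [b0 + b1 * a for a in x]
    ss_total = sum((b - y_bar) ** 2 for b in y)
    ss_regression = sum((h - y_bar) ** 2 for h in y_hat)
    ss_residuals = sum((b - h) ** 2 for b, h in zip(y, y_hat))
    return ss_regression / ss_total, 1 - ss_residuals / ss_total

# Hypothetical toy data, for illustration only
r2_explained, r2_complement = r_squared_two_ways([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(r2_explained, 4), round(r2_complement, 4))  # 0.6 0.6
```

Here [latex]R^2 = 0.6[/latex], which can equally be reported as 60% of the variation in [latex]y[/latex] being explained by the linear relationship with [latex]x[/latex].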
Sums of squares can be organized in an ANOVA table. The following table provides the information necessary to calculate an F-statistic in the context of regression. Note that [latex]n=[/latex] sample size and [latex]p=[/latex] number of predictors. In simple linear regression, [latex]p=1[/latex].
| Source | [latex]df[/latex] | Sum sq ([latex]\text{SS}[/latex]) | Mean sq ([latex]\text{MS}[/latex]) | F value |
|---|---|---|---|---|
| Regression | [latex]p[/latex] | [latex]\text{SSRegression}[/latex] | [latex]\text{MSRegression} = \dfrac{\text{SSRegression}}{p}[/latex] | [latex]F = \dfrac{\text{MSRegression}}{\text{MSResiduals}}[/latex] |
| Residuals | [latex]n-1-p[/latex] | [latex]\text{SSResiduals}[/latex] | [latex]\text{MSResiduals} = \dfrac{\text{SSResiduals}}{n-1-p}[/latex] | |
| Total | [latex]n-1[/latex] | [latex]\text{SSTotal}[/latex] | | |
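The table can be turned into a short calculation. Dividing both mean squares by SSTotal also rewrites the F-statistic in terms of [latex]R^2[/latex]:

[latex]F = \dfrac{R^2/p}{(1-R^2)/(n-1-p)}[/latex]

This form shows how the factors interact. For a fixed [latex]R^2[/latex] and [latex]p[/latex], a larger sample size [latex]n[/latex] gives a larger F. For a fixed [latex]n[/latex] and [latex]p[/latex], a larger [latex]R^2[/latex] also gives a larger F. The sketch below is a minimal illustration under stated assumptions. The sums of squares are hypothetical values, not results from any data set in this lesson. The helper names `regression_anova` and `f_from_r_squared` are made up for this example.

```python
def regression_anova(ss_regression, ss_residuals, n, p=1):
    """Build the regression ANOVA table from the sums of squares, sample size n, and p predictors."""
    df_regression = p
    df_residuals = n - 1 - p
    ms_regression = ss_regression / df_regression
    ms_residuals = ss_residuals / df_residuals
    return {
        "Regression": (df_regression, ss_regression, ms_regression),
        "Residuals": (df_residuals, ss_residuals, ms_residuals),
        "Total": (n - 1, ss_regression + ss_residuals),
        "F": ms_regression / ms_residuals,
    }

def f_from_r_squared(r2, n, p=1):
    """Same F-statistic, written in terms of R^2 (both mean squares divided by SSTotal)."""
    return (r2 / p) / ((1 - r2) / (n - 1 - p))

# Hypothetical sums of squares (SSTotal = 6) from a sample of n = 5 with one predictor
table = regression_anova(3.6, 2.4, n=5)
for source in ("Regression", "Residuals", "Total"):
    print(source, table[source])
print("F =", round(table["F"], 3))              # F = 4.5
print(round(f_from_r_squared(0.6, n=5), 3))     # 4.5
# Same R^2 = 0.6 but a larger sample: F grows because the residual df grows
print(round(f_from_r_squared(0.6, n=50), 3))    # 72.0
```

The same [latex]R^2[/latex] of 0.6 produces a much larger F with 50 observations than with 5. Sample size therefore matters when judging a regression's F-statistic.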
Step 1: Select the “Organic Foods” data set.
Step 2: Select “Average income in zip code” as the explanatory ([latex]x[/latex]) variable and “Number of organic items offered” as the response ([latex]y[/latex]) variable.
Step 3: Under “Regression Options,” click the box to show the ANOVA table.