Module 13: Cheat Sheet

Essential Concepts

  • The purpose of a one-way ANOVA (Analysis of Variance) test is to determine the existence of a statistically significant difference among several group means.
  • The null hypothesis for a one-way ANOVA states that all the group/population means are the same. This can be written as: [latex]H_0: \mu_1 = \mu_2 = \dots = \mu_k[/latex], where [latex]k[/latex] is the number of independent groups or samples.
  • The alternative hypothesis for a one-way ANOVA should be written as: [latex]H_{A}:[/latex] at least two of the group means are different.
  • The statistic measuring the variation within the groups is the error sum of squares. This is calculated by summing the variation within each of the groups. The variation within each of the groups is visualized in the boxplot by the size of the box and in the dotplot as the spread of the dots within each group.
  • A statistic measuring the variation between the groups is the group sum of squares. This is calculated by summing, over all the groups, the squared difference between each group mean and the grand mean (i.e., the mean of all the data values), weighted by the group's sample size.
  • Data for ANOVA
    • The factor is a categorical variable.
    • The response is a numerical variable.
    • The mean of the response variable is the parameter of interest.
  • In order to perform a one-way ANOVA test, there are some basic assumptions to be fulfilled:
    1. All samples are randomly selected and independent.
    2. The data within each group are normally distributed. (Testing for normality is outside the scope of this course.)
    3. The populations are assumed to have equal standard deviations (or variances).
  • Steps to conduct a one-way ANOVA hypothesis test
    1. Set up the null and alternative hypothesis.
    2. Check the conditions/assumptions for the ANOVA hypothesis test.
      • The right types of data: the factor of interest should be categorical, and the response variable should be numeric and continuous.
      • Similar levels of variability
      • Randomly assigned, independent groups
    3. Calculate the [latex]F[/latex]-statistic (See ANOVA Table below).
    4. Calculate the P-value.
    5. Compare the P-value to the significance level, [latex]\alpha[/latex], to make a decision.
    6. Write a conclusion in context (e.g., we do/do not have convincing evidence…)
  • ANOVA Table
    | Source | Degrees of Freedom (df) | Sum of Squares | Mean Square | F-Statistic |
    |--------|-------------------------|----------------|-------------|-------------|
    | Group | [latex]k-1[/latex] (the number of groups minus 1) | SSGroup | [latex]\dfrac{\text{SSGroup}}{k-1}[/latex] | [latex]\dfrac{\text{MSGroup}}{\text{MSError}}[/latex] |
    | Error | [latex]N-k[/latex] (the total number of data points minus the number of groups) | SSError | [latex]\dfrac{\text{SSError}}{N-k}[/latex] | |
    | Total | [latex]N-1[/latex] (the total number of data points minus 1) | SSGroup + SSError | | |
  • The pair-wise comparison for ANOVA is a process of analyzing the groups/populations by comparing them against each other in pairs. This means conducting multiple two-sample tests in order to locate the significant difference(s) among the means.
  • The probability of committing a type I error is equal to the significance level: [latex]P(\text{Type I Error}) = \alpha[/latex].
  • We need a method to maintain an overall level of significance even when several tests are performed. The rate we want to control is the family-wise error rate, defined as the probability of rejecting at least one of the true null hypotheses. If we perform [latex]m[/latex] independent hypothesis tests, each at significance level [latex]\alpha[/latex], the probability of making at least one type I error (at least one false rejection) is [latex]1-(1-\alpha)^m[/latex].
  • One method for controlling the family-wise error rate is the Tukey method for all pair-wise comparisons (formally, the Tukey-Kramer method). This method adjusts the length of the confidence intervals (to ensure an overall level of confidence) and the P-values (to ensure an overall significance level for all pair-wise comparisons).
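The family-wise error rate formula above is easy to check numerically. A minimal Python sketch (the number of groups [latex]k = 4[/latex] and per-test significance level [latex]\alpha = 0.05[/latex] are made-up illustration values):

```python
# Family-wise error rate when performing m independent tests,
# each at significance level alpha.

def familywise_error_rate(m, alpha=0.05):
    """P(at least one type I error) = 1 - (1 - alpha)^m."""
    return 1 - (1 - alpha) ** m

# With k = 4 groups there are k*(k-1)/2 = 6 pair-wise comparisons.
k = 4
m = k * (k - 1) // 2

print(m)                                    # 6
print(round(familywise_error_rate(m), 4))   # 0.2649
```

Even with only four groups, six unadjusted pair-wise tests push the chance of at least one false rejection to about 26% — which is why an adjustment such as the Tukey method is needed.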

Key Equations

F-Statistic

[latex]\dfrac{\text{MSGroup}}{\text{MSError}}=\dfrac{\text{Variation BETWEEN groups}}{\text{Variation WITHIN groups}}[/latex]

Mean Square for Error (MSError)

[latex]\dfrac{\text{Error sum of squares}}{\text{degrees of freedom (Error)}}=\dfrac{SSE}{N-k}[/latex]

Mean Square for Group (MSGroup)

[latex]\dfrac{\text{Group sum of squares}}{\text{degrees of freedom (Group)}}=\dfrac{SSG}{k-1}[/latex]

Total Sum of Squares (SSTotal)

[latex]\text{SSTotal} = \text{SSGroup} + \text{SSError}[/latex]
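The quantities in these equations can be computed directly from their definitions. A short pure-Python sketch using three small made-up groups (values chosen so the arithmetic comes out clean: hand calculation gives SSGroup = 14, SSError = 6, and F = 7):

```python
# One-way ANOVA sums of squares, mean squares, and F-statistic,
# computed straight from the definitions above.

groups = [[1, 2, 3], [2, 3, 4], [4, 5, 6]]   # made-up data: k = 3 groups

k = len(groups)                               # number of groups
N = sum(len(g) for g in groups)               # total number of data points
grand_mean = sum(x for g in groups for x in g) / N

def mean(g):
    return sum(g) / len(g)

# Variation BETWEEN groups: each group mean vs. the grand mean,
# weighted by the group's sample size.
ss_group = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)

# Variation WITHIN groups: each value vs. its own group mean.
ss_error = sum((x - mean(g)) ** 2 for g in groups for x in g)

ms_group = ss_group / (k - 1)   # MSGroup = SSGroup / (k - 1)
ms_error = ss_error / (N - k)   # MSError = SSError / (N - k)
f_stat = ms_group / ms_error    # F = MSGroup / MSError

print(round(ss_group, 6), round(ss_error, 6), round(f_stat, 6))
```

A large F means the variation between the groups dwarfs the variation within them — evidence against the null hypothesis that all the group means are equal.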

Glossary

data snooping, data fishing

choosing to test or report only the comparisons that look interesting after inspecting the data (e.g., the boxplot), which inflates the chance of a false positive

error sum of squares

the statistic measuring the variation within the groups

family-wise error rate

the probability of rejecting at least one of the true null hypotheses

group sum of squares

a statistic measuring the variation between the groups

one-way ANOVA

a statistical test for comparing and making inferences about means associated with two or more groups

Tukey method (Tukey-Kramer method)

A method for controlling the family-wise error rate. This method adjusts the length of the confidence intervals and the P-values to ensure an overall level of confidence and significance for all pair-wise comparisons.
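The full workflow — the one-way ANOVA followed by Tukey pair-wise comparisons — is available in SciPy. A sketch assuming SciPy ≥ 1.8 (which provides `scipy.stats.tukey_hsd`) is installed; the three small groups are made-up illustration data:

```python
# One-way ANOVA followed by Tukey's pair-wise comparisons using SciPy.
# Sketch only: assumes SciPy >= 1.8 is installed.
from scipy.stats import f_oneway, tukey_hsd

g1, g2, g3 = [1, 2, 3], [2, 3, 4], [4, 5, 6]   # made-up data

# F-statistic and P-value for H0: mu1 = mu2 = mu3.
anova = f_oneway(g1, g2, g3)
print(anova.statistic, anova.pvalue)

# If H0 is rejected, follow up with Tukey's method, which maintains
# the family-wise error rate across all pair-wise comparisons.
if anova.pvalue < 0.05:
    tukey = tukey_hsd(g1, g2, g3)
    print(tukey.pvalue)   # matrix of adjusted pair-wise P-values
```

Note that `tukey_hsd` reports an adjusted P-value for every pair of groups, so the individual two-sample comparisons can be read off without inflating the overall significance level.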