Indicator Variable – Learn It 2

  • Know what an indicator variable is
  • Find and describe an appropriate multiple linear regression model equation with categorical predictors

Indicator Variable

How can we incorporate a categorical variable such as gender into a regression equation? We create an indicator variable for the categorical variable and then include that indicator variable as a predictor when fitting the linear regression model.

indicator variable

An indicator variable is a binary variable with only two values: [latex]0[/latex] and [latex]1[/latex]. When creating an indicator variable, we assign the value [latex]1[/latex] to one chosen category and the value [latex]0[/latex] to all other categories.

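For example, consider the variable schtyp in the data set, which records the school type of each student: the value [latex]1[/latex] indicates that the student attends a public school, and the value [latex]2[/latex] indicates that the student attends a private school. Because schtyp uses the values [latex]1[/latex] and [latex]2[/latex] rather than [latex]0[/latex] and [latex]1[/latex], it is not itself an indicator variable. However, we can create one from it: an indicator variable public that equals [latex]1[/latex] for public school students and [latex]0[/latex] for private school students.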
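The recoding described above can be sketched in Python as a minimal example. The function name `make_indicator` and the five schtyp values are hypothetical, chosen only for illustration; they are not taken from the data set.

```python
def make_indicator(values, category):
    """Return 1 for each value equal to `category` and 0 for every other value."""
    return [1 if value == category else 0 for value in values]


# Hypothetical schtyp codes for five students (1 = public school, 2 = private school).
schtyp = [1, 2, 1, 1, 2]

# Indicator "public": 1 for public school students, 0 for private school students.
public = make_indicator(schtyp, 1)
print(public)  # [1, 0, 1, 1, 0]
```

The same function works for any categorical variable: the category passed as `category` receives [latex]1[/latex], and every other category receives [latex]0[/latex].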

If a categorical variable has [latex]k[/latex] levels, we need [latex]k-1[/latex] indicator variables to represent it in our regression model. Choosing which categories receive their own indicator variables also determines which category serves as the reference group for the model equation.

A reference group is the category of the categorical variable that is not represented explicitly by an indicator variable. Observations in the reference group have the value [latex]0[/latex] on every indicator variable, which is why only [latex]k-1[/latex] indicator variables are needed to define our regression model.
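As a minimal sketch of the [latex]k-1[/latex] rule, the Python function below creates one indicator column for each non-reference level and leaves the reference level out. The function name `indicator_columns` and the three-level program variable are hypothetical, used only to show a case with [latex]k = 3[/latex].

```python
def indicator_columns(values, reference):
    """Create k-1 indicator variables for a categorical variable with k levels.

    Each non-reference level gets its own 0/1 column. The reference level
    is left out, so its rows are 0 in every column.
    """
    # Sort the levels so the column order is predictable.
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} not found in data")
    return {
        level: [1 if value == level else 0 for value in values]
        for level in levels
        if level != reference
    }


# Hypothetical program-type variable with k = 3 levels.
program = ["academic", "general", "vocational", "academic", "general"]
columns = indicator_columns(program, reference="general")
print(columns)
# {'academic': [1, 0, 0, 1, 0], 'vocational': [0, 0, 1, 0, 0]}
```

The two "general" students (the reference group) have [latex]0[/latex] in both columns, so three levels are fully described by two indicator variables.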

For example, for the indicator variable public, the reference group is the category of private school students, who have the value public = [latex]0[/latex].

Recall that we are interested in building a prediction model that will allow us to predict science test scores from math test scores and gender. After creating the indicator variable female for gender (coded [latex]1[/latex] for female students and [latex]0[/latex] for male students), our model will have math test score and female as the explanatory variables.
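The general form of such a model, and what it reduces to for each gender group, can be written as follows. The coefficients [latex]b_0[/latex], [latex]b_1[/latex], and [latex]b_2[/latex] are placeholder symbols, not estimates from the data. The equations assume female is coded [latex]1[/latex] for female students and [latex]0[/latex] for male students, so male students form the reference group.

```latex
\begin{aligned}
\widehat{\text{science}} &= b_0 + b_1\,\text{math} + b_2\,\text{female} \\
\text{Male students } (\text{female} = 0):\quad \widehat{\text{science}} &= b_0 + b_1\,\text{math} \\
\text{Female students } (\text{female} = 1):\quad \widehat{\text{science}} &= (b_0 + b_2) + b_1\,\text{math}
\end{aligned}
```

In this model both groups share the same slope [latex]b_1[/latex] for math, and [latex]b_2[/latex] is the difference in predicted science score between a female student and a male student with the same math score.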