- Know what an indicator variable is
- Find and describe an appropriate multiple linear regression model equation with categorical predictors
Indicator Variable
How can we incorporate a categorical variable like gender into a regression equation? This is done by using an indicator variable for the categorical variable and then using that variable in fitting the linear regression model.
indicator variable
An indicator variable is a binary variable that takes only two values: [latex]0[/latex] and [latex]1[/latex]. When creating an indicator variable, we assign the value [latex]1[/latex] to observations in one chosen category and the value [latex]0[/latex] to observations in all other categories.
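As an example, consider the variable schtyp in the data set, which records the school type of each student. As coded, schtyp takes the value [latex]1[/latex] if the student attends a public school and the value [latex]2[/latex] if the student attends a private school, so it is a categorical variable rather than an indicator variable. To use it in a regression model, we create an indicator variable public that equals [latex]1[/latex] for public school students and [latex]0[/latex] for private school students.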
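If a categorical variable has [latex]k[/latex] levels, we create [latex]k-1[/latex] indicator variables to define our regression model. The one level that does not get its own indicator variable, that is, the level for which all of the indicator variables equal [latex]0[/latex], serves as the reference group for the model equation. Choosing which levels receive indicator variables therefore makes clear which category we want as our reference group.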
For example, for the indicator variable public, the reference group is the category coded [latex]0[/latex], which consists of the private school students.
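The recoding described above can be sketched in a few lines of Python. This is only an illustration: the list of schtyp values is made-up sample data, and the helper name make_indicators is hypothetical rather than anything from the data set or a library. It shows how the indicator variable public follows from the 1/2 coding of schtyp, and how a variable with [latex]k[/latex] levels yields [latex]k-1[/latex] indicator variables once a reference group is chosen.

```python
# Made-up sample of schtyp codes (assumption): 1 = public, 2 = private.
schtyp = [1, 2, 1, 1, 2]

# Indicator variable "public": 1 for public school students, 0 otherwise.
public = [1 if code == 1 else 0 for code in schtyp]
print(public)


def make_indicators(values, reference):
    """Hypothetical helper: build one 0/1 indicator per level except the reference.

    A variable with k distinct levels produces k - 1 indicator lists.
    Observations in the reference level are 0 in every list.
    """
    levels = sorted(set(values))
    return {
        level: [1 if value == level else 0 for value in values]
        for level in levels
        if level != reference
    }


# Two levels with private (code 2) as the reference: a single indicator, for code 1.
print(make_indicators(schtyp, reference=2))
```

With private school (code 2) as the reference group, the helper returns a single indicator for code 1, which matches the variable public built by hand.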
Recall that we are interested in building a prediction model that will allow us to predict science test scores based on math test scores and gender. After creating the indicator variable, our model will have math test scores and the indicator variable female as the explanatory variables.
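As a sketch of the form such a model takes, suppose female equals [latex]1[/latex] for female students and [latex]0[/latex] otherwise, so that male students are the reference group. The coefficient symbols [latex]b_0[/latex], [latex]b_1[/latex], and [latex]b_2[/latex] are notation assumed here for illustration:

[latex]\widehat{\text{science}} = b_0 + b_1(\text{math}) + b_2(\text{female})[/latex]

Substituting the two possible values of the indicator variable gives one equation per group:

[latex]\text{female} = 0: \quad \widehat{\text{science}} = b_0 + b_1(\text{math})[/latex]

[latex]\text{female} = 1: \quad \widehat{\text{science}} = (b_0 + b_2) + b_1(\text{math})[/latex]

Under this form, both groups share the same slope for math test scores, and [latex]b_2[/latex] is the difference in predicted science test score between a female student and a male student with the same math test score.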