- Know what an indicator variable is
- Find and describe an appropriate multiple linear regression model equation with categorical predictors
What is an indicator variable?
An indicator variable (also known as a dummy variable) is a binary variable that takes only two values: [latex]0[/latex] and [latex]1[/latex]. When creating an indicator variable, we assign the value [latex]1[/latex] to observations in a particular category and the value [latex]0[/latex] to observations in all other categories. Indicator variables are commonly used in regression analysis to represent categorical variables, including those with more than two levels, such as education level or occupation. For a categorical variable with [latex]k[/latex] levels, we typically create [latex]k-1[/latex] indicator variables, one for each level except a chosen reference level. Each observation then has the value [latex]1[/latex] on at most one indicator, and observations in the reference level have the value [latex]0[/latex] on all of them.
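The coding scheme described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not a standard library routine: the helper `make_indicators` and the five education records are hypothetical. One level is held out as the reference level, so a variable with [latex]k[/latex] levels produces [latex]k-1[/latex] indicator columns, which is the usual choice when the regression model also includes an intercept.

```python
def make_indicators(values, levels, reference):
    """Build one 0/1 indicator column per level, except the reference level.

    Hypothetical helper for illustration: `levels` lists every category and
    `reference` is the level left out, so its observations are 0 everywhere.
    """
    if reference not in levels:
        raise ValueError(f"reference level {reference!r} is not in levels")
    unknown = set(values) - set(levels)
    if unknown:
        raise ValueError(f"values not in levels: {sorted(unknown)}")
    # 1 when the observation belongs to this level, 0 for every other level.
    return {
        level: [1 if v == level else 0 for v in values]
        for level in levels
        if level != reference
    }


# Hypothetical sample of five education records.
education = [
    "High School or equivalent",
    "Bachelor Degree",
    "Master Degree",
    "High School or equivalent",
    "Doctorate Degree",
]
levels = ["High School or equivalent", "Bachelor Degree", "Master Degree", "Doctorate Degree"]
dummies = make_indicators(education, levels, reference="High School or equivalent")

for level, column in dummies.items():
    print(level, column)
# Bachelor Degree [0, 1, 0, 0, 0]
# Master Degree [0, 0, 1, 0, 0]
# Doctorate Degree [0, 0, 0, 0, 1]
```

Each record has a 1 in at most one column, and the two High School or equivalent records are 0 in every column, which is what makes that level the reference group.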
Dummy variables are useful because they allow us to include categorical variables in our analysis, which would otherwise be difficult because such variables are non-numeric. They can also help us control for confounding factors and improve the validity of our results.[1]
The reference group is the set of observations for which all of the indicator variables are zero.
Indicator variables also let us work with categorical variables that have many levels, such as education level (which could be High School or equivalent, Associate Degree, Bachelor Degree, Master Degree, Doctorate Degree, or Other).
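For example, if we create a single indicator variable that equals [latex]1[/latex] for Bachelor Degree and [latex]0[/latex] otherwise, the reference group is the set of observations whose education level is High School or equivalent, Associate Degree, Master Degree, Doctorate Degree, or Other.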
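To see how the reference group shows up in a fitted model, here is a minimal sketch with a single indicator variable for Bachelor Degree. The income values (in thousands of dollars) are made up for illustration, and `fit_simple_ols` is a hypothetical helper that applies the ordinary least-squares formulas for one predictor. With a single 0/1 predictor, the fitted intercept equals the mean response of the reference group, and the slope equals the difference between the Bachelor Degree mean and the reference-group mean.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Ordinary least-squares intercept b0 and slope b1 for y = b0 + b1 * x."""
    x_bar, y_bar = mean(x), mean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    return b0, b1


# Hypothetical data: education level and annual income (thousands of dollars).
education = [
    "High School or equivalent",
    "Bachelor Degree",
    "Master Degree",
    "Bachelor Degree",
    "Associate Degree",
    "Doctorate Degree",
]
income = [40, 60, 70, 64, 45, 80]

# Indicator: 1 for Bachelor Degree, 0 for every other level (the reference group).
bachelor = [1 if level == "Bachelor Degree" else 0 for level in education]

b0, b1 = fit_simple_ols(bachelor, income)
ref_mean = mean([y for y, d in zip(income, bachelor) if d == 0])
bach_mean = mean([y for y, d in zip(income, bachelor) if d == 1])

print(f"b0 = {b0:.2f}, b1 = {b1:.2f}")
print(f"reference mean = {ref_mean:.2f}, Bachelor Degree mean = {bach_mean:.2f}")
# b0 = 58.75, b1 = 3.25
# reference mean = 58.75, Bachelor Degree mean = 62.00
```

Here the intercept 58.75 is the average income of the four reference-group records, and 58.75 + 3.25 = 62.00 is the Bachelor Degree average, so the coefficient on the indicator is read as a difference relative to the reference group.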
- https://en.wikipedia.org/wiki/Dummy_variable_(statistics)