Indicator Variable – Apply It 1

  • Know what an indicator variable is
  • Find and describe an appropriate multiple linear regression model equation with categorical predictors

Recall that the data set contains information about high school student achievement scores on math, science, reading, writing, and social studies tests. The data set contains information about [latex]200[/latex] high school students and [latex]10[/latex] variables for each student. Descriptions of the variables are as follows:

Variable name   Definition
id              Identification number of the student
female          Gender of the student (0 = male, 1 = female)
race            Ethnic background of the student (1 = Hispanic, 2 = Asian, 3 = Black, 4 = White)
ses             Socio-economic status of the student (1 = low, 2 = medium, 3 = high)
schtyp          School type (1 = public, 2 = private)
prog            Program type (1 = general, 2 = academic preparatory, 3 = vocational/technical)
read            Score on the reading test
write           Score on the writing test
math            Score on the math test
science         Score on the science test
socst           Score on the social studies test

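The variables female, race, ses, schtyp, and prog are categorical: their numeric codes are labels for groups, not measured quantities. An indicator variable can only take the values [latex]0[/latex] or [latex]1[/latex], so a single indicator can distinguish only two categories. If a categorical variable has more than two categories, more than one indicator variable is needed.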

Because prog has three categories, we need two indicator variables to include it as an explanatory variable in the model.

One indicator variable can mark students in the academic preparatory program, and the other can mark students in the vocational/technical program.

The two indicator variables are defined as follows:

  • If prog = academic preparatory, the indicator variable academic_preparatory = [latex]1[/latex]. Otherwise, if prog = vocational/technical or general, academic_preparatory = [latex]0[/latex].
  • If prog = vocational/technical, the indicator variable vocational_technical = [latex]1[/latex]. Otherwise, if prog = academic preparatory or general, vocational_technical = [latex]0[/latex].
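The two rules above can be sketched in Python. This is a minimal illustration, assuming prog is stored with the numeric codes from the variable table (1 = general, 2 = academic preparatory, 3 = vocational/technical). The function name `prog_indicators` is hypothetical and is not part of the data set.

```python
# Sketch: build the two indicator variables for prog.
# Assumes the codes from the variable table:
#   1 = general, 2 = academic preparatory, 3 = vocational/technical


def prog_indicators(prog: int) -> tuple[int, int]:
    """Return (academic_preparatory, vocational_technical) for one prog code."""
    if prog not in (1, 2, 3):
        raise ValueError(f"unexpected prog code: {prog}")
    academic_preparatory = 1 if prog == 2 else 0  # 1 only for academic preparatory
    vocational_technical = 1 if prog == 3 else 0  # 1 only for vocational/technical
    return academic_preparatory, vocational_technical


if __name__ == "__main__":
    for code, label in [(1, "general"), (2, "academic preparatory"), (3, "vocational/technical")]:
        print(label, prog_indicators(code))
```

Running the script prints `(0, 0)` for general, `(1, 0)` for academic preparatory, and `(0, 1)` for vocational/technical. Each program gets a distinct pair of values, so the three programs can be told apart.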

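We do not need a third indicator variable for the general program. A student in the general program is already identified by academic_preparatory = [latex]0[/latex] and vocational_technical = [latex]0[/latex], so general serves as the reference (baseline) category.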
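The following sketch shows how the reference category appears in a model equation. It assumes, purely for illustration, that the response is the math score and that prog is the only predictor:

[latex]\widehat{\text{math}} = b_0 + b_1\,\text{academic\_preparatory} + b_2\,\text{vocational\_technical}[/latex]

Substituting each program's indicator values gives one predicted mean per program:

  • General (academic_preparatory = 0, vocational_technical = 0): [latex]\widehat{\text{math}} = b_0[/latex]
  • Academic preparatory (1, 0): [latex]\widehat{\text{math}} = b_0 + b_1[/latex]
  • Vocational/technical (0, 1): [latex]\widehat{\text{math}} = b_0 + b_2[/latex]

Under this setup, [latex]b_1[/latex] is the estimated difference in mean math score between academic preparatory and general students. Likewise, [latex]b_2[/latex] is the estimated difference between vocational/technical and general students.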

In general, we will need [latex]k-1[/latex] indicator variables for a categorical variable with [latex]k[/latex] categories.
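The [latex]k-1[/latex] rule can be written as a small general-purpose encoder. This is a minimal sketch, not a specific library's method. The function name `make_indicators` and the sample race codes in the example are hypothetical. The full list of categories is passed in explicitly (for example, from the variable table), so a category missing from a sample still counts toward [latex]k[/latex].

```python
# Sketch: encode a categorical variable with k categories as k - 1 indicators.
# The reference category gets no column; it is identified by all indicators = 0.


def make_indicators(values, categories, reference):
    """Return (kept, rows).

    kept: the k - 1 non-reference categories, in the order given.
    rows: one list of 0/1 indicator values per observation.
    """
    kept = [c for c in categories if c != reference]
    if len(kept) != len(categories) - 1:
        raise ValueError("reference must be exactly one of the listed categories")
    unknown = set(values) - set(categories)
    if unknown:
        raise ValueError(f"values not in categories: {sorted(unknown)}")
    # For each observation, put a 1 in the column of its category (if not the reference).
    rows = [[1 if v == c else 0 for c in kept] for v in values]
    return kept, rows


if __name__ == "__main__":
    # race codes from the table: 1 = Hispanic, 2 = Asian, 3 = Black, 4 = White
    sample = [4, 1, 3, 4, 2]  # hypothetical codes for five students
    kept, rows = make_indicators(sample, [1, 2, 3, 4], reference=4)
    print(kept)  # [1, 2, 3]
    print(rows)  # [[0, 0, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0], [0, 1, 0]]
```

In this example, race has [latex]k = 4[/latex] categories, so the encoder produces three indicators, and White (code 4) is the reference category. The choice of reference category is arbitrary. It changes how the coefficients are interpreted, but not the model's fitted values. If you use pandas, `pandas.get_dummies` with `drop_first=True` performs a similar encoding and drops the first category level.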