Indicator Variable – Learn It 1

  • Know what an indicator variable is
  • Find and describe an appropriate multiple linear regression model equation with categorical predictors

High School and Beyond


The High School and Beyond data set contains information about [latex]200[/latex] high school students: an identification number and [latex]10[/latex] variables for each student. The data collected about each student include whether the student identifies as male or female, their socio-economic status, and their scores on tests in subjects such as science, math, and reading.

Descriptions of the variables are as follows:

| Variable name | Definition |
| --- | --- |
| id | Identification number of the student |
| female | Gender of the student (0 = male, 1 = female) |
| race | Ethnic background of the student (1 = Hispanic, 2 = Asian, 3 = Black, 4 = White) |
| ses | Socio-economic status of the student (1 = low, 2 = medium, 3 = high) |
| schtyp | School type (1 = public, 2 = private) |
| prog | Program type (1 = general, 2 = academic preparatory, 3 = vocational/technical) |
| read | Score from test of reading |
| write | Score from test of writing |
| math | Score from test of math |
| science | Score from test of science |
| socst | Score from test of social studies |
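
Several of these variables are stored as numeric codes rather than as labels. The following is a minimal sketch of how the codes in the variable descriptions above might be translated back into labels, assuming each student's record is available as a Python dictionary. The `CODEBOOK` name, the `decode` helper, and the sample record are hypothetical and are not part of the data set.

```python
# Code-to-label mappings copied from the variable descriptions above.
CODEBOOK = {
    "female": {0: "male", 1: "female"},
    "race": {1: "Hispanic", 2: "Asian", 3: "Black", 4: "White"},
    "ses": {1: "low", 2: "medium", 3: "high"},
    "schtyp": {1: "public", 2: "private"},
    "prog": {1: "general", 2: "academic preparatory", 3: "vocational/technical"},
}


def decode(record):
    """Return a copy of record with coded categorical values replaced by labels.

    Variables that are not in CODEBOOK (such as test scores) are left unchanged.
    """
    return {
        name: CODEBOOK[name][value] if name in CODEBOOK else value
        for name, value in record.items()
    }


# Hypothetical record for illustration only; it does not describe a real student.
example = {"id": 0, "female": 1, "ses": 2, "schtyp": 1, "prog": 2, "math": 50}
print(decode(example))
```

Note that `female` is already coded as 0 or 1, which is the form a regression model needs for a two-category predictor.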

We are interested in building a prediction model that will allow us to predict science test scores based on math test scores and the recorded gender of the student.
Note: In this study, students identified themselves only as female or male.
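
One common way to set up such a model, sketched here under stated assumptions, is to treat `female` as an indicator variable: it equals 1 for students recorded as female and 0 for students recorded as male. The predicted science score is then [latex]b_0 + b_1 \cdot \text{math} + b_2 \cdot \text{female}[/latex]. The function name `predict_science` and the coefficient values below are hypothetical placeholders chosen only to show the arithmetic. They are not estimates from the High School and Beyond data.

```python
def predict_science(math, female, b0, b1, b2):
    """Predicted science score from the model b0 + b1 * math + b2 * female.

    female is an indicator variable: 0 = male, 1 = female.
    """
    if female not in (0, 1):
        raise ValueError("female must be 0 (male) or 1 (female)")
    return b0 + b1 * math + b2 * female


# Placeholder coefficients for illustration only; NOT fitted to any data.
b0, b1, b2 = 20.0, 0.5, 3.0

# Two students with the same math score who differ only in the indicator.
male_pred = predict_science(60, 0, b0, b1, b2)
female_pred = predict_science(60, 1, b0, b1, b2)

# The gap between the two predictions equals b2, whatever the math score is.
print(male_pred, female_pred, female_pred - male_pred)
```

Because the indicator takes only the values 0 and 1, the coefficient [latex]b_2[/latex] is the difference in predicted science score between female and male students who have the same math score.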