Bad Drivers and Good Questions (continued)
Let’s look at the data set included in the “Dear Mona, Which State Has The Worst Drivers” article [1], in which the data are used to answer a relevant and interesting question. The first ten rows are shown below; the full data set is provided with the article.
| | state | num_drivers | perc_speeding | perc_alcohol | perc_not_distracted | perc_no_previous | insurance_premiums | losses |
| 1 | Alabama | 18.8 | 39 | 30 | 96 | 80 | 784.55 | 145.08 |
| 2 | Alaska | 18.1 | 41 | 25 | 90 | 94 | 1053.48 | 133.93 |
| 3 | Arizona | 18.6 | 35 | 28 | 84 | 96 | 899.47 | 110.35 |
| 4 | Arkansas | 22.4 | 18 | 26 | 94 | 95 | 827.34 | 142.39 |
| 5 | California | 12 | 35 | 28 | 91 | 89 | 878.41 | 165.63 |
| 6 | Colorado | 13.6 | 37 | 28 | 79 | 95 | 835.5 | 139.91 |
| 7 | Connecticut | 10.8 | 46 | 36 | 87 | 82 | 1068.73 | 167.02 |
| 8 | Delaware | 16.2 | 38 | 30 | 87 | 99 | 1137.87 | 151.48 |
| 9 | District of Columbia | 5.9 | 34 | 27 | 100 | 100 | 1273.89 | 136.05 |
| 10 | Florida | 17.9 | 21 | 29 | 92 | 94 | 1160.13 | 144.18 |
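As a minimal sketch of how these rows could be worked with in code, the example below re-enters the ten displayed rows as CSV text and parses them with Python's standard `csv` module. The names `SAMPLE_CSV` and `load_rows` are hypothetical and chosen only for illustration. Sorting ten rows by a single variable is meant to show the mechanics, not to answer the article's question.

```python
import csv
import io

# The ten rows displayed in the table, re-entered as CSV text (illustrative).
SAMPLE_CSV = """state,num_drivers,perc_speeding,perc_alcohol,perc_not_distracted,perc_no_previous,insurance_premiums,losses
Alabama,18.8,39,30,96,80,784.55,145.08
Alaska,18.1,41,25,90,94,1053.48,133.93
Arizona,18.6,35,28,84,96,899.47,110.35
Arkansas,22.4,18,26,94,95,827.34,142.39
California,12,35,28,91,89,878.41,165.63
Colorado,13.6,37,28,79,95,835.5,139.91
Connecticut,10.8,46,36,87,82,1068.73,167.02
Delaware,16.2,38,30,87,99,1137.87,151.48
District of Columbia,5.9,34,27,100,100,1273.89,136.05
Florida,17.9,21,29,92,94,1160.13,144.18
"""


def load_rows(text):
    """Parse CSV text into a list of dicts; every column except state becomes a float."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        row = {}
        for name, value in record.items():
            row[name] = value if name == "state" else float(value)
        rows.append(row)
    return rows


rows = load_rows(SAMPLE_CSV)

# Order the displayed states by num_drivers, highest first.
ranked = sorted(rows, key=lambda r: r["num_drivers"], reverse=True)
for r in ranked[:3]:
    print(r["state"], r["num_drivers"])
```

Among the ten rows shown, Arkansas has the highest value of num_drivers (22.4) and the District of Columbia the lowest (5.9).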
data dictionary
A data dictionary is a format for displaying and describing the variables of a data set.
Each variable name is followed by a brief description to give the reader more clarity.
The following eight variables are included in the data set provided in the article [2].
- state: All 50 states, plus the District of Columbia
- num_drivers: Number of drivers involved in fatal collisions per billion miles
- perc_speeding: Percentage of drivers involved in fatal collisions who were speeding
- perc_alcohol: Percentage of drivers involved in fatal collisions who were alcohol-impaired
- perc_not_distracted: Percentage of drivers involved in fatal collisions who were not distracted
- perc_no_previous: Percentage of drivers involved in fatal collisions who had not been involved in any previous accidents
- insurance_premiums: Average combined car insurance premiums ($)
- losses: Losses incurred by insurance companies for collisions per insured driver ($)
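A data dictionary can also be kept alongside the data in code. The sketch below is one possible representation, assuming a plain Python mapping from variable name to description; `DATA_DICTIONARY` and `check_record` are hypothetical names introduced here. It checks that a record (the Alabama row from the table) carries exactly the documented variables.

```python
# A data dictionary as a plain mapping: variable name -> description.
DATA_DICTIONARY = {
    "state": "All 50 states, plus the District of Columbia",
    "num_drivers": "Number of drivers involved in fatal collisions per billion miles",
    "perc_speeding": "Percentage of drivers involved in fatal collisions who were speeding",
    "perc_alcohol": "Percentage of drivers involved in fatal collisions who were alcohol-impaired",
    "perc_not_distracted": "Percentage of drivers involved in fatal collisions who were not distracted",
    "perc_no_previous": "Percentage of drivers involved in fatal collisions who had not been involved in any previous accidents",
    "insurance_premiums": "Average combined car insurance premiums ($)",
    "losses": "Losses incurred by insurance companies for collisions per insured driver ($)",
}


def check_record(record, dictionary=DATA_DICTIONARY):
    """Return (missing, undocumented) variable names for one record."""
    missing = sorted(set(dictionary) - set(record))
    undocumented = sorted(set(record) - set(dictionary))
    return missing, undocumented


# First row of the table (Alabama), keyed by variable name.
alabama = {
    "state": "Alabama",
    "num_drivers": 18.8,
    "perc_speeding": 39,
    "perc_alcohol": 30,
    "perc_not_distracted": 96,
    "perc_no_previous": 80,
    "insurance_premiums": 784.55,
    "losses": 145.08,
}

missing, undocumented = check_record(alabama)
print("missing:", missing)
print("undocumented:", undocumented)

# Print the dictionary in the same "name: description" form as the list above.
for name, description in DATA_DICTIONARY.items():
    print(f"{name}: {description}")
```

Keeping the descriptions in one mapping means a renamed or added column shows up as a mismatch instead of going undocumented.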