- Create a bootstrap distribution for a sample mean
- Find and describe a bootstrap percentile confidence interval for a population mean
Why do we need a bootstrap distribution to calculate a confidence interval for a population mean?
Previously, we learned how to estimate a population mean using a one-sample [latex]t[/latex] confidence interval. Use of this interval is only appropriate when the sampling distribution of the sample mean is approximately normal, which is the case if the population distribution is normal or if the sample size is large. However, in real life, many variables have population distributions that are skewed, and there are situations where it is not possible or not practical to take a large sample. The bootstrap confidence interval introduced in this section provides an alternative that can be used in situations where it is not appropriate to use the [latex]t[/latex] confidence interval.
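Starting with a random sample selected from the population of interest, a bootstrap sample is a random sample that is selected WITH replacement from the original sample and that is the same size as the original sample. Because the selection is made with replacement, a value from the original sample may appear more than once in a bootstrap sample, or not at all. Bootstrap samples are then used to construct a bootstrap distribution, which serves as an approximation of a sampling distribution.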
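As a minimal sketch of this definition, the Python snippet below draws one bootstrap sample. The data values, the seed, and the variable names are assumptions chosen for illustration; they do not come from a real data set.

```python
import random

# Hypothetical original sample (illustrative values only)
original_sample = [3, 7, 9, 4, 4, 12, 5, 8]

rng = random.Random(42)  # fixed seed so the sketch is repeatable

# A bootstrap sample: same size as the original, selected WITH replacement,
# so some original values may repeat and others may be left out
bootstrap_sample = rng.choices(original_sample, k=len(original_sample))

print(bootstrap_sample)
```

Running the last two statements again would give a different bootstrap sample, which is what makes it possible to build a whole distribution from a single original sample.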
Ideally, if we wanted to approximate the sampling distribution of a statistic, we could take many, many samples from the population of interest, calculate the value of the statistic for each sample, and then make a distribution using the statistic values from these samples. You may have created sampling distributions in this way previously.
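The ideal procedure described above can be sketched in code when the whole population is available. Here the population is simulated, which is an assumption made only so that the example can run; in practice we rarely have access to every population value. The function name `simulate_sampling_distribution` and all of the numbers are hypothetical.

```python
import random
import statistics

rng = random.Random(0)

# Hypothetical right-skewed population (simulated for illustration only)
population = [rng.expovariate(1 / 10) for _ in range(100_000)]

def simulate_sampling_distribution(population, sample_size, num_samples, rng):
    """Return the sample means from many random samples of the population."""
    means = []
    for _ in range(num_samples):
        sample = rng.sample(population, sample_size)  # one random sample
        means.append(statistics.mean(sample))         # statistic for that sample
    return means

sample_means = simulate_sampling_distribution(population, 30, 2000, rng)

# Center and spread of the approximate sampling distribution
print(statistics.mean(sample_means), statistics.stdev(sample_means))
```

The list `sample_means` plays the role of the sampling distribution: a histogram of its values would show how the sample mean varies from sample to sample.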
Bootstrapping provides a different way to approximate a sampling distribution. Rather than taking many different samples from the actual population (which may not be feasible), we take samples from a hypothetical population that we think is very similar to the actual population we are interested in.
So, how do we get this hypothetical population? If we think that the original sample is representative of the population, then we expect the actual population distribution to be similar to the distribution we see in the sample, even though the actual population is much larger. For example, if some of the values in our sample are [latex]3[/latex], [latex]7[/latex], and [latex]9[/latex], we think that the population probably has lots of [latex]3[/latex]s, [latex]7[/latex]s, and [latex]9[/latex]s. If [latex]25\%[/latex] of the values in our sample are [latex]4[/latex]s, we think that about [latex]25\%[/latex] of the values in the population are [latex]4[/latex]s. Another way to picture this is to think of the population as being made up of many, many copies of the sample values.
Thinking of the hypothetical population in this way explains why, when we want to simulate taking samples from this hypothetical population, we just sample with replacement from the original sample.
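The following sketch illustrates this reasoning with made-up numbers. It builds a hypothetical population from many copies of a small sample, checks that the proportion of [latex]4[/latex]s is preserved, and then draws a sample in two ways. The sample values and the number of copies are assumptions chosen for illustration.

```python
import random

# Hypothetical sample in which 2 of the 8 values (25%) are 4s
original_sample = [4, 3, 7, 9, 4, 6, 8, 5]
copies = 10_000  # assumed number of copies; any large number works

# Hypothetical population: many, many copies of the sample values
hypothetical_population = original_sample * copies

share_in_sample = original_sample.count(4) / len(original_sample)
share_in_population = hypothetical_population.count(4) / len(hypothetical_population)
print(share_in_sample, share_in_population)  # prints 0.25 0.25

rng = random.Random(1)

# Option 1: sample without replacement from the large hypothetical population
from_population = rng.sample(hypothetical_population, len(original_sample))

# Option 2: sample with replacement from the original sample
with_replacement = rng.choices(original_sample, k=len(original_sample))

print(from_population)
print(with_replacement)
```

Because the hypothetical population is so large compared to the sample size, removing a few values barely changes its makeup, so the two options behave in nearly the same way. Sampling with replacement from the original sample is simply the more convenient of the two.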
With this idea in mind, a bootstrap distribution is constructed using the values of a sample statistic calculated from a large number of bootstrap samples. This bootstrap distribution is an approximate sampling distribution and provides the information needed to construct confidence intervals in the same way that a known sampling distribution does. Percentiles from the bootstrap distribution are used as the endpoints of a confidence interval.
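A minimal sketch of the full process for a sample mean is shown below. It assumes a [latex]95\%[/latex] confidence level, for which the endpoints are the [latex]2.5[/latex]th and [latex]97.5[/latex]th percentiles of the bootstrap distribution. The data values, the number of bootstrap samples, and the helper function `percentile` (which uses one of several common percentile conventions) are assumptions made for illustration.

```python
import random
import statistics

def percentile(values, p):
    """Return the p-th percentile (0 to 100) of values, using linear
    interpolation between the two closest ordered values."""
    ordered = sorted(values)
    position = (len(ordered) - 1) * p / 100
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])

# Hypothetical original sample (illustrative values only)
original_sample = [3, 7, 9, 4, 4, 12, 5, 8, 21, 6, 4, 10]

rng = random.Random(2024)      # fixed seed so the sketch is repeatable
num_bootstrap_samples = 5000   # assumed; a "large number" of bootstrap samples

# Bootstrap distribution: the sample mean of each bootstrap sample
bootstrap_means = []
for _ in range(num_bootstrap_samples):
    resample = rng.choices(original_sample, k=len(original_sample))
    bootstrap_means.append(statistics.mean(resample))

# 95% bootstrap percentile interval: middle 95% of the bootstrap distribution
lower_endpoint = percentile(bootstrap_means, 2.5)
upper_endpoint = percentile(bootstrap_means, 97.5)

print(f"95% bootstrap percentile interval: ({lower_endpoint:.2f}, {upper_endpoint:.2f})")
```

For a different confidence level, only the two percentiles change. For example, a [latex]90\%[/latex] interval would use the [latex]5[/latex]th and [latex]95[/latex]th percentiles.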