Data Collection Basics: Fresh Take

  • Understand the difference between a census and a sample, and be able to identify the population being studied
  • Distinguish between a value calculated from a sample and one calculated from a population
  • Classify data as either quantitative (numerical) or categorical (qualitative)

Population vs Sample

The Main Idea 

In statistics, the distinction between a population and a sample is pivotal. The population refers to the entire group that is the focus of a researcher’s interest, which can be people, animals, objects, or events. It includes all individuals or items that possess the characteristics that the researcher wants to study.

A sample, on the other hand, is a subset of this population, selected for participation in the study. Samples are used because it is often impractical or impossible to study the entire population. For the results to say something about the larger group, the sample should be representative of the population, so that researchers can make inferences about the population based on the sample’s data. Different sampling methods are used to make the sample as representative as possible, which reduces bias and supports generalizing the study’s results.
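To make the distinction concrete, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the population is a made-up list of commute times for 500 employees of a hypothetical company, and the sample is a simple random sample of 25 of them drawn with the standard library's `random.sample`.

```python
import random
import statistics

random.seed(1)  # fixed seed so the sketch gives the same result each run

# Hypothetical population: commute time (minutes) for ALL 500 employees.
population = [random.randint(5, 60) for _ in range(500)]

# A sample is a subset of the population: here, 25 employees chosen at random.
sample = random.sample(population, 25)

# In a real study the population mean is usually unknown;
# the sample mean is what the researcher can actually compute.
population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population size: {len(population)}, sample size: {len(sample)}")
print(f"population mean: {population_mean:.1f} minutes")
print(f"sample mean:     {sample_mean:.1f} minutes")
```

The two means will generally be close but not identical. Drawing a different random sample would give a slightly different sample mean, while the population mean stays fixed.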


To determine the average length of fish in a lake, researchers catch [latex]20[/latex] fish and measure them. What is the sample and population in this study?

Quantitative or Categorical

Once we have gathered data, we might wish to classify it. Roughly speaking, data can be classified as categorical data or quantitative data.


The Main Idea 

Categorical (qualitative) data are pieces of information that allow us to classify the objects under investigation into various categories.

Quantitative data are responses that are numerical in nature and with which we can perform meaningful arithmetic calculations.
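The following Python sketch contrasts the two kinds of data. The survey responses are invented for illustration, and the variable names are hypothetical. The point is only that counting is the natural operation for categories, while arithmetic such as averaging makes sense for quantitative values.

```python
from collections import Counter
import statistics

# Hypothetical survey responses (invented for illustration).
favorite_color = ["blue", "green", "blue", "red", "blue"]  # categorical
hours_of_sleep = [7, 6, 8, 7, 5]                           # quantitative

# Categorical data: sort the responses into categories and count them.
color_counts = Counter(favorite_color)
print(color_counts)

# Quantitative data: arithmetic such as an average is meaningful.
mean_sleep = statistics.mean(hours_of_sleep)
print(mean_sleep)
```

Notice that there is no sensible way to "average" the color responses, even though we can count how often each one occurs.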

We might survey a math class, asking each person to name their favorite movie that they saw in a movie theater. The responses would look like: Avatar: The Way of Water, The Super Mario Bros. Movie, or Creed III. We might count the number of people who give each answer, but the answers themselves do not have any numerical values: we cannot perform computations with an answer like “Avatar: The Way of Water.” Is this categorical or quantitative data?

A survey could ask the number of movies you have seen in a movie theater in the past [latex]12[/latex] months ([latex]0, 1, 2, 3, 4, \dots[/latex]). Is this categorical or quantitative data?

Sometimes, determining whether data is categorical or quantitative can be a bit trickier. In the next example, the data collected is in numerical form, but it is not quantitative data. Read on to find out why.

Suppose we gather respondents’ ZIP codes in a survey to track their geographical location. Is this categorical or quantitative?

Figure: Map of Portland, OR, with ZIP codes
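One way to see why numerical-looking data may not be quantitative is to try arithmetic on it. The sketch below uses a short, invented list of survey responses. The ZIP codes are stored as strings, which is an illustrative choice rather than a requirement.

```python
from collections import Counter

# Hypothetical survey responses. ZIP codes are kept as strings
# because they act as labels for places, not as amounts.
zip_codes = ["97201", "97209", "97201", "97214", "97209", "97201"]

# Meaningful: count how many respondents fall into each ZIP code.
respondents_per_zip = Counter(zip_codes)
print(respondents_per_zip)

# Not meaningful: arithmetic on the digits. The "average ZIP code"
# is just a number; it does not describe where any respondent lives.
average_zip = sum(int(z) for z in zip_codes) / len(zip_codes)
print(average_zip)
```

Storing ZIP codes as text also preserves leading zeros, which would be lost if the codes were treated as ordinary numbers.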

A survey about the movie you most recently attended includes the question “How would you rate the movie you just saw?” with these possible answers:

  1. it was awful
  2. it was just OK
  3. I liked it
  4. it was great
  5. best movie ever!

Is this categorical or quantitative?

The examples on this page are discussed further in the following video:
