Module 1: Cheat Sheet

Essential Concepts

  • Statistics is a science that deals with the collection, analysis, interpretation, and presentation of data.
  • The goal of statistics is to make an inference (a logical conclusion or guess) about the population based on a sample.
  • The statistical process is an investigative process; it is also a repetitive or cyclical process.
    • Ask a question that can be answered by collecting data
    • Define the population and design a study
    • Collect data from a sample (or samples) of the population
    • Summarize and analyze data
    • Interpret results and draw a conclusion

  • A statistical investigative question is required for any statistical study. A good statistical question will anticipate variability, will require data collection, and will not have a single definitive answer that can be easily looked up.
  • The population is the group of individuals or entities that our research question pertains to, and a parameter is a numerical summary measure that summarizes that population (e.g., the proportion who use social media).
  • A sample is a group of individuals or entities on which we collect data, and a statistic is a numerical summary measure of a sample.
  • We often collect data on different variables using survey questions. Data can be quantitative or qualitative (also called categorical), depending on how it can be analyzed. Generally, quantitative data is the result of counting or measuring, and qualitative data is the result of categorizing or describing.
  • Simple random sampling assigns a number to every member of the population, then uses a random number generator to select a sample.
  • Other sampling techniques include:
    • Systematic sampling assigns a number to every member of the population, then chooses individuals or entities from the population at regular intervals (e.g., every 4th individual, beginning from a randomly selected starting point).
    • Stratified sampling divides a population into groups via some criterion, then uses simple random selection or systematic selection to collect a sample from each group.
    • Cluster sampling divides a population into groups via some criterion, then uses simple random selection or systematic selection to select one or more groups as the sample.
    • Convenience sampling selects a sample most accessible to the researcher.
  • A sampling method is unbiased if, on average, it results in a representative sample of the population. A sampling method is biased if it has a tendency to produce samples that are not representative of the population.
  • Here are the four main sources of bias to consider when sampling from a population:
    • Undercoverage occurs when some groups of the population are left out of the sampling process and the individuals in these groups do not have an equal chance of being selected for the sample.
    • Non-response bias occurs when an individual chosen for a sample cannot be contacted or decides to not participate in the study or research. This type of bias occurs after the sample has been selected and can create potential bias in the data collected.
    • Response bias is defined as a systematic pattern of inaccurate responses to questions. This type of bias can occur when a person does not understand a question or feels influenced to respond to a question in a certain way. Response bias can also result from the wording of questions that are of a sensitive nature.
    • Voluntary response bias occurs when the sample consists of people who choose to participate, so the sample is not random or representative of the population. The people who volunteer for a study or survey may be more inclined to respond to questions or report certain behaviors.
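
The four probability-based sampling techniques listed above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the population of 20 numbered members, the four groups of five, the sample sizes, and all function names are hypothetical and chosen for readability, not taken from the course materials.

```python
import random

# Hypothetical population: 20 members numbered 1..20 (illustrative only).
population = list(range(1, 21))
rng = random.Random(0)  # fixed seed so the sketch is repeatable

def simple_random_sample(pop, n, rng):
    """Choose n members without replacement; every size-n subset is equally likely."""
    return rng.sample(pop, n)

def systematic_sample(pop, k, rng):
    """Pick a random start among the first k members, then take every k-th member."""
    start = rng.randrange(k)
    return pop[start::k]

def stratified_sample(strata, n_per_stratum, rng):
    """Draw a simple random sample from every group (stratum)."""
    return {name: rng.sample(members, n_per_stratum)
            for name, members in strata.items()}

def cluster_sample(clusters, n_clusters, rng):
    """Randomly select whole groups; every member of a chosen group is sampled."""
    chosen = rng.sample(sorted(clusters), n_clusters)
    return [member for name in chosen for member in clusters[name]]

# Hypothetical grouping criterion: four blocks of five consecutive numbers.
groups = {f"group_{i}": population[i * 5:(i + 1) * 5] for i in range(4)}

srs = simple_random_sample(population, 5, rng)
systematic = systematic_sample(population, 4, rng)
stratified = stratified_sample(groups, 2, rng)
cluster = cluster_sample(groups, 2, rng)
```

Note the contrast between the last two functions: stratified sampling takes some members from every group, while cluster sampling takes every member from some groups.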

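The difference between a parameter and a statistic, and the effect of undercoverage, can also be illustrated with a small simulation. The sketch below assumes a made-up population of 1,000 people split into two hypothetical groups with different rates of social media use; none of these numbers come from real data. Averaged over many samples, the sample proportion (a statistic) from the full population should land near the population proportion (the parameter), while samples drawn from a frame that leaves out one group should not.

```python
import random

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of 1,000 people (True = uses social media).
young = [True] * 480 + [False] * 120   # 600 people, 80% use social media
older = [True] * 120 + [False] * 280   # 400 people, 30% use social media
population = young + older

def proportion(values):
    """Proportion of True values in a list."""
    return sum(values) / len(values)

# Parameter: a numerical summary of the whole population.
parameter = proportion(population)

def average_statistic(frame, n, repetitions, rng):
    """Average the sample proportion over many simple random samples of size n."""
    stats = [proportion(rng.sample(frame, n)) for _ in range(repetitions)]
    return sum(stats) / len(stats)

# Sampling from the full population versus a frame that omits the older group.
unbiased_avg = average_statistic(population, 50, 2000, rng)
undercovered_avg = average_statistic(young, 50, 2000, rng)

print(f"parameter: {parameter:.2f}")
print(f"average statistic, full population: {unbiased_avg:.2f}")
print(f"average statistic, older group left out: {undercovered_avg:.2f}")
```

Any single sample proportion will vary from sample to sample; it is the long-run tendency of the method, not one sample, that makes it biased or unbiased.
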
Glossary

biased

describes a sampling method that has a tendency to produce samples that are not representative of the population

cluster sampling

a sampling method that divides a population into groups via some criterion, then uses simple random selection or systematic selection to select one or more groups as the sample

convenience sampling

a sample of individuals who are most accessible to the researcher. A convenience sample is usually not random or representative of the population

data

factual information about a group of individuals, animals, or objects

informed consent

the requirement that the risks of participation be clearly explained to the subjects of the study

non-response bias

when an individual chosen for a sample cannot be contacted or decides to not participate in the study or research

observational units

the individuals, animals, or objects on which data are collected in the study

parameter

a numerical summary measure of a population

population

an entire group of people, objects, or animals; usually a large group

response bias

a systematic pattern of inaccurate responses to questions

sample

a subset or subgroup of a population on which data are collected; ideally it is randomly selected

sampling bias

when a sample is collected from a population and some members of the population are not as likely to be chosen as others

simple random sample

a random mechanism to choose a sample, without replacement, from the population so that every sample of a given size has the same chance of being selected

statistic

a numerical summary measure of a sample

statistical investigative question

a question that can be used as the starting point for an investigation that involves data collection and data analysis

stratified sampling

a sampling method in which a population is divided into two or more groups (called strata) according to some criterion, and a sample is selected from each stratum using simple random sampling or systematic sampling

survey question

a question researchers ask in order to collect data; the responses are expected to vary from individual to individual

systematic sampling

a sampling method in which every individual in the population is given a number and individuals or entities are chosen at regular intervals, beginning from a random starting point

qualitative data, categorical data

data that result from categorizing or describing attributes of a population

quantitative continuous data

data that are not only made up of counting numbers, but that may include fractions, decimals, or irrational numbers

quantitative data

data that result from counting or measuring attributes of a population

quantitative discrete data

data that can take on only certain numerical values

unbiased

describes a sampling method that, on average, results in a representative sample of the population

undercoverage

when some groups of the population are left out of the sampling process and the individuals in these groups do not have an equal chance of being selected for the sample

variables

the characteristics of observational units

variability

the extent to which data points differ from one another

voluntary response bias

bias that arises because the sample is made up of volunteers; people who volunteer for a study or survey may be more inclined to respond to questions or report certain behaviors