Statistics: Cheat Sheet

Lumen Learning

Essential Concepts

Population refers to the group of people or objects that researchers want to learn about.
A parameter is a number that summarizes a characteristic of the entire population. It can be an average, percentage, or any other value that is calculated using data from the whole population.
A census is a type of survey where data is collected from every single member of the population. It’s like counting or surveying every person or object in a group.
When we want to study a big group of people but can’t survey everyone, we choose a smaller group called a sample. The sample should represent the whole group.
A statistic is a number that summarizes a characteristic of a sample, such as a sample average or percentage. It is used to estimate the corresponding population parameter.
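The difference between a parameter and a statistic can be sketched in a few lines of Python. The population here is made-up data (randomly generated ages), and the names `population`, `parameter`, and `statistic` are chosen only for illustration.

```python
import random
from statistics import mean

# Hypothetical population: ages of 1,000 people (made-up data for illustration).
rng = random.Random(0)
population = [rng.randint(18, 90) for _ in range(1000)]

# Parameter: calculated from every member of the population (as in a census).
parameter = mean(population)

# Statistic: calculated from a sample drawn from that population.
sample = rng.sample(population, 50)
statistic = mean(sample)

print(f"population mean (parameter): {parameter:.2f}")
print(f"sample mean (statistic):     {statistic:.2f}")
```

The two values are usually close but rarely identical, because the sample contains only part of the population.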
Observational units are the group of individuals, animals, or objects that are being studied or surveyed in a research study. They are the ones we want to learn about and collect information from.
Variables are the different characteristics or qualities of the observational units that we measure or record in a study.
Data can be divided into two types: qualitative and quantitative.
- Qualitative data describes qualities or attributes of a group, like hair color or blood type. It uses words or categories to describe these attributes, such as black hair or blood type AB+.
- Quantitative data involves counting or measuring things. It uses numbers to represent information, like the amount of money you have or the weight of an object. It can be further divided into discrete data (counting whole numbers) or continuous data (numbers including fractions or decimals), such as the number of phone calls you receive in a day or the length of those calls.
When collecting a sample, it’s important to choose it in a way that represents the whole group. Sampling bias happens when some members of the population are more likely to be chosen than others, which can lead to wrong conclusions about the entire group. Random samples are preferred because random selection avoids this kind of systematic bias, but even random samples vary from one to the next and may not perfectly represent the population.
Simple random sampling means that every individual or entity in the population has an equal chance of being chosen. More precisely, every possible group of a given size has the same probability of being selected as the sample.
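A minimal sketch of simple random sampling, using Python's standard `random.sample`, which draws without replacement. The function name `simple_random_sample` and the list `members` are hypothetical, introduced only for this example.

```python
import random

def simple_random_sample(population, k, seed=None):
    """Draw k members without replacement; every size-k group is equally likely."""
    rng = random.Random(seed)
    return rng.sample(population, k)

# Hypothetical population of 100 labeled members.
members = [f"member_{i}" for i in range(1, 101)]
chosen = simple_random_sample(members, 10, seed=42)
print(chosen)
```

Passing a `seed` makes the draw repeatable, which is convenient for checking work; leave it out for a fresh random sample each time.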
In systematic sampling, each person or object in the population is assigned a number. Then, we choose individuals at regular intervals, like every [latex]5th[/latex] or [latex]10th[/latex] person, starting from a randomly selected point. This way, every [latex]n[/latex]th member of the population after the starting point is included in the sample.
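Systematic sampling can be sketched as a random starting point followed by a fixed step. This example assumes the population is already in a numbered list; `systematic_sample`, `people`, and `interval` are illustrative names.

```python
import random

def systematic_sample(population, interval, seed=None):
    """Pick a random start within the first interval, then take every interval-th member."""
    rng = random.Random(seed)
    start = rng.randrange(interval)  # random index from 0 to interval - 1
    return population[start::interval]

# Hypothetical population numbered 1 through 100, sampling every 10th person.
people = list(range(1, 101))
picked = systematic_sample(people, 10, seed=1)
print(picked)
```

With 100 people and an interval of 10, the sample always has 10 members, each exactly 10 positions after the previous one.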
Stratified sampling is when a population is split into different groups based on certain criteria, like location or age. Then, a sample is chosen from each group using methods like random selection, but the size of each sample is based on the size of the group in the population. This helps ensure that each subgroup is represented properly in the sample.
- Quota sampling is a modified version of stratified sampling where samples are collected from each subgroup until a specific target or quota is reached.
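The proportional version of stratified sampling described above might look like the following sketch. The strata (urban, suburban, rural) and their sizes are made up, and `stratified_sample` is a hypothetical helper. Because each stratum's share is rounded, the total can differ slightly from the requested size when the proportions do not divide evenly.

```python
import random
from collections import defaultdict

def stratified_sample(units, stratum_of, total_size, seed=None):
    """Randomly sample from each stratum in proportion to its share of the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for unit in units:
        strata[stratum_of(unit)].append(unit)

    sample = []
    for members in strata.values():
        # Sample size for this stratum is proportional to the stratum's size.
        k = round(total_size * len(members) / len(units))
        sample.extend(rng.sample(members, k))
    return sample

# Hypothetical population: 600 urban, 300 suburban, and 100 rural residents.
residents = (
    [("urban", i) for i in range(600)]
    + [("suburban", i) for i in range(300)]
    + [("rural", i) for i in range(100)]
)
sample = stratified_sample(residents, lambda r: r[0], total_size=50, seed=0)

counts = defaultdict(int)
for location, _ in sample:
    counts[location] += 1
print(dict(counts))
```

Here the 60/30/10 split of the population carries over to the sample of 50, giving 30 urban, 15 suburban, and 5 rural residents.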
Cluster sampling is a method where instead of selecting individual people or objects, the population is divided into smaller groups called clusters. Then, a few of these clusters are randomly chosen to be part of the sample for the study.
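Cluster sampling can be sketched as randomly choosing whole groups rather than individuals. This example assumes that every member of a chosen cluster is included in the sample, which is one common form of the method; the classrooms and the helper `cluster_sample` are hypothetical.

```python
import random

def cluster_sample(clusters, n_clusters, seed=None):
    """Randomly choose whole clusters, then include every member of each chosen cluster."""
    rng = random.Random(seed)
    chosen_names = rng.sample(sorted(clusters), n_clusters)
    sample = [member for name in chosen_names for member in clusters[name]]
    return chosen_names, sample

# Hypothetical population: 8 classrooms (clusters) with 5 students each.
classrooms = {
    f"room_{r}": [f"room_{r}_student_{s}" for s in range(1, 6)]
    for r in range(1, 9)
}
chosen_rooms, students = cluster_sample(classrooms, 3, seed=7)
print(chosen_rooms)
print(len(students))
```

Unlike stratified sampling, which takes some members from every group, this approach takes all members from only a few groups.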
Convenience samples and voluntary response samples are considered among the least reliable sampling methods.
- Convenience sampling is when samples are chosen based on who is readily available or convenient to include.
- Voluntary response sampling is a method where individuals choose to participate in the sample on their own accord.
There are several ways a study can be biased even before collecting data. One way is through sampling bias, where the sample is not representative of the whole group. Another type is voluntary response bias, which happens when data is collected only from volunteers, leading to an unbalanced representation. Other biases can come from researchers having an interest in the outcome, participants giving inaccurate responses, fear of not being anonymous, question wording influencing answers, people refusing to participate, or leaving out certain groups from the study.
Observational studies involve observing and measuring without imposing a treatment, while experiments apply a treatment and measure its effects.
Confounding happens when there are two possible factors that could have caused a result, but we can’t tell which one is actually responsible.
The placebo effect occurs when a person’s belief in a treatment affects its effectiveness, even if the treatment itself doesn’t have any real impact. To account for this, a placebo, which is a fake treatment, is often used as a comparison in studies.
In blind studies, participants are unaware if they are receiving the actual treatment or a placebo. In double-blind studies, even the people interacting with the participants don’t know who is in which group (treatment or control).
Experimental design consists of two key components: the factor of interest, which is the variable we think has an impact, and the response variable, which is the variable we believe is influenced by the factor of interest.
Randomized block design is a method used in experiments where similar subjects are grouped into blocks, with the blocks differing from one another in ways that might affect the outcome. Nuisance factors can be controlled by building them into the experimental design in this way. Blocking refers to grouping similar subjects together and then randomly assigning the subjects within each group to different treatments.
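Blocking followed by random assignment within each block can be sketched as follows. The blocks (two age groups), the subject labels, and the helper `randomized_block_assignment` are all hypothetical; the sketch assumes each block's size is a multiple of the number of treatments so the groups come out balanced.

```python
import random

def randomized_block_assignment(blocks, treatments, seed=None):
    """Within each block, shuffle the subjects and deal them out evenly across treatments."""
    rng = random.Random(seed)
    assignment = {}
    for block_name, subjects in blocks.items():
        shuffled = subjects[:]  # copy so the original block is left unchanged
        rng.shuffle(shuffled)
        for i, subject in enumerate(shuffled):
            assignment[subject] = (block_name, treatments[i % len(treatments)])
    return assignment

# Hypothetical blocks: subjects grouped by age, a possible nuisance factor.
blocks = {
    "under_40": ["A", "B", "C", "D"],
    "40_and_over": ["E", "F", "G", "H"],
}
plan = randomized_block_assignment(blocks, ["treatment", "placebo"], seed=3)
for subject, (block, group) in sorted(plan.items()):
    print(subject, block, group)
```

Because the random assignment happens separately inside each block, every age group ends up with the same number of subjects in the treatment and placebo groups.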

Glossary

blind study

one in which the participant does not know whether they are receiving the treatment or a placebo

block

a group of subjects that are similar

blocking

the grouping together of homogeneous (similar) experimental units followed by the random assignment of the experimental units within each group to a treatment

census

a survey of an entire population

cluster sampling

where the population is divided into subgroups (clusters) and a set of those subgroups is selected to be in the sample

confounding

when there are two potential variables that could have caused the outcome and it is not possible to determine which actually caused the result

control group

group that does not receive the treatment of interest; it may receive a placebo instead

convenience sampling

the practice of choosing a sample by selecting whoever is convenient

double-blind study

one in which neither the participants nor those interacting with them know who is in the treatment group and who is in the control group

experiment

a study in which the effects of a treatment are measured

experimental group

group that receives the treatment of interest

experimental unit

single object or individual to be measured in the experiment

factor of interest

the explanatory variable (independent variable), which is what we suspect has an effect on the response variable

loaded questions

when the question wording influences the responses

non-response bias

when people’s refusal to participate in the study influences the validity of the outcome

observational study

a study based on observations or measurements

observational units

the group of individuals, animals, or objects that are being measured or surveyed in a study

parameter

a value (average, percentage, etc.) calculated using all the data from a population

perceived lack of anonymity

when the responder fears giving an honest answer might negatively affect them

placebo

a dummy treatment given to control for the placebo effect

placebo effect

when the effectiveness of a treatment is influenced by the patient’s perception of how effective they think the treatment will be, so a result might be seen even if the treatment is ineffectual

population

the group the collected data is intended to describe

qualitative data

the result of categorizing or describing attributes of a population

quantitative data

the result of counting or measuring attributes of a population

quota sampling

where samples are collected in each subgroup until the desired quota is met

random sample

where each member of the population has an equal probability of being chosen

response bias

when the responder gives inaccurate responses for any reason

response factor

the response variable (dependent variable), which we suspect is affected by the factor of interest

sample

a smaller subset of the entire population, ideally one that is fairly representative of the whole population

sampling bias

when a sample is collected from a population and some members of the population are not as likely to be chosen as others

self-interest study

a study in which bias can occur because the researchers have an interest in the outcome

simple random sample

where every member of the population, and every group of members of a given size, has an equal probability of being chosen

statistic

a value (average, percentage, etc.) calculated using the data from a sample

stratified sampling

where random samples are taken from each subgroup (or stratum) with sample sizes proportional to the size of the subgroup in the population

systematic sampling

where every [latex]n[/latex]th member of the population is selected to be in the sample

undercoverage

occurs when some groups of the population are left out of the sampling process

variables

the characteristics of the observational units in a study

voluntary response bias

the sampling bias that often occurs when the sample consists of volunteers

voluntary response sampling

allowing members of the population to volunteer to be in the sample