Statistical Thinking: Learn It 3—Reliability and Validity

Lumen Learning


Reliability and Validity

Reliability and validity are two important considerations in any type of data collection.

Reliability

Reliability refers to the ability to produce the same result consistently. In psychological research, this means that the instruments or tools used to collect data do so in consistent, reproducible ways. There are several types of reliability, including:

inter-rater reliability: the degree to which two or more different observers agree on what has been observed
internal consistency: the degree to which different items on a survey that measure the same thing correlate with one another
test-retest reliability: the degree to which the outcomes of a particular measure remain consistent over multiple administrations
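Each type of reliability listed above is commonly quantified with a standard statistic: a correlation between two administrations for test-retest reliability, Cohen's kappa for agreement between two raters, and Cronbach's alpha for internal consistency. The sketch below is a minimal, standard-library-only Python illustration under those assumptions; the function names and every score in it are hypothetical, and these three statistics are only one common choice among several indices researchers use.

```python
from statistics import mean, variance

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def cohens_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters' category labels.
    Undefined (division by zero) if chance agreement is exactly 1."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    categories = set(rater_a) | set(rater_b)
    # Agreement expected if each rater labeled at random using their own base rates.
    expected = sum((rater_a.count(c) / n) * (rater_b.count(c) / n) for c in categories)
    return (observed - expected) / (1 - expected)

def cronbach_alpha(items):
    """items: k lists (one per survey item), each holding one score per respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's total score
    item_var = sum(variance(item) for item in items)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# All data below are hypothetical, invented for illustration.
# Test-retest: the same five people measured on two occasions.
time1 = [12, 15, 9, 20, 17]
time2 = [13, 14, 10, 19, 18]
retest_r = pearson_r(time1, time2)

# Inter-rater: two observers coding the same eight behaviors.
rater_a = ["yes", "yes", "no", "no", "yes", "no", "yes", "yes"]
rater_b = ["yes", "no", "no", "no", "yes", "no", "yes", "yes"]
kappa = cohens_kappa(rater_a, rater_b)

# Internal consistency: three survey items meant to tap the same trait.
items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 2, 5],
    [5, 5, 2, 1, 4],
]
alpha = cronbach_alpha(items)

print(round(retest_r, 2), round(kappa, 2), round(alpha, 2))  # → 0.97 0.75 0.92
```

In all three cases, values closer to 1 indicate more consistent measurement. Cohen's kappa as written handles exactly two raters; other indices (such as Fleiss' kappa or the intraclass correlation) are typically used when there are more.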

Unfortunately, measuring something consistently does not necessarily mean you have measured it correctly. To illustrate, consider a kitchen scale used to weigh the cereal you eat each morning. If the scale is not properly calibrated, it may consistently under- or overestimate the amount of cereal being measured. The scale is highly reliable (the same amount of cereal poured onto it produces the same reading each time), but its readings are incorrect. This is where validity comes into play.
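The cereal-scale example can be made concrete with a tiny simulation: a scale that adds the same offset every time gives readings that barely vary (reliable) yet are all wrong by about the same amount (not valid). This is a minimal sketch; the true weight, the 15 g offset, the small "wobble" values standing in for random noise, and the `read_scale` helper are all hypothetical.

```python
from statistics import mean, stdev

TRUE_WEIGHT_G = 40.0  # hypothetical true weight of one bowl of cereal, in grams
BIAS_G = 15.0         # hypothetical miscalibration: the scale reads 15 g too high

def read_scale(true_weight, wobble):
    """Hypothetical miscalibrated scale: same systematic offset every time, plus tiny noise."""
    return true_weight + BIAS_G + wobble

# Five weighings of the same bowl; the small wobbles stand in for random noise.
readings = [read_scale(TRUE_WEIGHT_G, w) for w in (0.2, -0.1, 0.0, 0.1, -0.2)]

spread = stdev(readings)                # small spread: highly reliable (consistent)
error = mean(readings) - TRUE_WEIGHT_G  # large average error: not valid (biased)
print(f"spread: {spread:.2f} g, average error: {error:.1f} g")
# → spread: 0.16 g, average error: 15.0 g
```

Reliability shows up in the spread of repeated readings, while validity shows up in how far those readings sit from the true value; a scale can do well on the first and badly on the second.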

Validity

Validity refers to the extent to which a given instrument or tool accurately measures what it’s supposed to measure. As with reliability, validity comes in several forms. A few types that researchers consider are listed below (there are many more):

criterion validity: the degree to which a measure’s results agree with an external criterion they should correspond to, such as scores on an established measure or a related real-world outcome
ecological validity: the degree to which research results generalize to real-world applications (for example, was the setting of the experiment close enough to the real-world setting?)
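Criterion validity is often checked by correlating scores on a new measure with scores on the external criterion it is supposed to track. The sketch below assumes a Pearson correlation is an appropriate check; the questionnaire, the criterion, and all six people's scores are hypothetical.

```python
from statistics import mean

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical scores for six people: a new short questionnaire and an
# external criterion it is supposed to track (e.g., an established measure).
new_measure = [10, 14, 7, 18, 12, 16]
criterion = [22, 30, 15, 35, 27, 31]

validity_r = pearson_r(new_measure, criterion)
print(round(validity_r, 2))  # → 0.98
```

A check like this speaks only to criterion validity. Ecological validity is usually judged by comparing the study setting with the real-world setting rather than by computing a single statistic.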

While any valid measure is by necessity reliable, the reverse is not necessarily true. Researchers strive to use instruments that are both highly reliable and valid.