While you’re likely familiar with the term “IQ” and associate it with the idea of intelligence, what does IQ really mean?
IQ
IQ stands for intelligence quotient and describes a score earned on a test designed to measure intelligence.
You’ve already learned that there are many ways psychologists describe intelligence (or more aptly, intelligences). Similarly, IQ tests—the tools designed to measure intelligence—have been the subject of debate throughout their development and use.
When might an IQ test be used? What do we learn from the results, and how might people use this information? While intelligence testing has many benefits, it is also important to note the limitations and controversies surrounding these tests. For example, IQ test results have sometimes been used to justify insidious purposes, such as the eugenics movement (Severson, 2011). The infamous Supreme Court case Buck v. Bell legalized the forced sterilization of some people deemed “feeble-minded” through this type of testing, resulting in about 65,000 sterilizations (Buck v. Bell, 274 U.S. 200; Ko, 2016). Today, only professionals trained in psychology can administer IQ tests, and the purchase of most tests requires an advanced degree in psychology. Other professionals in the field, such as social workers and psychiatrists, cannot administer IQ tests.
Why Measure Intelligence?
Candace, a 14-year-old girl experiencing problems at school in Connecticut, was referred for a court-ordered psychological evaluation. She was in regular education classes in ninth grade and was failing every subject. Candace had never been a stellar student but had always been passed to the next grade. Frequently, she would curse at any of her teachers who called on her in class. She also got into fights with other students and occasionally shoplifted. When she arrived for the evaluation, Candace immediately said that she hated everything about school, including the teachers, the rest of the staff, the building, and the homework. Her parents stated that they felt their daughter was picked on, because she was of a different race than the teachers and most of the other students. When asked why she cursed at her teachers, Candace replied, “They only call on me when I don’t know the answer. I don’t want to say, ‘I don’t know’ all of the time and look like an idiot in front of my friends. The teachers embarrass me.” She was given a battery of tests, including an IQ test. Her score on the IQ test was 68.
What does Candace’s score say about her ability to excel or even succeed in regular education classes without assistance? Why were her difficulties never noticed or addressed?
Measuring Intelligence

It seems that the human understanding of intelligence is somewhat limited when we focus on traditional or academic-type intelligence. How, then, can intelligence be measured? And when we measure intelligence, how do we ensure that we capture what we’re really trying to measure (in other words, that IQ tests function as valid measures of intelligence)?
In the late 1800s, Sir Francis Galton developed the first broad test of intelligence (Flanagan & Kaufman, 2004). Although he was not a psychologist, his contributions to the concepts of intelligence testing are still felt today (Gordon, 1995). Reliable intelligence testing (you may recall from earlier modules that reliability refers to a test’s ability to produce consistent results) began in earnest during the early 1900s with a researcher named Alfred Binet. Binet was asked by the French government to develop an intelligence test to use on children to determine which ones might have difficulty in school; it included many verbally based tasks. American researchers soon realized the value of such testing.
Lewis Terman, a Stanford professor, modified Binet’s work by standardizing the administration of the test and testing thousands of children of different ages to establish an average score for each age. As a result, the test was normed and standardized, which means that the test was administered consistently to a large enough representative sample of the population that the range of scores resulted in a bell curve (bell curves will be discussed later).
Standardization and Norms
Standardization means that the manner of administration of a test, its scoring, and the interpretation of results are all consistent.
Norming involves giving a test to a large population so data can be collected comparing groups, such as age groups. The resulting data provide norms, or referential scores, by which to interpret future scores.
Norms are not expectations of what a given group should know but a demonstration of what that group does know.
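To make the idea of norms as reference scores concrete, here is a minimal sketch in Python. It is an illustration only, not the scoring procedure of any published test: the scores are invented, the sample is far smaller than a real norming sample, and the names `norming_sample`, `build_norms`, and `compare_to_norms` are hypothetical. It summarizes each age group by its average and spread, then describes a new raw score by how far it falls from the average for that child's age group.

```python
from statistics import mean, stdev

# Hypothetical raw scores from a norming sample, grouped by age in years.
# Real norming samples include thousands of test takers; these values are invented.
norming_sample = {
    8: [20, 22, 24, 26, 28],
    9: [25, 27, 29, 31, 33],
}

def build_norms(sample):
    """Summarize each age group by its mean and standard deviation."""
    return {age: (mean(scores), stdev(scores)) for age, scores in sample.items()}

def compare_to_norms(raw_score, age, norms):
    """Return how many standard deviations a raw score lies from the age-group mean."""
    group_mean, group_sd = norms[age]
    return (raw_score - group_mean) / group_sd

norms = build_norms(norming_sample)

# The same raw score is interpreted differently depending on the age group.
for age in (8, 9):
    print(age, round(compare_to_norms(29, age, norms), 2))
```

In this made-up sample, a raw score of 29 sits above the average for 8-year-olds but exactly at the average for 9-year-olds. This is why scores are interpreted against same-age peers: the norms describe what each group actually scored, not what it should score.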
Norming and standardizing the test ensures that new scores are reliable. This new version of the test was called the Stanford-Binet Intelligence Scale (Terman, 1916). Remarkably, an updated version of this test is still widely used today.
Psychologist David Wechsler created a new IQ test in the US in 1939 by combining subtests from previous intelligence tests. These subtests tapped into a variety of verbal and nonverbal skills because Wechsler believed that intelligence encompassed “the global capacity of a person to act purposefully, to think rationally, and to deal effectively with his environment” (Wechsler, 1958, p. 7). He named it the Wechsler-Bellevue Intelligence Scale, which was later revised and renamed the Wechsler Adult Intelligence Scale (WAIS). Today, there are three Wechsler tests: WAIS-IV (fourth edition), WISC-V (for children), and WPPSI-IV (for preschool and primary school). These tests are used widely in schools and communities throughout the United States, and they are periodically normed and standardized as a means of recalibration. As a part of the recalibration process, the WISC-V was given to thousands of children across the country, and children taking the test today are compared with their same-age peers (Figure 7.13).
The WISC-V is composed of 14 subtests, which make up five indices, which in turn yield an IQ score. The five indices are Verbal Comprehension, Visual-Spatial, Fluid Reasoning, Working Memory, and Processing Speed. When the test is complete, individuals receive a score for each of the five indices and a full-scale IQ score. The method of scoring reflects the understanding that intelligence comprises multiple abilities in several cognitive realms, and it focuses on the mental processes that the child used to arrive at their answers to each test item.
How do you really measure intelligence?
Though many intelligence tests have been developed that produce valid and reliable results, a good question to ask is simply: how should we go about measuring intelligence in the first place? Do you think asking a vocabulary question, for example, is a good measure of intelligence? If we are really attempting to measure the ability to learn from experience, then the best types of tests would have people develop novel solutions to problems, but you can imagine how difficult it would be to design a test like that.
A major problem with IQ tests is that the questions are typically rooted in Western, middle-to-upper-class values, language, and experiences. As a result, the questions are easier to understand and relate to for individuals from these backgrounds, which inadvertently favors them and creates a bias against those from other cultural contexts. The wording and language used in many IQ tests may not be accessible to individuals for whom English is not a first language or who speak different dialects. Additionally, intelligence is multifaceted and does not fit neatly into the box that many standardized tests attempt to place it in. By not considering the diverse ways intelligence manifests across different cultures, traditional IQ testing can be limited in its scope and potentially discriminatory.
In some studies with Indigenous Australian participants, researchers gathered feedback about which culturally relevant test questions would be clear to test takers. Some participants said that they would prefer to take the test outdoors and that some of the terminology, such as “right-hand side” or “left-hand side,” doesn’t translate well.[1]
In one study, researchers asked community members about the tests beforehand, and some of the tests were vetoed for being culturally irrelevant. Others were changed to be more culturally sensitive. For example, abstract images in one question were replaced with animals or grocery items, and playing cards were replaced with stones or seashells.[2]
[1] Dingwall, K. M., Gray, A. O., McCarthy, A. R., Delima, J. F., & Bowden, S. C. (2017). Exploring the reliability and acceptability of cognitive tests for Indigenous Australians: A pilot study. BMC Psychology, 5(1), 26. https://doi.org/10.1186/s40359-017-0195-y
[2] Rock, D., & Price, I. R. (2019). Identifying culturally acceptable cognitive tests for use in remote northern Australia. BMC Psychology, 7, 62. https://doi.org/10.1186/s40359-019-0335-7