Intelligence and Creativity: Learn It 3—Measuring Intelligence

While you’re likely familiar with the term “IQ” and associate it with the idea of intelligence, what does IQ really mean?

IQ

IQ stands for intelligence quotient and describes a score earned on a test designed to measure intelligence.

You’ve already learned that there are many ways psychologists describe intelligence (or more aptly, intelligences). Similarly, IQ tests—the tools designed to measure intelligence—have been the subject of debate throughout their development and use.

When might an IQ test be used? What do we learn from the results, and how might people use this information? While there are certainly many benefits to intelligence testing, it is important to also note the limitations and controversies surrounding these tests. For example, IQ test results have sometimes been used as arguments in support of insidious purposes, such as the eugenics movement (Severson, 2011). The infamous Supreme Court case Buck v. Bell legalized the forced sterilization of some people deemed “feeble-minded” through this type of testing, resulting in about 65,000 sterilizations (Buck v. Bell, 274 U.S. 200; Ko, 2016). Today, only professionals trained in psychology can administer IQ tests, and the purchase of most tests requires an advanced degree in psychology. Other professionals in the field, such as social workers and psychiatrists, cannot administer IQ tests.

Why Measure Intelligence?

The value of IQ testing is most evident in educational or clinical settings. Children who seem to be experiencing learning difficulties or severe behavioral problems can be tested to ascertain whether their difficulties can be partly attributed to an IQ score that is significantly different from the mean for their age group. Without IQ testing, or another measure of intelligence, children and adults needing extra support might not be identified effectively. In addition, IQ testing is used in courts to determine whether a defendant has special or extenuating circumstances that preclude them from participating in some way in a trial. People also use IQ testing results to seek disability benefits from the Social Security Administration. Although IQ tests have sometimes been misused, as in the eugenics movement (Severson, 2011), the following case study demonstrates the usefulness and benefits of IQ testing.

Candace, a 14-year-old girl experiencing problems at school in Connecticut, was referred for a court-ordered psychological evaluation. She was in regular education classes in ninth grade and was failing every subject. Candace had never been a stellar student but had always been passed to the next grade. Frequently, she would curse at any of her teachers who called on her in class. She also got into fights with other students and occasionally shoplifted. When she arrived for the evaluation, Candace immediately said that she hated everything about school, including the teachers, the rest of the staff, the building, and the homework. Her parents stated that they felt their daughter was picked on, because she was of a different race than the teachers and most of the other students. When asked why she cursed at her teachers, Candace replied, “They only call on me when I don’t know the answer. I don’t want to say, ‘I don’t know’ all of the time and look like an idiot in front of my friends. The teachers embarrass me.” She was given a battery of tests, including an IQ test. Her score on the IQ test was 68.

What does Candace’s score say about her ability to excel or even succeed in regular education classes without assistance? Why were her difficulties never noticed or addressed?

Measuring Intelligence

Photograph A shows a portrait of Alfred Binet. Photograph B shows six sketches of human faces. Above these faces is the label “Guide for Binet-Simon Scale. 223” The faces are arranged in three rows of two, and these rows are labeled “1, 2, and 3.” At the bottom it reads: “The psychological clinic is indebted for the loan of these cuts and those on p. 225 to the courtesy of Dr. Oliver P. Cornman, Associate Superintendent of Schools of Philadelphia, and Chairman of Committee on Backward Children Investigation. See Report of Committee, Dec. 31, 1910, appendix.”
Figure 1. (a) French psychologist Alfred Binet helped to develop intelligence testing. (b) This page is from a 1908 version of the Binet-Simon Intelligence Scale. Children being tested were asked which face, of each pair, was prettier.

It seems that the human understanding of intelligence is somewhat limited when we focus on traditional or academic-type intelligence. How, then, can intelligence be measured? And when we measure intelligence, how do we ensure that we capture what we’re really trying to measure (in other words, that IQ tests function as valid measures of intelligence)?

In the late 1800s, Sir Francis Galton developed the first broad test of intelligence (Flanagan & Kaufman, 2004). Although he was not a psychologist, his contributions to the concepts of intelligence testing are still felt today (Gordon, 1995). Reliable intelligence testing (you may recall from earlier modules that reliability refers to a test’s ability to produce consistent results) began in earnest during the early 1900s with a researcher named Alfred Binet. Binet was asked by the French government to develop an intelligence test to use on children to determine which ones might have difficulty in school; it included many verbally based tasks. American researchers soon realized the value of such testing.

Louis Terman, a Stanford professor, modified Binet’s work by standardizing the administration of the test and tested thousands of different-aged children to establish an average score for each age. As a result, the test was normed and standardized, which means that the test was administered consistently to a large enough representative sample of the population that the range of scores resulted in a bell curve (bell curves will be discussed later).

Standardization and Norms

Standardization means that the manner of administration of a test, its scoring, and the interpretation of results are all consistent.

Norming involves giving a test to a large population so data can be collected comparing groups, such as age groups. The resulting data provide norms, or referential scores, by which to interpret future scores.

Norms are not expectations of what a given group should know but a demonstration of what that group does know.
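To make the idea of norms concrete, here is a minimal Python sketch of how a raw score might be interpreted against age-group norms. It is an illustration only, not the scoring procedure of any published test. The sample data, the function names `build_norms` and `standard_score`, and the reporting scale (mean 100, standard deviation 15, the convention commonly used for modern IQ scores) are all assumptions that do not come from the text above.

```python
from statistics import mean, stdev

def build_norms(samples_by_age):
    """Summarize a norming sample as {age: (mean, standard deviation)}."""
    return {age: (mean(scores), stdev(scores))
            for age, scores in samples_by_age.items()}

def standard_score(raw, age, norms, scale_mean=100.0, scale_sd=15.0):
    """Express a raw score relative to same-age peers.

    scale_mean and scale_sd are assumed reporting conventions,
    not values taken from the text.
    """
    group_mean, group_sd = norms[age]
    z = (raw - group_mean) / group_sd  # distance from the age-group mean
    return scale_mean + scale_sd * z

# Hypothetical raw scores from a (very small) norming sample.
norming_sample = {
    8: [20, 25, 30, 35, 40],
    10: [30, 35, 40, 45, 50],
}
norms = build_norms(norming_sample)

# The same raw score means different things at different ages.
score_at_8 = standard_score(40, 8, norms)
score_at_10 = standard_score(40, 10, norms)
print(round(score_at_8, 1), round(score_at_10, 1))
```

In this sketch a raw score of 40 is exactly average for the hypothetical 10-year-olds but above average for the 8-year-olds, which is why future scores are interpreted against the norms for the test-taker's own group. A real norming sample would be far larger and representative of the population.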

Norming and standardizing the test ensures that new scores are reliable. This new version of the test was called the Stanford-Binet Intelligence Scale (Terman, 1916). Remarkably, an updated version of this test is still widely used today.

Psychologist David Wechsler created a new IQ test in the US in 1939 by combining subtests from previous intelligence tests. These subtests tapped into a variety of verbal and nonverbal skills because Wechsler believed that intelligence encompassed “the global capacity of a person to act purposefully, to think rationally, and to deal effectively with his environment” (Wechsler, 1958, p. 7). He named it the Wechsler-Bellevue Intelligence Scale, which was later revised and renamed the Wechsler Adult Intelligence Scale (WAIS). Today, there are three Wechsler tests: WAIS-IV (fourth edition), WISC-V (for children), and WPPSI-IV (for preschool and primary school). These tests are used widely in schools and communities throughout the United States, and they are periodically normed and standardized as a means of recalibration. As a part of the recalibration process, the WISC-V was given to thousands of children across the country, and children taking the test today are compared with their same-age peers (Figure 7.13).

The WISC-V is composed of 14 subtests, which are grouped into five indices, which then render an IQ score. The five indices are Verbal Comprehension, Visual-Spatial, Fluid Reasoning, Working Memory, and Processing Speed. When the test is complete, individuals receive a score for each of the five indices and a full-scale IQ score. The method of scoring reflects the understanding that intelligence comprises multiple abilities in several cognitive realms, and it focuses on the mental processes that the child used to arrive at their answers to each test item.

How do you really measure intelligence?

Though many intelligence tests have been made that produce valid and reliable results, a good question to ask is simply, how should we go about measuring intelligence in the first place? Do you think asking a vocabulary question, for example, is a good measure of intelligence? If we are really attempting to measure the ability to learn from experience, for example, then the best types of tests would have people develop novel solutions to problems, but you can imagine how difficult it would be to design a test like that.

A major problem with IQ tests is that the questions are typically rooted in Western, middle-to-upper-class values, language, and experiences. Such questions are easier to understand and relate to for individuals from these backgrounds, which inadvertently favors them and creates a bias against those from other cultural contexts. The wording and language used in many IQ tests may not be accessible to individuals for whom English is not a first language or who speak different dialects. Additionally, intelligence is multifaceted and does not fit neatly into the box that many standardized tests attempt to place it in. By not considering the diverse ways intelligence manifests across different cultures, traditional IQ testing can be limited in its scope and potentially discriminatory.

In some studies with Indigenous Australian participants, researchers gathered feedback about the types of culturally relevant test questions they could ask that would be clear. Some participants gave feedback that they would prefer to take the test outdoors, and that some of the terminology doesn’t translate well, like right-hand side or left-hand side.[1]

In one study, researchers asked community members about the tests beforehand, and some of the tests were vetoed for being culturally irrelevant. Others were changed to be more culturally sensitive. For example, abstract images in one question were replaced with animals or grocery items, and playing cards were replaced with stones or seashells.[2]

The periodic recalibrations have led to an interesting observation known as the Flynn effect. Named after James Flynn, who was among the first to describe this trend, the Flynn effect refers to the observation that each generation scores significantly higher on IQ tests than the last. Flynn himself argues, however, that increased IQ scores do not necessarily mean that younger generations are more intelligent per se (Flynn, Shaughnessy, & Fulgham, 2012). Ultimately, we are still left with the question of how valid intelligence tests are. Certainly, the most modern versions of these tests tap into more than verbal competencies, yet the specific skills that should be assessed in IQ testing, the degree to which any test can truly measure an individual’s intelligence, and the use of the results of IQ tests are still issues of debate (Gresham & Witt, 1997; Flynn, Shaughnessy, & Fulgham, 2012; Richardson, 2002; Schlinger, 2003).


  1. Dingwall, K. M., Gray, A. O., McCarthy, A. R., Delima, J. F., & Bowden, S. C. (2017). Exploring the reliability and acceptability of cognitive tests for Indigenous Australians: A pilot study. BMC Psychology, 5(1), 26. https://doi.org/10.1186/s40359-017-0195-y
  2. Rock, D., & Price, I. R. (2019). Identifying culturally acceptable cognitive tests for use in remote northern Australia. BMC Psychology, 7, 62. https://doi.org/10.1186/s40359-019-0335-7