Intelligence and Creativity: Learn It 3—Measuring Intelligence

Lumen Learning


Understanding IQ

While you’re likely familiar with the term “IQ” and associate it with the idea of intelligence, what does IQ really mean?

IQ

IQ stands for intelligence quotient and describes a score earned on a test designed to measure intelligence.

Modern intelligence researchers emphasize that intelligence takes many forms, not all of which traditional IQ tests capture. Still, IQ testing remains widely used in education, clinical psychology, and legal settings. Because these scores can have serious consequences, most standardized IQ tests may be administered only by licensed psychologists; social workers, teachers, and psychiatrists generally cannot administer them.

IQ testing has a complicated history. Although many people benefit from accurate cognitive assessments, IQ tests have also been misused, most famously in support of the early 20th-century eugenics movement. In Buck v. Bell (1927), the U.S. Supreme Court upheld a Virginia law permitting the forced sterilization of people labeled “feeble-minded,” a label often assigned on the basis of biased intelligence testing. The ruling ultimately paved the way for approximately 65,000 forced sterilizations (Ko, 2016). Understanding both the value and the limits of IQ tests is essential.

Why Measure Intelligence?

Despite historical misuse, IQ tests can serve important purposes when used ethically:

Educational settings—IQ tests can help identify:

learning disabilities or learning differences
students who may need special education services
gifted or advanced learners who need enrichment

Without assessments, many children who struggle or excel might go unnoticed.

Clinical and legal settings—IQ scores may help determine:

whether someone qualifies for disability services
whether a defendant can meaningfully participate in a trial
whether cognitive impairment plays a role in behavior

Real World Example

Candace, a 14-year-old in Connecticut, was failing all her classes, frequently fighting with peers, and cursing at teachers. She felt singled out and embarrassed in class and believed teachers only called on her when she didn’t know the answer. A psychological evaluation—including an IQ test—revealed her score was 68, far below the average range.

Candace’s difficulties were not due to lack of effort or motivation; she had likely needed academic support for years. IQ testing helped educators and clinicians finally understand the root of her struggles and identify appropriate interventions.

What does Candace’s score say about her ability to excel or even succeed in regular education classes without assistance? Why were her difficulties never noticed or addressed?

Measuring Intelligence

Figure 1. (a) French psychologist Alfred Binet helped to develop intelligence testing. (b) This page is from a 1908 version of the Binet-Simon Intelligence Scale and shows six sketched faces arranged in three numbered pairs. Children being tested were asked which face in each pair was prettier.

It seems that our understanding of intelligence is somewhat limited when we focus only on traditional or academic-type intelligence. How, then, can intelligence be measured? And when we measure intelligence, how do we ensure that we capture what we’re really trying to measure (in other words, that IQ tests function as valid measures of intelligence)?

Early History

In the late 1800s, Sir Francis Galton developed the first broad test of intelligence (Flanagan & Kaufman, 2004). Although he was not a psychologist, his contributions to the concepts of intelligence testing are still felt today (Gordon, 1995).

Reliable intelligence testing (you may recall from earlier modules that reliability refers to a test’s ability to produce consistent results) began in earnest during the early 1900s with a researcher named Alfred Binet.

The French government asked Binet to develop a test to identify children who might have difficulty in school; the test included many verbally based tasks. American researchers soon realized the value of such testing.

Lewis Terman, a Stanford professor, modified Binet’s work by standardizing how the test was administered and by testing thousands of children of different ages to establish an average score for each age. As a result, the test was normed and standardized: it was administered consistently to a large, representative sample of the population, and the resulting range of scores formed a bell curve (bell curves will be discussed later).

standardization and norms

Standardization: Ensuring that a test is administered, scored, and interpreted in a consistent way.
Norming: Administering a test to a large, representative sample to determine typical performance levels for different age groups.

These steps produce norms, which allow examiners to interpret an individual’s score by comparing it with the scores of others of the same age. Standardized administration also helps ensure that new scores are reliable. Terman’s version of the test was called the Stanford-Binet Intelligence Scale (Terman, 1916). Remarkably, an updated version of this test is still widely used today.
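The idea of norms can be illustrated with a short Python sketch. It is a simplified illustration, not how any published test is actually scored: the norm sample below is invented, and the names `NORM_SAMPLE`, `age_norms`, `average_by_age`, and `percentile_rank` are hypothetical. Real tests use large norm samples and publisher-supplied conversion tables. The sketch groups raw scores by age (as Terman did to establish an average for each age) and then locates one child’s raw score among same-age peers using one common definition of percentile rank.

```python
# HYPOTHETICAL norm sample: (age in years, raw score) pairs.
# These numbers are invented for illustration only; real norm samples
# include thousands of children and come from the test publisher.
NORM_SAMPLE = [
    (8, 20), (8, 24), (8, 26), (8, 30), (8, 35),
    (10, 28), (10, 32), (10, 36), (10, 40), (10, 44),
]


def age_norms(sample):
    """Group raw scores by age so each child is compared with same-age peers."""
    norms = {}
    for age, score in sample:
        norms.setdefault(age, []).append(score)
    return norms


def average_by_age(norms):
    """Average raw score for each age group."""
    return {age: sum(scores) / len(scores) for age, scores in norms.items()}


def percentile_rank(raw_score, age, norms):
    """Percentage of same-age norm scores below raw_score, counting ties as half.
    This is one common definition of percentile rank, assumed here for simplicity."""
    peers = norms[age]
    below = sum(1 for s in peers if s < raw_score)
    equal = sum(1 for s in peers if s == raw_score)
    return 100 * (below + 0.5 * equal) / len(peers)


norms = age_norms(NORM_SAMPLE)
print(average_by_age(norms))          # {8: 27.0, 10: 36.0}
print(percentile_rank(32, 8, norms))  # 80.0
print(percentile_rank(32, 10, norms)) # 30.0
```

In this made-up example, the same raw score of 32 falls at the 80th percentile among 8-year-olds but only the 30th among 10-year-olds. This shows why a score is interpreted against norms for the test taker’s own age group.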

Psychologist David Wechsler created a new IQ test in the United States in 1939 by combining subtests from earlier intelligence tests. These subtests tapped a variety of verbal and nonverbal skills because Wechsler believed that intelligence encompassed “the global capacity of a person to act purposefully, to think rationally, and to deal effectively with his environment” (Wechsler, 1958, p. 7). He named the test the Wechsler-Bellevue Intelligence Scale; it was later revised and renamed the Wechsler Adult Intelligence Scale (WAIS).

Today, three main Wechsler IQ tests are widely used in clinical and school settings:

WAIS-5 – Wechsler Adult Intelligence Scale, Fifth Edition (for ages 16–90)
WISC-V – Wechsler Intelligence Scale for Children, Fifth Edition (for ages 6–16)
WPPSI-IV – Wechsler Preschool and Primary Scale of Intelligence, Fourth Edition (for younger children, roughly ages 2½ to 7½)

All three tests are individually administered by trained professionals and are used widely in schools and communities throughout the United States. They are periodically revised and re-normed (“revised editions”) to keep norms current and items fair and accurate. For example, as part of this recalibration, the WISC-V was given to thousands of children across the country, so children taking the test today are compared with their same-age peers.

The WISC-V is composed of 14 subtests, which are grouped into five indices that together yield an IQ score. The five indices are:

Verbal Comprehension (e.g., understanding and using language)
Visual–Spatial Ability (e.g., analyzing and reproducing designs)
Fluid Reasoning (e.g., solving novel visual and logical problems)
Working Memory (e.g., holding and manipulating information in mind)
Processing Speed (e.g., quickly scanning and marking symbols)

When the test is complete, individuals receive a score for each of the five indices and a full scale IQ score. This scoring method reflects the view that intelligence comprises multiple abilities across several cognitive domains, and it focuses on the mental processes the child used to arrive at their answers to each test item.

How do you really measure intelligence?

Although many intelligence tests produce valid and reliable results, a basic question remains: how should we go about measuring intelligence in the first place? Do you think asking a vocabulary question, for example, is a good measure of intelligence? If we are really attempting to measure the ability to learn from experience, then the best tests would have people develop novel solutions to problems, but you can imagine how difficult it would be to design a test like that.

Cultural Biases in Testing

A major problem with IQ tests is that their questions are typically rooted in Western, middle-to-upper-class values, language, and experiences. As a result, the questions are easier to understand and relate to for people from these backgrounds, which inadvertently favors them and biases the tests against people from other cultural contexts.

The wording and language used in many IQ tests may not be accessible to individuals for whom English is not a first language or who speak different dialects. Additionally, intelligence is multifaceted and does not fit neatly into the box that many standardized tests attempt to place it in. By not considering the diverse ways intelligence manifests across different cultures, traditional IQ testing can be limited in its scope and potentially discriminatory.

In some studies with Indigenous Australian participants, researchers gathered feedback on which culturally relevant test questions would be clear to participants. Some participants said they would prefer to take tests outdoors and that some terminology, such as “right-hand side” and “left-hand side,” does not translate well.^[1]

In one study, researchers consulted community members about the tests beforehand; some tests were vetoed as culturally irrelevant, and others were modified to be more culturally sensitive. For example, abstract images in one task were replaced with pictures of animals or grocery items, and playing cards were replaced with stones or seashells.^[2]

The Flynn Effect

Periodic updates to test norms revealed a surprising pattern: average IQ scores have increased with each generation, a trend known as the Flynn effect.

Although scores have risen, James Flynn argued that this does not mean people are innately “more intelligent” today. Instead, changes in education, technology, nutrition, and everyday problem-solving demands may make each generation more practiced at the kinds of thinking IQ tests measure (Flynn, Shaughnessy, & Fulgham, 2012).

Are IQ Tests Valid?

Modern IQ tests are:

reliable (they produce consistent scores)
useful for identifying certain cognitive strengths and weaknesses
helpful in educational and clinical decision-making

However, debates remain about:

which abilities should “count” as intelligence
how culture and language shape performance
how IQ scores should—and should not—be used
whether any test can fully capture the richness of human intelligence

In short, IQ tests provide valuable information, but they are not a complete picture of intelligence.

1. Dingwall, K. M., Gray, A. O., McCarthy, A. R., Delima, J. F., & Bowden, S. C. (2017). Exploring the reliability and acceptability of cognitive tests for Indigenous Australians: A pilot study. BMC Psychology, 5(1), 26. https://doi.org/10.1186/s40359-017-0195-y
2. Rock, D., & Price, I. R. (2019). Identifying culturally acceptable cognitive tests for use in remote northern Australia. BMC Psychology, 7, 62. https://doi.org/10.1186/s40359-019-0335-7