Perception and Illusions: Learn It 2—Multimodal Phenomena

Lumen Learning

Multimodal Perception: How Our Senses Work Together

Although it has been traditional to study the various senses independently, most of the time, perception operates in the context of information supplied by multiple sensory modalities at the same time.

unimodal and multimodal perception

Unimodal perception is when we use information from just one sense, like seeing with our eyes or hearing with our ears. For example, when we look at a picture, we are using only our sense of vision.
Multimodal perception is when we use information from multiple senses at the same time to understand the world around us. For example, when we hear a sound and see where the sound is coming from, we are using both our sense of hearing and vision.

For example, imagine if you witnessed a car collision. You could describe the stimulus generated by this event by considering each of the senses independently; that is, as a set of unimodal stimuli. Your eyes would be stimulated with patterns of light energy bouncing off the cars involved. Your ears would be stimulated with patterns of acoustic energy emanating from the collision. Your nose might even be stimulated by the smell of burning rubber or gasoline.

In practice, however, unless someone explicitly asked you to describe your perception in unimodal terms, you would most likely experience the event as a unified bundle of sensations from multiple senses. In other words, your perception would be multimodal. The question is whether the perceptual system processes the various sources of information in this multimodal stimulus separately or combines them.

For the last few decades, perceptual research has pointed to the importance of multimodal perception, that is, to the ways the perception of events and objects changes when information is available from more than one sensory modality. Most of this research indicates that, at some point in perceptual processing, information from the various sensory modalities is integrated. In other words, the information is combined and treated as a unitary representation of the world.

Why Multimodal Perception Matters

Recent neuroscience has identified many brain regions that respond to multiple types of sensory input, suggesting that humans are fundamentally multimodal perceivers (Spence, Senkowski, & Röder, 2009). This helps explain why:

Movie soundtracks feel emotionally powerful
Virtual reality seems immersive
Food tastes bland when you have a cold
Loud sounds appear “brighter,” and bright flashes appear “louder”

Our brains are wired to create one coherent world, not separate streams of vision, sound, and touch.

Behavioral Effects: Multimodal vs. Crossmodal Phenomena

Psychologists study two major categories of multisensory effects:

multimodal and crossmodal phenomena

Multimodal phenomena concern the binding together of inputs from multiple sensory modalities and the effects of this binding on perception.
Crossmodal phenomena concern the influence of one sensory modality on the perception of another (Spence, Senkowski, & Röder, 2009).

Multimodal Phenomena

Audiovisual Speech Perception

Speech is naturally multimodal: when someone talks, they produce sound waves and visual mouth movements. Watching a speaker’s lips can dramatically improve comprehension, especially in noisy environments.

Sumby and Pollack (1954) showed that, in loud background noise, seeing the speaker’s mouth movements improves word recognition more than doubling the signal-to-noise ratio would. In other words, watching the speaker can make speech clearer than simply turning up the volume.

Their study, one of the earliest investigations of audiovisual speech perception, examined how accurately people recognize spoken words presented in a noisy context, much like following a conversation at a crowded party. Participants heard irrelevant noise (“white noise,” which sounds like a radio tuned between stations) with spoken words embedded in it, and their task was to identify the words. There were two conditions: one in which only the auditory component of the words was presented (the “auditory-alone” condition), and one in which both the auditory and visual components were presented (the “audiovisual” condition). The noise level also varied, so that on some trials the noise was very loud relative to the words, and on other trials it was very soft.
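To make the idea of a signal-to-noise ratio concrete, the sketch below expresses it in decibels from the power of the speech and the power of the noise. The function name `snr_db` and all power values are hypothetical, chosen only for illustration; they are not taken from Sumby and Pollack's study.

```python
import math

def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels, from two power values in the same units."""
    if signal_power <= 0 or noise_power <= 0:
        raise ValueError("power values must be positive")
    return 10 * math.log10(signal_power / noise_power)

# Hypothetical trials: the speech power is fixed and the noise power varies.
loud_noise = snr_db(signal_power=1.0, noise_power=10.0)  # noise has 10x the speech power
soft_noise = snr_db(signal_power=1.0, noise_power=0.1)   # noise has 1/10 the speech power

# Doubling the power ratio (here by doubling the speech power) adds about 3 dB.
doubling_gain = snr_db(2.0, 10.0) - snr_db(1.0, 10.0)

print(f"loud noise: {loud_noise:.1f} dB, soft noise: {soft_noise:.1f} dB")
print(f"gain from doubling the ratio: {doubling_gain:.2f} dB")
```

A negative value means the noise is more powerful than the speech, which corresponds to the "very loud noise" trials described above.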

Most people assume that deaf individuals are much better at lipreading than individuals with normal hearing. It may come as a surprise to learn, however, that some individuals with normal hearing are also remarkably good at lipreading (sometimes called “speechreading”). In fact, there is a wide range of speechreading ability in both normal hearing and deaf populations (Andersson et al., 2001). However, the reasons for this wide range of performance are not well understood (Auer & Bernstein, 2007; Bernstein, 2006; Bernstein et al., 2001; Mohammed et al., 2005).

The benefit of seeing the speaker follows the Principle of Inverse Effectiveness: the brain gains the most from multisensory information when the signal from each individual sense is weak or degraded. This is why visual cues help most when background noise is loud. You might have noticed something similar when turning captions on to watch a show.
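One way to picture the Principle of Inverse Effectiveness is to compare how much vision adds at different noise levels. The sketch below uses made-up accuracy values (not data from any study) and a hypothetical helper, `relative_gain`, which expresses the audiovisual benefit as a proportion of auditory-alone performance. This is only one possible way to quantify the benefit.

```python
def relative_gain(auditory_alone: float, audiovisual: float) -> float:
    """Audiovisual benefit as a proportion of auditory-alone accuracy."""
    if auditory_alone <= 0:
        raise ValueError("auditory-alone accuracy must be positive")
    return (audiovisual - auditory_alone) / auditory_alone

# Hypothetical proportions of words identified correctly (illustrative only).
conditions = {
    "loud noise": {"auditory_alone": 0.20, "audiovisual": 0.60},
    "soft noise": {"auditory_alone": 0.90, "audiovisual": 0.95},
}

gains = {name: relative_gain(**scores) for name, scores in conditions.items()}
for name, gain in gains.items():
    print(f"{name}: relative gain = {gain:.2f}")
```

With these illustrative numbers, the relative gain is far larger when the noise is loud and hearing alone is unreliable, which is the pattern the principle describes.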

Another phenomenon involving audiovisual speech is a very famous illusion called the “McGurk effect” (named after one of its discoverers). In the classic formulation of the illusion, a movie is recorded of a speaker saying the syllables “gaga.” Another movie is made of the same speaker saying the syllables “baba.” Then, the auditory portion of the “baba” movie is dubbed onto the visual portion of the “gaga” movie. This combined stimulus is presented to participants, who are asked to report what the speaker in the movie said. McGurk and MacDonald (1976) found that 98 percent of their participants reported hearing the syllable “dada,” which was in neither the visual nor the auditory component of the stimulus. These results indicate that when visual and auditory information about speech is integrated, it can have profound effects on perception.

See the illusion in the video “Try this bizarre audio illusion!”

Tactile/Visual Interactions in Body Ownership

Not all multisensory integration phenomena concern speech, however. One particularly compelling multisensory illusion involves the integration of tactile and visual information in the perception of body ownership.

In the “rubber hand illusion” (Botvinick & Cohen, 1998), an observer is seated so that one of their hands is hidden from view. A fake rubber hand is placed near the hidden hand, but in a visible location. The experimenter then uses a paintbrush to lightly stroke the hidden hand and the rubber hand at the same time and in the same locations. For example, if the middle finger of the hidden hand is being brushed, then the middle finger of the rubber hand is brushed as well. This sets up a correspondence between the tactile sensations (coming from the hidden hand) and the visual sensations (of the rubber hand).

After a short time (around 10 minutes), participants report feeling as though the rubber hand “belongs” to them; that is, that the rubber hand is a part of their body. This feeling can be so strong that surprising the participant by hitting the rubber hand with a hammer often leads to a reflexive withdrawal of the obscured hand—even though it is in no danger at all. It appears, then, that our awareness of our own bodies may be the result of multisensory integration.

See the rubber hand illusion in the video “The Rubber Hand Illusion – Horizon: Is Seeing Believing? – BBC Two.”

More Everyday Crossmodal Effects

Sound influences vision: A loud beep can make a visual flash seem brighter.
Vision influences touch: Watching your hand being touched can enhance tactile sensitivity.
Smell influences taste: Vanilla scent can make food taste sweeter.
Touch influences hearing: Feeling low-frequency vibrations helps us perceive bass in music.