Reinforcement: Learn It 3 — Reinforcement Schedules

Reinforcement Schedules

Positive reinforcement is the most effective way to teach a new behavior, but how reinforcement is delivered matters just as much as what the reward is. B. F. Skinner discovered that different reinforcement schedules lead to very different patterns of learning, motivation, and persistence.

Continuous Reinforcement

Continuous reinforcement occurs when a behavior is reinforced every single time it is performed. This schedule is:

  • Best for teaching new behaviors

  • Fastest for building a strong connection between behavior and reward

  • Most effective when reinforcement happens immediately

For example: When training a dog to sit, you give a treat every time it successfully sits. The dog quickly learns that “sit → treat.” Once the behavior is learned, however, continuous reinforcement becomes inefficient—and easy to extinguish—so trainers typically switch to a different schedule.

Partial Reinforcement

In partial reinforcement, the behavior is reinforced only some of the time. Partial reinforcement is more realistic in everyday life. We don’t get praised for every assignment, every good decision, or every correct answer—and yet we keep trying.

These schedules differ based on:

  • Interval vs. ratio → Is reinforcement based on time or number of responses?

  • Fixed vs. variable → Is the schedule predictable or unpredictable?

Schedules of Reinforcement

1. Fixed Interval (FI)

Behavior is reinforced after a set amount of time has passed. For example, Maria can watch one episode of her favorite show if she practices the piano for one hour.

Behavior pattern:

  • Moderate response rate
  • Noticeable “pause” after reinforcement
  • “Scalloped” pattern (work increases as the reward time approaches)

Common in:

  • Weekly paychecks
  • Scheduled quizzes
  • Office hours that open at a set time

2. Variable Interval (VI)

Behavior is reinforced after varying, unpredictable time intervals. For example, a restaurant crew earns a $20 bonus whenever a surprise quality-control inspector arrives. Because they never know when the inspector will show up, they keep the restaurant clean and service fast all the time.

Behavior pattern:

  • Steady, moderate response rate
  • No predictable pauses
  • Strong long-term maintenance

Common in:

  • Pop quizzes
  • Checking email or messages
  • Random performance checks

3. Fixed Ratio (FR)

Reinforcement is delivered after a predictable number of responses. For example, a salesperson earns a commission for every five pairs of glasses they sell. This encourages lots of sales—though not necessarily high-quality ones.

Behavior pattern:

  • High response rate
  • Short pause after reinforcement (“post-reinforcement pause”)

Common in:

  • Piece-rate work
  • Buy-10-get-1-free cards
  • Completing sets (e.g., 20 practice questions = bonus points)

4. Variable Ratio (VR)

Reinforcement occurs after an unpredictable number of responses. This is the most powerful schedule—and the most resistant to extinction. For example, slot machines in Vegas operate on a VR schedule. You never know when the next pull will pay off, so you keep playing—even after losses.

Behavior pattern:

  • Very high, steady response rate
  • No predictable pauses
  • Extremely difficult to extinguish

Common in:

  • Gambling
  • Video game loot boxes
  • Social media (posting for unpredictable “likes”)

Variable Ratio Gambling

Imagine that Sarah visits Las Vegas for the first time. She is not a gambler, but out of curiosity, she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later, she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more.

In this scenario, reinforcement occurs after a seemingly random number of instances of the desired behavior. Because most types of gambling operate on a variable ratio schedule, people keep trying, hoping that the next time they will win big. This is one of the reasons that gambling is so addictive and so resistant to extinction.


Table 1. Reinforcement Schedules
Fixed interval
  Description: Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes).
  Result: Moderate response rate with significant pauses after reinforcement
  Example: Reward for practice time

Variable interval
  Description: Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes).
  Result: Moderate yet steady response rate
  Example: Random quality checks

Fixed ratio
  Description: Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses).
  Result: High response rate with pauses after reinforcement
  Example: Piecework—a factory worker paid for every x items manufactured

Variable ratio
  Description: Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses).
  Result: High and steady response rate
  Example: Gambling

Acquisition and Extinction in Operant Conditioning

Just like in classical conditioning, behaviors in operant conditioning follow predictable phases:

  • During the acquisition phase of operant conditioning, the organism learns to associate its behavior with a specific consequence. Once the behavior is acquired, subsequent processes such as maintenance, generalization, and extinction come into play.
    • Maintenance refers to the continued performance of the behavior over time.
    • Generalization involves applying the learned behavior to similar situations or stimuli.
    • Extinction occurs when the behavior decreases or disappears due to the lack of reinforcement.

In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and how quickly it happens depends on the reinforcement schedule. Variable ratio is the most productive schedule and the most resistant to extinction, so extinction comes very slowly, as described above. Under the other schedules, extinction may come much more quickly; fixed interval is the least productive and the easiest to extinguish (Figure 1).

Figure 1. The four reinforcement schedules yield different response patterns. The variable ratio schedule is unpredictable and yields high and steady response rates, with little if any pause after reinforcement (e.g., gambler). A fixed ratio schedule is predictable and produces a high response rate, with a short pause after reinforcement (e.g., eyeglass saleswoman). The variable interval schedule is unpredictable and produces a moderate, steady response rate (e.g., restaurant manager). The fixed interval schedule yields a scallop-shaped response pattern, reflecting a significant pause after reinforcement (e.g., piano student).