Reinforcement: Learn It 3 — Reinforcement Schedules

Reinforcement Schedules

Positive reinforcement is the most effective way to teach a new behavior, but how reinforcement is delivered matters just as much as what the reward is. B. F. Skinner discovered that different reinforcement schedules lead to very different patterns of learning, motivation, and persistence.

Continuous Reinforcement

Continuous reinforcement occurs when a behavior is reinforced every single time it is performed. This schedule is:

  • Best for teaching new behaviors

  • Fastest for building a strong connection between behavior and reward

  • Most effective when reinforcement happens immediately

For example: When training a dog to sit, you give a treat every time it successfully sits. The dog quickly learns that “sit → treat.” Once the behavior is learned, however, continuous reinforcement becomes inefficient—and easy to extinguish—so trainers typically switch to a different schedule.

Partial Reinforcement

In partial reinforcement, the behavior is reinforced only some of the time. Partial reinforcement is more realistic in everyday life. We don’t get praised for every assignment, every good decision, or every correct answer—and yet we keep trying.

These schedules differ based on:

  • Interval vs. ratio → Is reinforcement based on time or number of responses?

  • Fixed vs. variable → Is the schedule predictable or unpredictable?

Schedules of Reinforcement

1. Fixed Interval (FI)

Behavior is reinforced after a set amount of time has passed. For example, Maria can watch one episode of her favorite show if she practices the piano for one hour.

Behavior pattern:

  • Moderate response rate
  • Noticeable “pause” after reinforcement
  • “Scalloped” pattern (work increases as the reward time approaches)

Common in:

  • Weekly paychecks
  • Scheduled quizzes
  • Office hours that open at a set time

2. Variable Interval (VI)

Behavior is reinforced after varying, unpredictable time intervals. For example, a restaurant crew earns a $20 bonus whenever a surprise quality-control inspector arrives. Because they never know when the inspector will show up, they keep the restaurant clean and service fast all the time.

Behavior pattern:

  • Steady, moderate response rate
  • No predictable pauses
  • Strong long-term maintenance

Common in:

  • Pop quizzes
  • Checking email or messages
  • Random performance checks

3. Fixed Ratio (FR)

Reinforcement is delivered after a predictable number of responses. For example, a salesperson earns a commission for every five pairs of glasses they sell. This encourages lots of sales—though not necessarily high-quality ones.

Behavior pattern:

  • High response rate
  • Short pause after reinforcement (“post-reinforcement pause”)

Common in:

  • Piece-rate work
  • Buy-10-get-1-free cards
  • Completing sets (e.g., 20 practice questions = bonus points)

4. Variable Ratio (VR)

Reinforcement occurs after an unpredictable number of responses. This is the most powerful schedule—and the most resistant to extinction. For example, slot machines in Vegas operate on a VR schedule. You never know when the next pull will pay off, so you keep playing—even after losses.

Behavior pattern:

  • Very high, steady response rate
  • No predictable pauses
  • Extremely difficult to extinguish

Common in:

  • Gambling
  • Video game loot boxes
  • Social media (posting for unpredictable “likes”)

Variable Ratio Gambling

Imagine that Sarah visits Las Vegas for the first time. She is not a gambler, but out of curiosity, she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later, she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more.

In this scenario, reinforcement occurs after a seemingly random number of instances of the desired behavior. Because most types of gambling operate on a variable ratio schedule, people keep trying, hoping that the next time they will win big. This is one of the reasons that gambling is so addictive and so resistant to extinction.


Table 1. Reinforcement Schedules
Fixed interval
  Description: Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes).
  Result: Moderate response rate with significant pauses after reinforcement
  Example: Reward for practice time

Variable interval
  Description: Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes).
  Result: Moderate yet steady response rate
  Example: Random quality checks

Fixed ratio
  Description: Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses).
  Result: High response rate with pauses after reinforcement
  Example: Piecework—a factory worker paid for every x items manufactured

Variable ratio
  Description: Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses).
  Result: High and steady response rate
  Example: Gambling

Acquisition and Extinction in Operant Conditioning

Just like in classical conditioning, behaviors in operant conditioning follow predictable phases:

  • During the acquisition phase of operant conditioning, the organism learns to associate its behavior with a specific consequence. Once the behavior is acquired, subsequent processes such as maintenance, generalization, and extinction come into play.
    • Maintenance refers to the continued performance of the behavior over time.
    • Generalization involves applying the learned behavior to similar situations or stimuli.
    • Extinction occurs when the behavior decreases or disappears due to the lack of reinforcement.

In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and how quickly it happens depends on the reinforcement schedule. Variable ratio is the most productive schedule and the most resistant to extinction, so extinction comes very slowly, as described above. Under the other schedules, extinction may come much more quickly; fixed interval is the least productive and the easiest to extinguish (Figure 1).

Figure 1. The four reinforcement schedules yield different response patterns. The variable ratio schedule is unpredictable and yields high and steady response rates, with little if any pause after reinforcement (e.g., gambler). A fixed ratio schedule is predictable and produces a high response rate, with a short pause after reinforcement (e.g., eyeglass saleswoman). The variable interval schedule is unpredictable and produces a moderate, steady response rate (e.g., restaurant manager). The fixed interval schedule yields a scallop-shaped response pattern, reflecting a significant pause after reinforcement (e.g., piano student).