Reinforcement: Learn It 3 — Reinforcement Schedules

Remember, the best way to teach a person or animal a behavior is to use positive reinforcement. For example, Skinner used positive reinforcement to teach rats to press a lever in a Skinner box. At first, the rat might randomly hit the lever while exploring the box, and out would come a pellet of food. After eating the pellet, what do you think the hungry rat did next? It hit the lever again, and received another pellet of food. Each time the rat hit the lever, a pellet of food came out.

continuous reinforcement

When an organism receives a reinforcer each time it displays a behavior, it is called continuous reinforcement. This reinforcement schedule is the quickest way to teach someone a behavior, and it is especially effective in training a new behavior. Let’s look back at the dog that was learning to sit earlier in the module. Now, each time he sits, you give him a treat. Timing is important here: you will be most successful if you present the reinforcer immediately after he sits, so that he can make an association between the target behavior (sitting) and the consequence (getting a treat).

partial reinforcement

Once a behavior is trained, researchers and trainers often turn to another type of reinforcement schedule—partial reinforcement. In partial reinforcement, also referred to as intermittent reinforcement, the person or animal does not get reinforced every time they perform the desired behavior. There are several different types of partial reinforcement schedules (Table 1). These schedules are described as either fixed or variable, and as either interval or ratio:

  • Fixed means the number of responses or the amount of time between reinforcements is set and unchanging.
  • Variable means the number of responses or the amount of time between reinforcements varies or changes.
  • Interval means the schedule is based on the time between reinforcements.
  • Ratio means the schedule is based on the number of responses between reinforcements.

Schedules of Reinforcement

Now let’s combine these four terms.
  • In a fixed interval reinforcement schedule, behavior is rewarded after a set amount of time. For example, Maria is practicing her piano. After she practices for one hour, she is allowed to watch one episode of her favorite TV show. The reinforcer arrives after a consistent, set amount of time spent on the desired behavior.
  • With a variable interval reinforcement schedule, the person or animal gets the reinforcement based on varying amounts of time, which are unpredictable. Say that Manuel is the manager at a fast-food restaurant. Every once in a while someone from the quality control division comes to Manuel’s restaurant. If the restaurant is clean and the service is fast, everyone on that shift earns a $20 bonus. Manuel never knows when the quality control person will show up, so he always tries to keep the restaurant clean and ensures that his employees provide prompt and courteous service. His productivity regarding prompt service and keeping a clean restaurant is steady because he wants his crew to earn the bonus. In this case, the reinforcement happens after a seemingly random amount of time spent on the desired behavior.
  • With a fixed ratio reinforcement schedule, there are a set number of responses that must occur before the behavior is rewarded. Carla sells glasses at an eyeglass store, and she earns a commission every time she sells five pairs of glasses. She always tries to sell people more pairs of glasses, including prescription sunglasses or a backup pair, so she can increase her commission. She does not care whether the person really needs the prescription sunglasses; Carla just wants her commission. The quality of what Carla sells does not matter because her commission is not based on quality; it’s only based on the number of pairs sold. In this case, a reinforcer is presented after a consistent, set number of instances of the desired behavior. Fixed ratio schedules are best suited to optimizing the quantity of output.
  • In a variable ratio reinforcement schedule, the number of responses needed for a reward varies. This is the most powerful partial reinforcement schedule. An example of the variable ratio reinforcement schedule is gambling. Imagine that Sarah—generally a smart, thrifty woman—visits Las Vegas for the first time. She is not a gambler, but out of curiosity she puts a quarter into the slot machine, and then another, and another. Nothing happens. Two dollars in quarters later, her curiosity is fading, and she is just about to quit. But then, the machine lights up, bells go off, and Sarah gets 50 quarters back. That’s more like it! Sarah gets back to inserting quarters with renewed interest, and a few minutes later she has used up all her gains and is $10 in the hole. Now might be a sensible time to quit. And yet, she keeps putting money into the slot machine because she never knows when the next reinforcement is coming. She keeps thinking that with the next quarter she could win $50, or $100, or even more. In this scenario, the reinforcement occurs after a seemingly random number of instances of the desired behavior. Because the reinforcement schedule in most types of gambling has a variable ratio schedule, people keep trying and hoping that the next time they will win big. This is one of the reasons that gambling is so addictive—and so resistant to extinction.
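Each of the four partial schedules above is simply a rule for deciding whether a given response earns a reinforcer. As a minimal sketch (not part of the text; the function names, ratios, and interval lengths are illustrative assumptions), the rules can be written as small Python functions:

```python
import random

def fixed_ratio(n):
    """Reinforce every nth response (e.g., a commission per 5 sales)."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True  # reinforcer delivered
        return False
    return respond

def variable_ratio(mean_n, rng):
    """Reinforce after an unpredictable number of responses,
    averaging roughly mean_n (e.g., a slot machine)."""
    count, target = 0, rng.randint(1, 2 * mean_n - 1)
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count, target = 0, rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

def fixed_interval(interval):
    """Reinforce the first response after a set time has elapsed
    (e.g., a TV episode after each hour of practice)."""
    last = 0.0
    def respond(t):
        nonlocal last
        if t - last >= interval:
            last = t
            return True
        return False
    return respond

def variable_interval(mean_wait, rng):
    """Reinforce the first response after an unpredictable wait
    (e.g., a surprise quality-control visit)."""
    last, wait = 0.0, rng.uniform(0, 2 * mean_wait)
    def respond(t):
        nonlocal last, wait
        if t - last >= wait:
            last, wait = t, rng.uniform(0, 2 * mean_wait)
            return True
        return False
    return respond

# A fixed-ratio-5 schedule pays off on responses 5, 10, 15, 20, ...
fr5 = fixed_ratio(5)
print([i for i in range(1, 21) if fr5()])  # [5, 10, 15, 20]
```

Note the structural difference: under the ratio rules, responding faster reaches the payoff sooner, while under the interval rules no amount of extra responding makes the clock run faster. This is one reason ratio schedules produce higher response rates than interval schedules (Table 1).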


Table 1. Reinforcement Schedules
Fixed interval
  Description: Reinforcement is delivered at predictable time intervals (e.g., after 5, 10, 15, and 20 minutes).
  Result: Moderate response rate with significant pauses after reinforcement.
  Example: Reward for practice time.

Variable interval
  Description: Reinforcement is delivered at unpredictable time intervals (e.g., after 5, 7, 10, and 20 minutes).
  Result: Moderate yet steady response rate.
  Example: Random quality checks.

Fixed ratio
  Description: Reinforcement is delivered after a predictable number of responses (e.g., after 2, 4, 6, and 8 responses).
  Result: High response rate with pauses after reinforcement.
  Example: Piecework—a factory worker paid for every x number of items manufactured.

Variable ratio
  Description: Reinforcement is delivered after an unpredictable number of responses (e.g., after 1, 4, 5, and 9 responses).
  Result: High and steady response rate.
  Example: Gambling.

Acquisition and Extinction

The terms acquisition and extinction also apply to operant conditioning. During the acquisition phase of operant conditioning, the organism learns to associate its behavior with a specific consequence. Once the behavior is acquired, subsequent processes such as maintenance, generalization, and extinction come into play. Maintenance refers to the continued performance of the behavior over time. Generalization involves applying the learned behavior to similar situations or stimuli. Extinction occurs when the behavior decreases or disappears due to the lack of reinforcement.

In operant conditioning, extinction of a reinforced behavior occurs at some point after reinforcement stops, and the speed at which this happens depends on the reinforcement schedule. In a variable ratio schedule, the point of extinction comes very slowly, as described above. But in the other reinforcement schedules, extinction may come quickly. Among the reinforcement schedules, variable ratio is the most productive and the most resistant to extinction. Fixed interval is the least productive and the easiest to extinguish (Figure 1).
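One informal way to see why the variable ratio schedule resists extinction: under continuous reinforcement, a single unreinforced response already signals that the rules have changed, whereas under a variable ratio schedule, long unreinforced runs are perfectly normal. Approximating a variable ratio schedule as a fixed probability p of payoff on each response (a Bernoulli simplification, not from the text), the chance of k straight misses while the schedule is still active is (1 − p)^k:

```python
def prob_unreinforced_run(p_reinforce, k):
    """Probability of k consecutive unreinforced responses if the
    schedule were still paying off (Bernoulli approximation)."""
    return (1 - p_reinforce) ** k

# Continuous reinforcement (p = 1.0): even one miss is impossible
# while the schedule is active, so its absence is obvious at once.
print(prob_unreinforced_run(1.0, 1))  # 0.0

# Slot-machine-like variable ratio (p = 0.2): ten straight losses
# happen about 11% of the time anyway, so nothing seems to have
# changed, and the gambler keeps responding.
print(round(prob_unreinforced_run(0.2, 10), 3))  # 0.107
```

In other words, Sarah has no cue that reinforcement has stopped, which is exactly the pattern described above.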

Figure 1. The four reinforcement schedules yield different response patterns. The variable ratio schedule is unpredictable and yields high and steady response rates, with little if any pause after reinforcement (e.g., gambler). A fixed ratio schedule is predictable and produces a high response rate, with a short pause after reinforcement (e.g., eyeglass saleswoman). The variable interval schedule is unpredictable and produces a moderate, steady response rate (e.g., restaurant manager). The fixed interval schedule yields a scallop-shaped response pattern, reflecting a significant pause after reinforcement (e.g., piano student).