Schedules of reinforcement are rules that determine when and how often a behavior gets rewarded. They’re one of the most important concepts in behavioral psychology because they explain not just whether a behavior will continue, but how frequently and how persistently someone (or some animal) will perform it. The core insight: it’s not just the reward itself that shapes behavior, but the pattern of delivery.
Continuous vs. Intermittent Reinforcement
The broadest distinction is between continuous and intermittent reinforcement. With continuous reinforcement, every single correct response earns a reward. Press the lever, get a treat. Every time. This is the fastest way to teach a new behavior because the connection between action and outcome is unmistakable.
Intermittent reinforcement means only some responses are rewarded, following a specific pattern. This is where things get interesting. Behaviors learned through intermittent reinforcement are far more persistent than those learned through continuous reinforcement. This is called the partial reinforcement extinction effect: when rewards stop entirely, a behavior trained with intermittent reinforcement takes much longer to fade away. One reason is straightforward. If you’re used to being rewarded every single time, the absence of a reward is immediately noticeable and signals that something has changed. But if you’re used to going unrewarded sometimes, a stretch without rewards feels normal. You keep going.
There’s also a deeper mechanism at work. During intermittent training, your brain learns that unrewarded attempts are often followed by rewarded ones. The experience of not getting a reward actually becomes a cue to keep trying, because historically, persistence has paid off.
The Four Basic Schedules
Intermittent reinforcement breaks down into four classic schedules, organized along two dimensions: whether the reward depends on the number of responses or the passage of time, and whether that requirement is fixed or unpredictable.
Fixed Ratio
A reward comes after a set number of responses that never changes. Think of a factory worker paid for every 10 widgets assembled, or a coffee shop punch card where the 10th drink is free. You know exactly how much work is required, so you tend to work quickly to reach the target. One quirk of this schedule is a brief pause in responding right after the reward arrives, before the next round of work begins. The larger the ratio (the more responses required), the longer that post-reward pause tends to be.
Variable Ratio
A reward comes after a certain number of responses, but that number changes each time. On a variable ratio schedule averaging five responses, you might be rewarded after three attempts, then eight, then two, then seven. Because you can never predict which response will pay off, you tend to respond at a high, steady rate with very little pausing. This is the schedule behind slot machines: every pull could be the winning one, so players keep pulling. It’s also what makes scrolling social media so engaging. You don’t know when you’ll land on a post or notification that gives you a little hit of satisfaction, so you keep scrolling. Variable ratio schedules produce the highest and most consistent response rates of all four schedules, and behaviors maintained on them are the most resistant to extinction.
Fixed Interval
A reward becomes available after a set amount of time has passed, and you earn it with the first response you make once that interval has elapsed. The key difference from ratio schedules is that extra responses during the waiting period don’t speed anything up. This produces a distinctive pattern called “scalloping”: responding slows down or stops right after a reward, then gradually accelerates as the next reward window approaches. Checking your mailbox is a rough example. Mail arrives once a day at a predictable time, so you don’t bother checking at midnight, but you start checking more frequently as delivery time nears. Students cramming harder as an exam approaches follow a similar curve.
Variable Interval
A reward becomes available after a varying amount of time, and you earn it with the first response after that time has passed. On a variable interval schedule averaging five minutes, the actual intervals might be two minutes, then nine, then three, then six. Because you can’t predict when the next reward window opens, you tend to respond at a slow but steady rate. Checking your email throughout the day is a common example. Messages arrive at unpredictable times, so you check at a moderate, fairly consistent pace.
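The four schedules differ only in what triggers the reward, which makes them easy to sketch as simple reward-delivery rules. The Python sketch below is purely illustrative — the function names, the simulated clock, and the choice of sampling distributions for the variable schedules are our own constructions, not standard laboratory software:

```python
import random

def fixed_ratio(n):
    """FR-n: every nth response is rewarded."""
    count = 0
    def respond():
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True   # reward delivered
        return False
    return respond

def variable_ratio(mean_n):
    """VR-mean_n: reward after an unpredictable number of responses,
    averaging mean_n (drawn uniformly from 1..2*mean_n-1 here)."""
    target = random.randint(1, 2 * mean_n - 1)
    count = 0
    def respond():
        nonlocal count, target
        count += 1
        if count >= target:
            count = 0
            target = random.randint(1, 2 * mean_n - 1)
            return True
        return False
    return respond

def fixed_interval(interval):
    """FI-interval: the first response after each interval elapses
    is rewarded; responses before that do nothing."""
    next_available = interval
    def respond(t):               # t = time of the response
        nonlocal next_available
        if t >= next_available:
            next_available = t + interval
            return True
        return False
    return respond

def variable_interval(mean_interval):
    """VI-mean_interval: like FI, but each interval is unpredictable,
    averaging mean_interval."""
    next_available = random.uniform(0, 2 * mean_interval)
    def respond(t):
        nonlocal next_available
        if t >= next_available:
            next_available = t + random.uniform(0, 2 * mean_interval)
            return True
        return False
    return respond

# A punch card is FR-10: the 10th response earns the reward.
card = fixed_ratio(10)
tenth_drink_free = [card() for _ in range(10)][-1]
```

Note how the two variable schedules have no predictable reset point: from the responder’s perspective, any response (or any moment) might be the rewarded one, which is exactly why they produce steady responding with no post-reward pause.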
How Response Patterns Compare
Each schedule generates a recognizably different pattern of behavior. Ratio schedules, where the reward depends on how much you do, generally produce higher response rates than interval schedules, where the reward depends on when you respond. This makes intuitive sense: if more effort means faster rewards, you’re motivated to work faster. If rewards are time-locked, extra effort is wasted.
Variable schedules produce steadier responding than fixed schedules. Fixed schedules create predictable lulls, either the post-reward pause in fixed ratio or the scalloped slowdown in fixed interval. Variable schedules eliminate these lulls because the next reward could arrive at any moment. Putting the two dimensions together, variable ratio produces the fastest, most consistent behavior. Variable interval produces slow but steady behavior. Fixed ratio produces fast bursts with pauses. Fixed interval produces the classic scallop.
Research on fixed interval schedules in particular has shown that regardless of what an animal was trained on previously, exposure to a fixed interval schedule eventually produces the same positively accelerated response curve. The scallop pattern is remarkably robust. Studies also confirm that shorter intervals produce higher overall response rates than longer ones, which makes sense: more frequent reward opportunities mean more motivation to respond.
The Matching Law
When two or more reward sources are available at the same time, organisms don’t just pick the better one exclusively. Instead, they distribute their behavior proportionally. If one option delivers twice as many rewards as another, roughly twice as much effort goes toward it. This principle, identified by psychologist Richard Herrnstein in the early 1960s, holds across species and situations with striking consistency.
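The proportional relationship Herrnstein described can be written as a one-line formula: the share of behavior allocated to an option matches that option’s share of total reinforcement, B1/(B1+B2) = R1/(R1+R2). A minimal sketch (the function name and the example reward rates are ours, chosen only for illustration):

```python
def matching_prediction(r1, r2):
    """Predicted share of behavior allocated to option 1 under
    Herrnstein's matching law: B1/(B1+B2) = R1/(R1+R2)."""
    return r1 / (r1 + r2)

# If one option delivers 20 rewards per hour and the other 10,
# the law predicts about two-thirds of behavior goes to the first.
share = matching_prediction(20, 10)
```

This reproduces the claim above: an option delivering twice the rewards draws roughly twice the effort, i.e. two-thirds of the total.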
The practical implication is powerful. If you want to understand why someone spends their time one way versus another, look at the relative reinforcement each option provides. A child splitting time between homework and video games isn’t making a single choice. They’re continuously distributing behavior toward whichever source provides denser, more immediate rewards. Changing that distribution means changing the relative reinforcement, not just punishing the less desirable option.
Real-World Applications
Schedules of reinforcement aren’t just laboratory abstractions. They operate constantly in everyday life, often by design.
Gambling is the most frequently cited example. Slot machines operate on a variable ratio schedule, delivering payouts after an unpredictable number of plays. This is precisely why they’re so difficult to walk away from. Each play could be the one that pays, and the behavior is highly resistant to extinction because the player has learned that dry spells are normal and persistence eventually pays off.
Social media platforms exploit the same principle. Scrolling through a feed, checking for likes, or refreshing notifications all follow a variable ratio pattern. Rewards (interesting content, social validation) appear unpredictably, keeping you engaged in a steady stream of checking behavior. App designers understand these principles and build them into notification systems deliberately.
Animal training offers a clearer look at how schedules are used intentionally and sequentially. The standard approach for teaching a dog a new behavior is to start with continuous reinforcement, rewarding every correct response so the animal quickly learns what’s expected. Once the behavior is established, trainers gradually shift to an intermittent schedule. This transition needs to be gradual and purposeful, not abrupt, so the animal builds confidence that rewards are still coming even if they don’t arrive every time. Once the dog is comfortable with intermittent rewards, the trainer can then start being selective about which responses earn a treat, reinforcing only the best versions of the behavior.
Workplaces use these schedules too, often without naming them. Piece-rate pay is a fixed ratio schedule. Salary with annual reviews resembles a fixed interval schedule, complete with the scalloping effect of increased effort as review time approaches. Commission-based sales, where success depends on an unpredictable number of calls or pitches, mirrors a variable ratio schedule and tends to produce persistent, high-effort behavior.
Why the Pattern Matters More Than the Reward
The most counterintuitive lesson from decades of reinforcement schedule research is that rewarding a behavior every single time is not the best way to make it last. Continuous reinforcement builds behavior quickly but creates fragile habits. The moment rewards stop, so does the behavior. Intermittent schedules build slower but create behaviors that are remarkably durable.
This has practical consequences in parenting, education, therapy, and self-management. If you want a behavior to persist long-term, the goal isn’t to reward it forever. It’s to establish it with consistent rewards, then gradually thin the schedule so the behavior sustains itself through increasingly sparse reinforcement. The behavior becomes part of the routine rather than something performed only when a reward is guaranteed.

