Who Investigated Positive and Negative Reinforcement?

B.F. Skinner is the psychologist most directly responsible for investigating positive and negative reinforcement as mechanisms of learned behavior. He coined the term “operant conditioning” in 1937 and formally distinguished between positive and negative reinforcement in his 1938 book, The Behavior of Organisms. But Skinner built on earlier work by Edward Thorndike, whose puzzle box experiments in the late 1800s first demonstrated that consequences shape behavior.

Thorndike’s Law of Effect: The Starting Point

Before anyone used the words “positive reinforcement” or “negative reinforcement,” Edward Thorndike was watching cats try to escape from homemade wooden boxes. In the 1890s, he placed animals inside enclosures that required them to operate a latch to get out and reach a piece of food. Each return to the box counted as a new training trial, and the key finding was straightforward: animals escaped faster with each successive attempt.

In his 1898 monograph, Thorndike described what was happening as “the connection of a certain act with a certain situation and resultant pleasure.” But it wasn’t until his 1911 book, Animal Intelligence, that he laid out his formal principle, known as the Law of Effect. The idea: responses followed by satisfaction become more firmly connected to a situation, making them more likely to happen again. Responses followed by discomfort have their connections weakened. The greater the satisfaction or discomfort, the stronger or weaker the bond.

This was a critical foundation. Thorndike framed learning as the strengthening of a link between a stimulus and a response, not the strengthening of the response itself. That distinction matters because Skinner would later reframe the entire concept around what happens after the behavior, not the stimulus that triggers it.

Skinner Defines the Terms

B.F. Skinner took Thorndike’s general principle and turned it into a precise experimental framework. In 1937, he introduced the term “operant conditioning” to distinguish behavior that acts on the environment from the reflexive responses studied by Pavlov. Where Pavlov’s dogs salivated in response to a bell (a stimulus-driven reflex), Skinner was interested in voluntary actions that organisms perform because of what follows.

His 1938 book spelled out the rules. He described two types of conditioning. Type S was Pavlovian, where a new stimulus gets paired with an existing reflex. Type R was operant: if a behavior is followed by a reinforcing stimulus, the behavior gets stronger. If that reinforcing stimulus stops appearing, the behavior weakens and eventually fades, a process Skinner called extinction.

Crucially, Skinner wrote that “there are thus two kinds of reinforcing stimuli, positive and negative. The cessation of a positive reinforcement acts as a negative, the cessation of a negative as a positive.” This single sentence formalized the distinction that students still learn today. Positive reinforcement means adding something desirable after a behavior. Negative reinforcement means removing something unpleasant. Both increase the likelihood the behavior will happen again.

The Skinner Box and How It Worked

Skinner’s primary tool was an automated chamber, now widely known as a Skinner box. A hungry rat placed inside the chamber would eventually press a lever, which dispensed a food pellet. Because the rat found the food rewarding, it pressed the lever more often. This is positive reinforcement in its simplest form: add food, get more lever pressing.

Negative reinforcement looked different. In a typical setup, a mild electric current ran through the floor of the chamber. Pressing the lever turned off the shock. The rat learned to press the lever not to gain something pleasant but to escape something unpleasant. The removal of the aversive stimulus strengthened the behavior, just as the addition of food did in the positive version.
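The shared logic of both setups can be sketched as a toy simulation. This is illustrative only: the press probability, learning rate, and update rule are invented for the example, not taken from Skinner's data.

```python
import random

def simulate(trials=1000, p_press=0.05, learning_rate=0.1, seed=0):
    """Toy operant chamber: each reinforced lever press nudges the
    press probability upward. The update is the same whether the
    reinforcer is food delivered (positive reinforcement) or shock
    switched off (negative reinforcement) -- what matters is that
    the consequence strengthens the response."""
    rng = random.Random(seed)
    presses = 0
    for _ in range(trials):
        if rng.random() < p_press:      # the rat happens to press
            presses += 1
            # reinforcement follows immediately; the behavior strengthens
            p_press = min(1.0, p_press + learning_rate * (1.0 - p_press))
    return presses, p_press

presses, p_final = simulate()
# p_final ends well above the starting 0.05 after reinforced practice
```

The point of the sketch is that the code contains no branch for "positive" versus "negative": a single strengthening update covers both, which is exactly why Skinner grouped them under one term.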

What made Skinner’s approach genuinely new was automation and precision. The apparatus recorded every response mechanically, allowing Skinner to study exact patterns of behavior over long periods without human intervention. This led to one of his most influential contributions: the study of reinforcement schedules.

Reinforcement Schedules and Their Effects

Skinner and his colleagues discovered that when and how often you deliver reinforcement matters enormously. A behavior reinforced every single time it occurs (continuous reinforcement) is learned quickly but also fades quickly once reinforcement stops. Intermittent reinforcement, where the reward comes only some of the time, produces behavior that is far more resistant to extinction.

Four basic schedules emerged from this work. Fixed-ratio schedules reinforce after a set number of responses, like paying a factory worker per unit produced. Variable-ratio schedules reinforce after an unpredictable number of responses, which is the principle behind slot machines and explains why they are so compelling. Fixed-interval schedules reinforce the first response after a set amount of time has passed, which tends to produce a characteristic pattern: a pause after each reward followed by an accelerating burst of activity as the next reward time approaches. Variable-interval schedules reinforce after unpredictable time periods, producing a slow but steady rate of responding.
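The four schedules differ only in how they answer one question per response: reinforce now or not? A sketch of that logic (the closure-based interface and the uniform ranges for the variable schedules are choices made for the example):

```python
import random

def fixed_ratio(n):
    """FR-n: reinforce every n-th response (e.g. piecework pay)."""
    count = 0
    def on_response(now=None):
        nonlocal count
        count += 1
        if count == n:
            count = 0
            return True
        return False
    return on_response

def variable_ratio(mean_n, seed=0):
    """VR: reinforce after an unpredictable number of responses
    (the slot-machine schedule)."""
    rng = random.Random(seed)
    remaining = rng.randint(1, 2 * mean_n - 1)
    def on_response(now=None):
        nonlocal remaining
        remaining -= 1
        if remaining <= 0:
            remaining = rng.randint(1, 2 * mean_n - 1)
            return True
        return False
    return on_response

def fixed_interval(period):
    """FI: reinforce the first response after `period` time units
    have elapsed since the last reinforcement."""
    last = 0.0
    def on_response(now):
        nonlocal last
        if now - last >= period:
            last = now
            return True
        return False
    return on_response

def variable_interval(mean_period, seed=0):
    """VI: like FI, but the required wait varies unpredictably."""
    rng = random.Random(seed)
    last, wait = 0.0, None
    wait = rng.uniform(0, 2 * mean_period)
    def on_response(now):
        nonlocal last, wait
        if now - last >= wait:
            last = now
            wait = rng.uniform(0, 2 * mean_period)
            return True
        return False
    return on_response

# One response per time step for 20 steps on an FR-5 schedule:
fr5 = fixed_ratio(5)
rewards = sum(fr5(t) for t in range(1, 21))   # reinforced 4 times
```

On the ratio schedules only the response count matters, so `now` is ignored; the interval schedules need the timestamp to know whether the waiting period has elapsed.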

Research showed that prior experience with one type of schedule could influence how an animal responded to a new one. Animals with a history of high-rate ratio schedules, for instance, tended to maintain higher overall response rates even when switched to interval-based schedules. These findings revealed that reinforcement history leaves a lasting imprint on behavior, not just the current contingencies.

Escape vs. Avoidance: Two Faces of Negative Reinforcement

Negative reinforcement operates through two distinct pathways. Escape conditioning involves performing a behavior to stop something unpleasant that is already happening, like pressing a lever to turn off a shock that’s currently running. Avoidance conditioning involves performing a behavior to prevent something unpleasant from starting at all, like pressing the lever when a warning light appears, before any shock is delivered.

The key difference is the proximity of the threat. In escape, the aversive event is ongoing and you act to end it. In avoidance, the aversive event is predicted but hasn’t arrived yet, and you act to prevent it. Both qualify as negative reinforcement because both involve the removal or prevention of something unpleasant, and both make the behavior more likely to occur in the future. Avoidance learning is particularly relevant to understanding anxiety, where people often develop elaborate behavioral patterns to sidestep situations they associate with discomfort.

Why Negative Reinforcement Gets Confused With Punishment

The word “negative” trips people up. Even within behavioral science, the confusion is well documented. Murray Sidman, a prominent researcher in the field, noted that readers introduced to the concept are particularly prone to equating negative reinforcement with punishment, and he openly called for a replacement term that would eliminate the confusion.

The distinction is actually simple once you focus on the outcome. Both positive and negative reinforcement increase behavior. Punishment decreases it. Positive reinforcement adds something good. Negative reinforcement takes away something bad. Punishment either adds something bad (like a penalty) or removes something good (like taking away a privilege). The confusion arises because “negative” sounds like it should mean “bad,” but in this context it just means subtraction.
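Those add/remove distinctions form a two-by-two table, which can be sketched as a small lookup. (The “pleasant”/“unpleasant” labels are informal shorthand for appetitive and aversive stimuli; “positive punishment” and “negative punishment” are the standard textbook names for the two punishment cells.)

```python
def classify(operation, stimulus):
    """Map an operant contingency to its textbook name and its
    effect on the preceding behavior.
    operation: 'add' or 'remove'; stimulus: 'pleasant' or 'unpleasant'."""
    table = {
        ('add', 'pleasant'):      ('positive reinforcement', 'behavior increases'),
        ('remove', 'unpleasant'): ('negative reinforcement', 'behavior increases'),
        ('add', 'unpleasant'):    ('positive punishment', 'behavior decreases'),
        ('remove', 'pleasant'):   ('negative punishment', 'behavior decreases'),
    }
    return table[(operation, stimulus)]

classify('remove', 'unpleasant')   # ('negative reinforcement', 'behavior increases')
```

Reading the table by rows makes the key point visible: “positive” and “negative” pick the row (add versus subtract), while reinforcement versus punishment is decided entirely by the effect column.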

There are also genuine gray areas. In a classic experiment, rats in a cold chamber learned to press a lever that turned on a heat lamp. Was this positive reinforcement (adding warmth) or negative reinforcement (removing cold)? The behavioral outcome is identical regardless of how you categorize it, which is why some researchers have questioned whether the positive/negative distinction is always clean. In practice, reinforcement often involves a shift from one condition to another rather than the simple addition or removal of a single stimulus.

Primary and Secondary Reinforcers

Not all reinforcers work the same way at a biological level. Primary reinforcers satisfy basic survival needs: food, water, sleep, warmth, sex. They require no learning. A hungry animal does not need to be taught that food is desirable. These reinforcers are processed in evolutionarily older brain regions.

Secondary reinforcers, also called conditioned reinforcers, only acquire their power through association with primary reinforcers. Money is the most obvious human example. A dollar bill has no inherent biological value, but because it has been reliably paired with things that do (food, shelter, comfort), it functions as a powerful reinforcer. These learned reinforcers are processed in newer areas of the brain, particularly regions involved in abstract evaluation and decision-making.

Dopamine and the Brain’s Reinforcement System

Neuroscience has since identified the biological machinery behind what Skinner described behaviorally. Dopamine, a chemical messenger in the brain, plays a central role. When an outcome is better than expected, dopamine neurons fire in rapid bursts, creating what researchers call a positive prediction error. This signal strengthens the connection between the action and its outcome, making the action more likely in the future. When an outcome is worse than expected, dopamine neuron firing drops below its baseline rate, generating a negative prediction error that weakens the behavior.

These dopamine signals are most active in a brain region called the striatum, which acts as a hub for learning which actions lead to rewards. Brain imaging studies in humans and direct recordings in animals both confirm that this region consistently tracks prediction errors during reinforcement learning. Medications that increase dopamine activity tend to enhance learning from positive outcomes, while the effects of reducing dopamine are more complex, depending on dosage and which specific receptors are affected.

Modern Applications in Therapy

Skinner’s framework for reinforcement remains the backbone of applied behavior analysis (ABA), a therapeutic approach used widely for autism spectrum disorder, developmental disabilities, and behavioral challenges. In clinical settings, positive reinforcement involves adding something the individual finds motivating immediately after a target behavior. What counts as reinforcing is entirely individual: what works for one person may have no effect on another.

Negative reinforcement is used intentionally and carefully in these settings, particularly when escape or avoidance is already driving a person’s behavior. For example, completing a task might result in the removal of an otherwise extended review period, increasing the likelihood of task completion in the future. Timing is critical in both forms. Reinforcement delivered immediately after the behavior is far more effective than reinforcement that comes with a delay, because the brain needs to connect the action with its consequence in real time.