What Was the Major Flaw in the Stanford Prison Experiment?

The major flaw in the Stanford Prison Experiment was that the researchers actively coached the guards to be harsh, then presented the resulting abuse as if it had emerged naturally from the situation. Psychologist Philip Zimbardo long described his 1971 study as proof that ordinary people will spontaneously become tyrants when given power. In reality, the outcome was shaped by direct instructions, researcher involvement, and a deeply compromised study design. Decades of criticism have since revealed not one flaw but a web of interconnected problems that undermine nearly every conclusion the experiment claimed to support.

The Guards Were Told How to Behave

Zimbardo consistently framed the experiment as a demonstration that the “prison situation” itself caused the guards’ abusive behavior. Recordings and archival documents tell a different story. An 18-minute audio recording, later analyzed by researchers, captures David Jaffe, the undergraduate who assisted Zimbardo and played the prison’s “warden,” pressuring a reluctant guard named John Mark to act more aggressively. Jaffe tells Mark directly: “We really want to get you active and involved because the Guards have to know that every Guard is going to be what we call a tough Guard.” Mark clearly did not want to behave this way; he was being pushed into it.

Separate video footage shows Zimbardo himself, acting as “prison superintendent,” briefing guards on what was expected of them before the experiment began. A 2019 investigation of the Stanford archives by French researcher Thibault Le Texier confirmed that “the guards received precise instructions regarding the treatment of the prisoners.” This wasn’t a neutral environment where behavior could unfold organically. It was a stage with a director.

Zimbardo Was a Participant, Not an Observer

In a well-controlled experiment, researchers stay outside the situation they are measuring. Zimbardo instead inserted himself into the scenario as the prison’s “superintendent,” making administrative decisions, responding to events in character, and shaping the dynamics in real time. This dual role gave him both a motive to produce dramatic results and the power to influence how participants behaved. The line between experimenter and subject effectively disappeared.

This problem extends to data collection itself. Le Texier’s archival investigation found evidence of “biased and incomplete collection of data,” meaning the records Zimbardo kept were not a neutral account of what happened but a selective one. When a single person designs, runs, participates in, and records an experiment, the results lose most of their scientific value.

Participants Were Performing, Not Transforming

One of the most damaging revelations came from the guards themselves. Dave Eshelman, the most notoriously abusive guard (nicknamed “John Wayne” by the other participants), later explained that his behavior was entirely deliberate. “What came over me was not an accident,” he said. “It was planned. I set out with a definite plan in mind, to try to force the action, force something to happen so that the researchers would have something to work with.” Eshelman, who had extensive experience in drama productions, said he consciously created a persona before stepping into the role, the same way an actor prepares for a stage performance.

Eshelman described running his own informal experiment within the study, escalating his treatment of the prisoners to see how much they would tolerate before they resisted. He noted that other guards followed his lead and that none told him to stop. That dynamic, however, says less about the corrupting power of authority than about a group of young men performing for researchers in an artificial setting. Le Texier’s interviews with 15 of the original participants found that they were “almost never completely immersed by the situation.” They knew it was a study, and many acted accordingly.

The Recruitment Process Attracted a Biased Sample

The newspaper ad used to recruit participants asked for volunteers for “a psychological study of prison life.” That specific phrasing mattered. A 2007 study by psychologists Thomas Carnahan and Sam McFarland re-created the recruitment process, running two versions of the ad: one identical to Zimbardo’s original and one that simply advertised “a psychological study” with no mention of prison. The volunteers who responded to the prison-specific ad scored significantly higher on measures of aggressiveness, authoritarianism, Machiavellianism, narcissism, and social dominance. They also scored lower on empathy and altruism.

This means the experiment’s participant pool was likely skewed toward people with personality traits associated with abusive behavior before the study even began. The claim that any random group of people would behave the same way under these conditions was already compromised at the recruitment stage.

A Comparable Study Produced Very Different Results

In 2002, psychologists Alex Haslam and Steve Reicher conducted a similar experiment for the BBC, randomly assigning men to guard and prisoner roles in a purpose-built mock prison. The study ran for eight days, and its results were strikingly different from Zimbardo’s. The guards failed to identify with their role and became reluctant to impose their authority. The prisoners eventually organized, overcame the guards, and established an egalitarian system among all participants.

That egalitarian arrangement later proved hard to sustain, and some participants did begin pushing toward more authoritarian structures. But the key finding was that placing people in a guard role did not automatically produce cruelty. The situation alone was not enough. This directly contradicted the central lesson Zimbardo had drawn from his experiment for over three decades.

The Guards Were Never Told They Were Subjects

One overlooked detail from the archival investigation is that the guards were not informed they were experimental subjects. They were led to believe the study was about the prisoners. This framing positioned the guards as something closer to research assistants, people helping produce results rather than people whose own behavior was being measured. When you believe you’re part of the research team, you’re far more likely to act in ways you think the researchers want, especially after being told to be “tough.”

This connects to a broader concept in psychology called demand characteristics: participants pick up on cues about what an experiment is “supposed” to show and adjust their behavior to match. The entire structure of the Stanford Prison Experiment, from the coaching to the role assignments to the concealment of who was actually being studied, amplified these cues rather than controlling for them.

Why It Still Matters

The Stanford Prison Experiment remains one of the most frequently cited studies in introductory psychology courses and popular culture. Its conclusion, that good people turn evil when placed in bad systems, is intuitive and dramatic. But that conclusion rested on a study in which the lead researcher took a role in the scenario, the guards were coached to be tough, the recruitment ad likely attracted people predisposed to aggression, the guards were never told they were subjects, and the data were collected selectively. The results were then presented as spontaneous proof of situational power.

The real lesson may be less about prisons and more about how a compelling narrative can survive for decades even when the evidence behind it is fundamentally compromised. Zimbardo’s study was not a controlled experiment that revealed something true about human nature. It was a demonstration, shaped at every stage by the people running it, that told a story its creator wanted to tell.