Experimental design in psychology is the structured plan researchers use to test whether one thing causes another. It’s the blueprint for an experiment: how participants are selected, what conditions they experience, what gets measured, and how the researchers rule out alternative explanations. A well-built design is what separates a finding you can trust from one that could mean almost anything.
The Core Logic: Variables and Control
Every psychology experiment revolves around two types of variables. The independent variable is what the researcher deliberately changes or manipulates. The dependent variable is the outcome being measured. If a researcher wants to know whether sleep deprivation affects memory, the amount of sleep participants get is the independent variable and their performance on a memory test is the dependent variable.
The entire point of experimental design is to set things up so that any change in the dependent variable can be confidently attributed to the independent variable, not to something else. “Something else” is called a confound, and most of the techniques in experimental design exist to prevent confounds from creeping in.
Random Assignment vs. Random Selection
These two terms sound similar but do very different jobs. Random selection means drawing participants from a larger population in a way that gives everyone an equal chance of being chosen. This is what lets researchers generalize their findings beyond the people who happened to be in the study.
Random assignment is different. It means placing participants into experimental groups using a chance-based method, like a coin flip or a random number generator. This ensures the groups are similar to each other before the experiment starts, so any differences that emerge afterward are likely caused by the independent variable rather than by pre-existing differences between the groups. Random assignment is what allows researchers to make cause-and-effect claims. Without it, a study can only show that two things are associated, not that one caused the other.
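To make the distinction concrete, here is a minimal sketch using Python’s standard library; the population size, sample size, and group labels are invented for illustration:

```python
import random

random.seed(42)  # fixed seed so the example is reproducible

# Hypothetical sampling frame: 10,000 potential participants.
population = [f"person_{i}" for i in range(10_000)]

# Random SELECTION: draw 40 people so every member of the population
# has an equal chance of entering the study (supports generalization).
sample = random.sample(population, k=40)

# Random ASSIGNMENT: shuffle the sampled participants, then split them
# into groups purely by chance (supports causal inference).
random.shuffle(sample)
treatment, control = sample[:20], sample[20:]
print(len(treatment), len(control))  # 20 20
```

Selection determines who is in the study; assignment determines which condition each of those people experiences.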
Between-Subjects Design
In a between-subjects design (also called an independent-groups design), each participant experiences only one condition. One group might receive a treatment while another group receives a placebo, and the researcher compares outcomes across the two separate groups.
The main advantage is that it prevents carryover effects. Because participants only go through one condition, there’s no risk of fatigue, boredom, or practice from a first condition bleeding into a second one. Sessions also tend to be shorter since each person only does one thing. The downside is that between-subjects designs need more participants to detect real effects. They also run the risk that the groups differ in ways the researcher didn’t account for. If one group happens to include more anxious people, for example, that could offer an alternative explanation for the results.
Within-Subjects Design
A within-subjects design (sometimes called a repeated-measures design) has every participant go through all conditions. If you’re testing whether people read faster in a serif font or a sans-serif font, each person reads passages in both fonts, and you compare their speeds.
This approach is statistically more powerful because you’re comparing each person against themselves, which eliminates individual differences as a source of noise. You also need fewer participants overall. The tradeoff is that the order in which people experience conditions can distort results. Someone might perform better on the second task simply because they’ve had practice, or worse because they’re tired. Participants may also pick up on what the experiment is testing and adjust their behavior to match what they think the researcher expects, a phenomenon known as demand characteristics.
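To see why comparing each person against themselves matters, consider a toy simulation (all numbers invented): a small true effect is buried under large individual differences, but taking each participant’s difference score cancels those differences out.

```python
import random
import statistics

random.seed(0)

# Invented numbers: a 5-point true effect, large individual
# differences (SD 15), and modest trial-to-trial noise (SD 3).
effect, between_sd, noise_sd = 5.0, 15.0, 3.0

scores_a, diffs = [], []
for _ in range(200):
    ability = random.gauss(100, between_sd)           # stable per-person level
    a = ability + random.gauss(0, noise_sd)           # condition A score
    b = ability + effect + random.gauss(0, noise_sd)  # condition B score
    scores_a.append(a)
    diffs.append(b - a)  # each person serves as their own control

print(round(statistics.stdev(scores_a), 1))  # ~15: raw scores are noisy
print(round(statistics.stdev(diffs), 1))     # ~4: differences are tight
print(round(statistics.mean(diffs), 1))      # ~5: the effect stands out
```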
To address order effects, researchers use counterbalancing: varying the sequence of conditions across participants so that no single order dominates the data. A simple version has half the participants complete Condition A first and Condition B second, while the other half does the reverse, as in the sketch below. More complex schemes, such as Latin squares (whose construction draws on graph theory), can counterbalance designs with many conditions and multiple observations per participant.
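A minimal sketch of how counterbalanced orders might be generated, assuming hypothetical participant IDs and condition labels:

```python
from itertools import permutations

participants = [f"p{i}" for i in range(8)]  # hypothetical IDs

# Two conditions: alternate so half the sample gets A-then-B
# and the other half gets B-then-A.
orders = [("A", "B"), ("B", "A")]
schedule = {p: orders[i % 2] for i, p in enumerate(participants)}

# More conditions: cycle through every possible order (complete
# counterbalancing). k conditions produce k! orders, so this only
# scales to small designs; larger ones need schemes like Latin squares.
all_orders = list(permutations(["A", "B", "C"]))  # 3! = 6 orders
for i, p in enumerate(participants):
    print(p, all_orders[i % len(all_orders)])
```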
Factorial Design
Factorial designs let researchers study two or more independent variables at the same time. Instead of running separate experiments for each variable, a factorial design crosses them, creating every possible combination of conditions.
Consider a study testing whether a medication and a type of talk therapy each help with depression. A factorial design would create four groups: medication plus therapy, medication plus no therapy, placebo plus therapy, and placebo plus no therapy. The analysis then produces three results. The first is a main effect for the medication (did it help, regardless of therapy?). The second is a main effect for the therapy (did it help, regardless of medication?). The third, and often most interesting, is the interaction effect: did the combination of medication and therapy produce results that weren’t simply the sum of their individual effects? An interaction might reveal, for instance, that medication works modestly on its own but dramatically better when paired with therapy.
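To make the arithmetic concrete, here is a short sketch computing all three effects from hypothetical cell means; the improvement scores are invented for illustration:

```python
# Invented mean improvements in depression scores for the four cells
# of the 2x2 design (higher = more improvement).
cells = {
    ("medication", "therapy"): 12.0,
    ("medication", "none"):     5.0,
    ("placebo",    "therapy"):  4.0,
    ("placebo",    "none"):     1.0,
}

def avg(*xs):
    return sum(xs) / len(xs)

# Main effect of medication: its average benefit across therapy levels.
med_effect = (avg(cells[("medication", "therapy")], cells[("medication", "none")])
              - avg(cells[("placebo", "therapy")], cells[("placebo", "none")]))

# Main effect of therapy: its average benefit across medication levels.
ther_effect = (avg(cells[("medication", "therapy")], cells[("placebo", "therapy")])
               - avg(cells[("medication", "none")], cells[("placebo", "none")]))

# Interaction: does the combined cell beat the purely additive prediction?
additive = (cells[("medication", "none")] + cells[("placebo", "therapy")]
            - cells[("placebo", "none")])
interaction = cells[("medication", "therapy")] - additive

print(med_effect, ther_effect, interaction)  # 6.0 5.0 4.0
```

Here the positive interaction (4.0) is exactly the pattern described above: the combination outperforms what the two treatments would predict on their own.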
Quasi-Experimental Design
Sometimes a true experiment isn’t possible. You can’t randomly assign people to experience a natural disaster, grow up in poverty, or develop a particular personality trait. Quasi-experimental designs handle these situations by comparing groups that already differ in some meaningful way, or by measuring outcomes before and after a naturally occurring event.
A researcher studying the psychological effects of a hurricane, for example, might compare mental health outcomes in an affected community with those in a similar but unaffected one. The design looks like an experiment, but without random assignment, there’s always the possibility that pre-existing differences between the groups are responsible for the results. Quasi-experiments are widely used in education, public health, and social science research precisely because they can operate in real-world settings where controlled experiments would be impractical or unethical.
Internal and External Validity
Internal validity is how confidently you can say the independent variable actually caused the change in the dependent variable. Threats to internal validity include selection bias (groups that weren’t equivalent at the start), attrition bias (certain types of participants dropping out mid-study), detection bias (measurements being influenced by who’s doing the measuring), and performance bias (participants or researchers behaving differently based on knowledge of group assignments).
External validity is how well the findings apply beyond the specific conditions of the study. A study conducted entirely on college students in a university lab may not reflect what happens in the broader population. Studies that exclude people with severe symptoms, co-occurring conditions, or those taking other medications also have limited external validity, because the participants don’t represent the kinds of people who would actually receive the treatment in practice. Short-term studies of conditions that require months or years of treatment face the same limitation.
There’s also ecological validity, a subset of external validity that asks whether the findings hold up in everyday life. Testing how a drug affects reaction time in a quiet, controlled lab with rested, healthy volunteers tells you very little about how that same drug would affect a stressed patient navigating a busy workday.
Ethical Requirements
The American Psychological Association’s ethics code shapes how experiments are designed from the start. Before participating, people must give informed consent. This means they’re told the purpose of the study, how long it will take, any foreseeable risks or discomfort, their right to withdraw at any time without consequences, how their confidentiality will be protected, and who to contact with questions. For studies involving experimental treatments, researchers must also explain the nature of the treatment, what happens if someone is assigned to a control group, and what alternative treatments exist.
When a study involves deception (temporarily misleading participants about the study’s true purpose to prevent biased responses), researchers are required to debrief participants afterward, explaining the real purpose and why deception was necessary. These ethical constraints are not afterthoughts. They directly influence which designs are feasible, which is one reason quasi-experiments exist as an alternative when true experiments would cross ethical lines.
Pilot Studies: Testing the Design First
Before running a full-scale experiment, researchers often conduct a pilot study, a small-scale trial run using the same procedures. The goal isn’t to answer the research question yet. It’s to find out whether the design actually works in practice. Can enough participants be recruited? Are the instructions clear? Do the measurements capture what they’re supposed to? Are the randomization and blinding procedures functioning as intended?
Pilot studies also generate preliminary data that researchers need to calculate how many participants the full study will require. For outcomes measured on a numeric scale, this means getting estimates of averages and variability. For outcomes measured as success or failure, it means getting a baseline success rate. Running a full experiment without this information risks either wasting resources on too many participants or, more commonly, enrolling too few to detect a real effect.
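As a sketch of how pilot estimates feed into that calculation, here is the standard normal-approximation formula for the number of participants per group in a two-group comparison of means; the pilot standard deviation and expected difference below are invented:

```python
from math import ceil
from statistics import NormalDist

def n_per_group(pilot_sd: float, expected_diff: float,
                alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group for a two-group comparison of means:
    n = 2 * (z_{1-alpha/2} + z_{power})^2 * (sd / diff)^2
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # 1.96 for alpha = 0.05
    z_power = z.inv_cdf(power)          # 0.84 for 80% power
    return ceil(2 * (z_alpha + z_power) ** 2 * (pilot_sd / expected_diff) ** 2)

# Pilot data suggest scores vary with SD 10 and the treatment should
# shift the mean by about 5 points (both numbers hypothetical).
print(n_per_group(pilot_sd=10.0, expected_diff=5.0))  # 63 per group
```

Note how sensitive the answer is to the pilot estimates: halving the expected difference quadruples the required sample, which is one reason skipping the pilot so often leads to underpowered studies.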