A within-subjects design is an experiment where every participant experiences all of the conditions being tested, rather than being assigned to just one. If a study compares three different teaching methods, each participant tries all three, and their performance is compared across conditions. This stands in contrast to a between-subjects design, where separate groups of people each experience only one condition.
You’ll also see this called a repeated measures design, though “repeated measures” sometimes refers more narrowly to a specific set of statistical analyses. The core idea is the same: measure the same people under different circumstances, then look at how their responses change.
How It Works in Practice
In a between-subjects experiment, you might recruit 60 people and randomly split them into three groups of 20, each group getting a different treatment. In a within-subjects experiment, you could recruit 20 people and have all of them go through each treatment at different times. Every person serves as their own comparison point.
A study at Université Laval illustrates this nicely. Researchers wanted to know whether patient-led online learning modules improved medical students’ clinical reasoning. Instead of comparing two separate groups, they had the same 26 students respond to clinical scenarios before and after watching the modules. Passing scores went from 66% before the intervention to 76% after. Because the same students were measured both times, the researchers could attribute the improvement to the training rather than to pre-existing differences between groups.
Why Researchers Prefer It for Small Samples
The biggest advantage is statistical power, which is the ability to detect a real effect when one exists. People differ enormously from each other in ability, motivation, health, and countless other traits. In a between-subjects design, all of that person-to-person variability gets lumped into the analysis, making it harder to spot treatment effects buried under the noise.
A within-subjects design sidesteps this problem. Because you’re comparing each person to themselves, individual differences get mathematically removed from the error term in the analysis. The denominator in the statistical test shrinks, which makes the test more sensitive. With the same number of participants, a within-subjects design will detect effects that a between-subjects design would miss entirely. One simulated comparison found that a standard analysis of variance failed to find significant differences across groups, while a repeated measures version of the same test, applied to the same data, detected them clearly.
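A quick simulation makes this concrete. The sketch below uses made-up numbers (NumPy and SciPy, seed and effect sizes chosen for illustration): stable individual differences are large relative to a small treatment effect, and the same scores are analyzed both ways, once ignoring the pairing and once removing each person's baseline.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Hypothetical data: 20 participants with large individual differences
# (sd = 10) and a modest treatment effect (+4 points in condition B).
baseline = rng.normal(50, 10, size=20)          # stable per-person ability
cond_a = baseline + rng.normal(0, 2, size=20)
cond_b = baseline + 4 + rng.normal(0, 2, size=20)

# Between-subjects view: treat the scores as two independent groups.
t_between, p_between = stats.ttest_ind(cond_a, cond_b)

# Within-subjects view: each person serves as their own control.
t_within, p_within = stats.ttest_rel(cond_a, cond_b)

print(f"independent-samples p = {p_between:.4f}")
print(f"paired-samples p      = {p_within:.6f}")
```

Because the paired test subtracts out the sd-10 person-to-person variability, its p-value is typically orders of magnitude smaller than the independent-samples version on the very same data, mirroring the simulated ANOVA comparison described above.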
This efficiency matters most when participants are expensive or hard to recruit. Clinical trials, studies with rare conditions, and research involving specialized equipment all benefit from needing fewer people. As the statistician Ronald Fisher once noted, a well-designed experiment can improve precision tenfold for the same cost in time and labor. A within-subjects setup is one of the most straightforward ways to achieve that.
The Tradeoff: Order and Carryover Effects
The design introduces its own problems. When someone completes multiple conditions in sequence, what happened in the first condition can bleed into the second. These carryover effects come in several forms.
- Practice effects: Performance improves simply because participants get more familiar with the task. A person solving puzzles under condition B might do better not because condition B is superior, but because they already practiced under condition A.
- Fatigue: The opposite can also happen. Participants get tired, bored, or less motivated as the experiment goes on, so later conditions look worse regardless of their actual effect.
- Sensitization: Experiencing one condition can change how a participant responds to the next. If you’re testing two pain-relief methods on the same person, the experience of the first treatment may alter their expectations or pain perception for the second.
Between-subjects designs avoid all of these issues because each person only encounters one condition. But they require the stronger assumption that the groups are truly equivalent at the start, with no hidden differences between them that could confound the results. Both designs demand tradeoffs.
Controlling for Order Effects
The standard solution is counterbalancing: varying the order in which participants experience conditions so that order effects average out across the group. If half the participants do condition A first and condition B second, while the other half does the reverse, any practice or fatigue effects should roughly cancel.
When there are more than two conditions, full counterbalancing (every possible order) quickly becomes impractical. With four conditions, there are 24 possible orderings. A Latin square design solves this by creating a structured matrix where each condition appears in each position (first, second, third, fourth) exactly once. Latin squares themselves long predate experimental design, but Ronald Fisher popularized their use for controlling position effects in the 1920s. The technique reduces the number of orderings you need while still ensuring that position effects are distributed evenly.
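As an illustration, here is a minimal sketch (function name hypothetical) of a cyclic Latin square: each condition appears in each serial position exactly once across the set of orders. A Williams design, which additionally balances which condition immediately precedes which, is a common refinement not shown here.

```python
def latin_square(conditions):
    """Cyclic Latin square: row i is the condition list rotated by i,
    so each condition lands in each position exactly once."""
    n = len(conditions)
    return [[conditions[(row + pos) % n] for pos in range(n)]
            for row in range(n)]

orders = latin_square(["A", "B", "C", "D"])
for order in orders:
    print(" -> ".join(order))
```

Four orders cover all four positions for every condition, instead of the 24 full permutations; participants are then assigned to the rows in rotation.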
Counterbalancing has a limitation worth understanding. It controls for order effects only if those effects don’t interact with the treatment itself. If condition A genuinely changes how people respond to condition B in a way that doesn’t happen in reverse, no amount of reordering will fix that. In those cases, a between-subjects design may be the safer choice, even though it requires more participants.
How the Data Get Analyzed
The statistical tools for within-subjects data reflect the design’s structure. For two conditions, a paired t-test compares each person’s score under one condition to their score under the other. You’re essentially analyzing the difference scores, reducing the problem to a one-sample test.
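With made-up scores, SciPy makes the equivalence easy to see: the paired t-test and a one-sample t-test on the difference scores give identical results.

```python
from scipy import stats

# Hypothetical scores for 8 participants under two conditions.
cond_a = [72, 65, 80, 71, 68, 77, 74, 69]
cond_b = [78, 70, 84, 73, 75, 80, 79, 72]

# Paired t-test: each person is compared with themselves.
t_paired, p_paired = stats.ttest_rel(cond_a, cond_b)

# The same analysis as a one-sample test on the difference scores.
diffs = [a - b for a, b in zip(cond_a, cond_b)]
t_diff, p_diff = stats.ttest_1samp(diffs, 0)

print(f"paired:      t = {t_paired:.3f}, p = {p_paired:.4f}")
print(f"differences: t = {t_diff:.3f}, p = {p_diff:.4f}")
```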
For three or more conditions, a repeated measures ANOVA extends this logic. It accounts for the correlation between measurements taken from the same person at different time points, something a standard ANOVA ignores. That correlation is precisely what gives the design its power: the same person's scores are systematically related, and the analysis exploits that structure rather than treating it as noise.
When the data aren’t normally distributed, Friedman’s test serves as a nonparametric alternative. For more complex designs with multiple factors or uneven time points, linear mixed-effects models offer the most flexibility. These modern approaches handle missing data and unbalanced designs more gracefully than traditional repeated measures ANOVA.
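SciPy implements Friedman's test directly. The sketch below uses hypothetical ratings from six participants under three conditions, where the same list position in each condition belongs to the same person.

```python
from scipy import stats

# Hypothetical ratings from 6 participants; index i in each list
# is the same person rated under a different condition.
cond_1 = [4, 5, 3, 4, 4, 4]
cond_2 = [6, 7, 5, 6, 6, 7]
cond_3 = [5, 6, 4, 5, 5, 6]

# Friedman's test ranks each person's scores across conditions,
# then asks whether the rank patterns are consistent.
stat, p = stats.friedmanchisquare(cond_1, cond_2, cond_3)
print(f"Friedman chi-square = {stat:.2f}, p = {p:.4f}")
```

Because the test works on within-person ranks, it keeps the self-as-control logic of the design without assuming normality.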
When to Use Each Design
A within-subjects design works well when the conditions are brief, reversible, and unlikely to permanently change the participant. Taste tests, perceptual experiments, and cognitive tasks where you can measure reaction time under different conditions are classic applications. It’s also a natural fit for tracking change over time, like measuring symptoms before and after treatment.
It works poorly when exposure to one condition makes it impossible to return to baseline. You can’t teach someone a skill and then un-teach it for the control condition. Studies involving surgery, permanent interventions, or knowledge-based outcomes often need a between-subjects approach. The same is true when the risk of participants guessing the study’s hypothesis increases with each condition they experience.
Many real experiments use mixed designs, combining both approaches. One factor might be within-subjects (every participant is measured at multiple time points) while another is between-subjects (participants are assigned to a drug group or a placebo group). This hybrid approach captures the power benefits of repeated measurement while keeping treatment groups separate where carryover would be a problem.
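To make the structure concrete, here is a hypothetical long-format data layout (plain Python; the field names are illustrative), where `group` is the between-subjects factor and `time` the within-subjects factor: each participant appears once per time point but belongs to exactly one group.

```python
# Hypothetical long-format layout for a mixed design. Each row is one
# observation; "group" varies between participants, "time" within them.
rows = []
for pid in range(1, 5):
    group = "drug" if pid <= 2 else "placebo"   # between-subjects factor
    for time in ("pre", "post"):                # within-subjects factor
        rows.append({"participant": pid, "group": group, "time": time})

for r in rows:
    print(r)
```

Data in this shape feeds directly into a mixed-design ANOVA or a linear mixed-effects model, the flexible option mentioned above.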

