Counterbalancing is a technique in experimental psychology where researchers vary the order in which participants experience different conditions, so that the sequence itself doesn’t skew the results. It’s used almost exclusively in within-subjects designs, where every participant goes through all conditions in a study rather than just one. Without counterbalancing, any difference researchers observe between conditions could be caused by the order participants experienced them in, not by the conditions themselves.
Why Order Matters in Experiments
Imagine a study testing whether background music helps or hurts concentration. Participants complete a problem-solving task in silence, then complete another one with music playing. If everyone does silence first and music second, the results are muddled. Maybe participants scored higher on the second task because they’d already warmed up to the format. Maybe they scored lower because they were mentally tired. These are called order effects, and they make it impossible to tell whether the music actually did anything.
Order effects come in a few varieties. Practice effects happen when participants improve simply because they’ve done the task before. Fatigue effects happen when performance drops because participants are worn out or bored. Carryover effects are subtler: the experience of one condition lingers and changes how a participant responds to the next one. A relaxation exercise done first, for example, might leave someone calmer during a stressful condition that follows, producing results that wouldn’t occur if the stressful condition came first.
Counterbalancing addresses these effects by ensuring that each condition appears equally often in each position. Half the participants might do silence first and music second, while the other half do music first and silence second. Any advantage from going first or second gets spread evenly across both conditions, so it cancels out when the conditions are compared.
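To make the cancellation concrete, here is a minimal sketch of the music example in Python. The numbers are invented purely for illustration: a baseline score of 50, an assumed true music effect of -2 points, and an assumed practice boost of +3 points for whichever task comes second. None of these values come from a real study.

```python
# Hypothetical numbers chosen for illustration only.
BASELINE = 50.0        # score in silence with no order effect
MUSIC_EFFECT = -2.0    # assumed true effect of music on the score
PRACTICE_BOOST = 3.0   # assumed bonus for whichever task comes second


def score(condition: str, position: int) -> float:
    """Score for one condition at a given position (1 = first, 2 = second)."""
    value = BASELINE + (MUSIC_EFFECT if condition == "music" else 0.0)
    if position == 2:
        value += PRACTICE_BOOST
    return value


def mean_difference(orders: list[tuple[str, str]]) -> float:
    """Mean music score minus mean silence score across the given orders."""
    music, silence = [], []
    for order in orders:
        for position, condition in enumerate(order, start=1):
            bucket = music if condition == "music" else silence
            bucket.append(score(condition, position))
    return sum(music) / len(music) - sum(silence) / len(silence)


# Everyone does silence first versus half of the sample in each order.
fixed_order = [("silence", "music"), ("silence", "music")]
counterbalanced = [("silence", "music"), ("music", "silence")]

print(mean_difference(fixed_order))      # 1.0: the practice boost masks the effect
print(mean_difference(counterbalanced))  # -2.0: the assumed effect is recovered
```

With a fixed order, music looks helpful only because it always sits in the boosted second slot. Once both orders are used equally, the boost lands on each condition equally often and drops out of the comparison.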
Complete Counterbalancing
The most thorough approach is complete counterbalancing, where every possible ordering of conditions is used. For two conditions (A and B), there are only two possible orders: AB and BA. Easy enough. But the number of required orders grows fast. It follows the factorial formula: for any number of conditions n, you need n! (n factorial) different sequences. Three conditions require 6 orders. Four conditions require 24. Five conditions require 120, and six require 720.
This is why complete counterbalancing is practical only when a study has a small number of conditions. With four conditions you’d need at least 24 participants (one per order) or some multiple of 24 to keep things balanced. Beyond four or five conditions, recruiting enough participants for every possible sequence becomes unrealistic. Researchers turn to partial methods instead.
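The growth in required orders is easy to check with a short Python sketch. The helper names all_orders and assign are hypothetical, and the round-robin assignment is just one simple way to keep the orders balanced, not a prescribed procedure.

```python
from itertools import permutations
from math import factorial


def all_orders(conditions):
    """Every possible ordering of the conditions (complete counterbalancing)."""
    return list(permutations(conditions))


def assign(participant_ids, conditions):
    """Cycle through the orders so each one is used equally often
    when the sample size is a multiple of n!."""
    orders = all_orders(conditions)
    return {pid: orders[i % len(orders)] for i, pid in enumerate(participant_ids)}


# The number of orders matches n! for 2 through 6 conditions.
for n in range(2, 7):
    orders = all_orders("ABCDEF"[:n])
    assert len(orders) == factorial(n)
    print(n, "conditions ->", len(orders), "orders")

# 48 participants and four conditions: each of the 24 orders is used twice.
schedule = assign(range(48), "ABCD")
print(schedule[0], schedule[24])  # the same order, since 24 wraps back to order 0
```

The loop prints 2, 6, 24, 120, and 720 orders, which is why the approach stops being practical after four or five conditions.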
Latin Square Design
A Latin square is the most common shortcut when complete counterbalancing isn’t feasible. It’s a grid where the number of rows and columns matches the number of conditions, and each condition appears exactly once in each row and once in each column. For four conditions, you’d need only four different orderings instead of 24, with each condition appearing in each position (first, second, third, fourth) exactly once.
The tradeoff is that a Latin square doesn’t cover every possible sequence. It’s an incomplete design. Not all transitions between conditions are represented, so it controls for position effects (which slot a condition falls in) but doesn’t fully account for every possible carryover from one specific condition to another. Still, it’s a practical solution that dramatically reduces the number of orderings needed while keeping the most important source of bias in check.
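As a rough illustration, the Python sketch below builds a Latin square by cyclic shifting, which is one simple construction among several (the text above does not prescribe a particular one). It then counts how many of the possible condition-to-condition transitions the square actually contains. The function name cyclic_latin_square is hypothetical.

```python
def cyclic_latin_square(conditions):
    """Build a Latin square by shifting the condition list one step per row."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)] for row in range(n)]


square = cyclic_latin_square(["A", "B", "C", "D"])
for row in square:
    print(" ".join(row))
# A B C D
# B C D A
# C D A B
# D A B C

n = len(square)

# Each condition appears exactly once in every row and every column.
assert all(len(set(row)) == n for row in square)
assert all(len({square[r][c] for r in range(n)}) == n for c in range(n))

# Immediate transitions (X directly followed by Y) present in the square.
transitions = {(row[i], row[i + 1]) for row in square for i in range(n - 1)}
print(len(transitions), "of", n * (n - 1), "possible transitions")  # 4 of 12
```

In this particular square, A is only ever followed by B, B by C, and so on. Position is balanced, but most specific carryover pairings never occur, which is the limitation described above.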
Reverse (ABBA) Counterbalancing
When a study has only two conditions, a simpler technique called reverse or ABBA counterbalancing is sometimes used. Each participant receives both conditions in one order and then again in the reverse order: A, then B, then B, then A. The idea is that progressive effects like fatigue or practice will hit both conditions equally because each one appears in both early and late positions.
This method has clear limits. It requires participants to go through every condition twice, which doubles the length of the experiment. It also doesn’t work well with more than two conditions, because the number of repetitions becomes unwieldy. And the cancellation depends on the order effect building at a steady, linear rate: if it changes non-linearly over time (say, fatigue accelerates sharply toward the end), the symmetry of ABBA may not actually cancel things out.
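The non-linearity problem can be seen with a small Python sketch. It sums a position-based fatigue penalty for each condition in an ABBA sequence. The two penalty functions (one point per position, and position squared) are made-up assumptions for illustration, and position_totals is a hypothetical helper.

```python
def position_totals(sequence, drift):
    """Total order-related drift each condition absorbs across its positions."""
    totals = {}
    for position, condition in enumerate(sequence, start=1):
        totals[condition] = totals.get(condition, 0) + drift(position)
    return totals


abba = ["A", "B", "B", "A"]

# Assumed linear fatigue: one point lost per position.
linear = position_totals(abba, lambda p: -p)
# Assumed accelerating fatigue: loss grows with the square of the position.
accelerating = position_totals(abba, lambda p: -(p ** 2))

print(linear)        # {'A': -5, 'B': -5}   -> penalties match, so they cancel
print(accelerating)  # {'A': -17, 'B': -13} -> A is penalized more than B
```

Under linear fatigue, positions 1 and 4 add up to the same penalty as positions 2 and 3. Once fatigue accelerates, the final slot is disproportionately costly and condition A carries more of the burden.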
Randomized Counterbalancing
Rather than systematically assigning every possible order, researchers sometimes randomly assign each participant to a sequence. In a two-condition study, each participant is randomly placed in either the AB or BA group. For studies with more conditions, random assignment picks from the pool of possible orderings without requiring that every ordering be used.
Randomized counterbalancing relies on probability. With a large enough sample, the random assignments will roughly balance out which condition comes first, second, and so on. It won’t achieve perfect balance the way systematic methods do, but it’s simpler to implement and scales more easily to studies with many conditions. It’s especially useful when the number of possible orderings far exceeds the number of available participants.
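A minimal Python sketch of this idea follows, assuming six conditions and 60 participants (both numbers are arbitrary). Each participant gets an independently shuffled order, and the tally shows how often each condition landed in the first slot. The function name random_orders and the seed parameter are hypothetical conveniences.

```python
import random
from collections import Counter


def random_orders(n_participants, conditions, seed=None):
    """Give each participant an independently shuffled order of the conditions."""
    rng = random.Random(seed)  # a seed makes the assignment reproducible
    return [rng.sample(conditions, k=len(conditions)) for _ in range(n_participants)]


conditions = ["A", "B", "C", "D", "E", "F"]  # 6 conditions -> 720 possible orders
orders = random_orders(60, conditions, seed=1)

# How often each condition ended up first. With random assignment the
# counts should be roughly, but not exactly, equal.
first_counts = Counter(order[0] for order in orders)
print(first_counts)
```

With 720 possible orders and only 60 participants, most orders are never used at all. The balance across positions is approximate and improves as the sample grows.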
Why It’s Only Needed in Within-Subjects Designs
Counterbalancing is a concern specific to within-subjects (repeated-measures) designs, where every participant experiences all conditions. In a between-subjects design, each participant is assigned to only one condition, so there’s no sequence of conditions to worry about. A participant in the “music” group never does the “silence” condition, which means there’s no opportunity for practice, fatigue, or carryover from one condition to affect performance in another.
Between-subjects designs avoid order effects entirely, but they come with their own cost: you need more participants, and individual differences between groups can introduce noise. Within-subjects designs are statistically more powerful because each person serves as their own comparison point, which removes person-to-person variability from the comparison between conditions. Counterbalancing is the price you pay for that power. It lets researchers keep the advantages of within-subjects designs while neutralizing the order-related confounds that come with them.
When Counterbalancing Fails
Counterbalancing assumes that order effects are symmetric, meaning the carryover from condition A to condition B is roughly the same size as the carryover from B to A. When that assumption holds, the effects cancel out across groups. But this symmetry can’t always be verified ahead of time.
The problem case is called a differential carryover effect. This happens when one particular sequence (say, treatment followed by control) produces a unique reaction that doesn’t occur in the reverse sequence. For example, if a drug tested in condition A has lingering physiological effects that alter responses in condition B, but condition B has no such lingering effect on condition A, the carryover is asymmetric. Counterbalancing won’t fix this because the two directions don’t cancel each other out. Statistically, this shows up as a significant interaction between the order to which participants were assigned and the treatment itself.
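The difference between symmetric and asymmetric carryover can be worked through with a small Python sketch of a two-condition AB/BA design. All values are invented for illustration: the assumed true mean responses are 10 for A and 6 for B, and carryover is modeled as a simple additive shift on whichever condition comes second. The function estimated_difference is a hypothetical helper, not a standard analysis.

```python
def estimated_difference(true_a, true_b, carry_a_to_b, carry_b_to_a):
    """B minus A as estimated from a counterbalanced AB/BA design,
    with carryover modeled as an additive shift on the second condition."""
    # AB group: A is measured cleanly, B is shifted by what A left behind.
    ab_a, ab_b = true_a, true_b + carry_a_to_b
    # BA group: B is measured cleanly, A is shifted by what B left behind.
    ba_b, ba_a = true_b, true_a + carry_b_to_a
    mean_a = (ab_a + ba_a) / 2
    mean_b = (ab_b + ba_b) / 2
    return mean_b - mean_a


# Assumed true difference: 6 - 10 = -4.
print(estimated_difference(10, 6, carry_a_to_b=2, carry_b_to_a=2))  # -4.0
print(estimated_difference(10, 6, carry_a_to_b=2, carry_b_to_a=0))  # -3.0
```

When both directions carry over by the same amount, both condition means shift equally and the difference is unchanged. When only A carries over into B, the estimate is off by half the carryover, and no amount of counterbalancing removes that bias.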
When differential carryover effects are suspected, researchers may need to abandon the within-subjects approach altogether and switch to a between-subjects design, where each participant only experiences one condition and carryover becomes impossible. Alternatively, they can build in longer gaps between conditions (called washout periods) to let the effects of one condition fade before the next one begins.

