A baseline in psychology is a starting measurement taken before any intervention, treatment, or experimental change occurs. It captures how a person behaves, thinks, feels, or functions under normal conditions, giving researchers and clinicians a reference point to judge whether something actually made a difference. Without knowing where someone started, there’s no reliable way to determine whether a therapy worked, a symptom improved, or a brain injury healed.
How Baselines Work in Research
The core logic is straightforward: measure something first, introduce a change, then measure again. The difference between those two measurements tells you whether the change had an effect. In behavioral research, this typically means observing and recording a target behavior (how often a child raises their hand in class, how many minutes someone spends on task) multiple times before any intervention begins. By convention, a minimum of three baseline data points is needed to establish stability, though the What Works Clearinghouse standards recommend at least five data points per phase for a study to meet its quality criteria.
Stability matters because the baseline needs to show a consistent pattern. If a person’s behavior is wildly fluctuating or trending sharply upward or downward before the intervention even starts, it becomes nearly impossible to tell whether later changes were caused by the intervention or were just part of an existing pattern. Researchers look at several features when evaluating baseline data: the overall level (how high or low the measurements are), the trend (whether they’re climbing or falling), the variability (how much they bounce around), and how consistently these features hold across repeated observations.
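The features described above (level, trend, and variability) can be summarized with a few lines of code. What follows is a minimal sketch, not a standard from any guideline: the function names and the stability thresholds (max_trend, max_cv) are hypothetical values chosen for illustration, and in practice researchers usually judge stability by visual inspection of the graphed data.

```python
from statistics import mean, stdev


def describe_baseline(observations):
    """Summarize the level, trend, and variability of baseline observations."""
    n = len(observations)
    if n < 3:
        raise ValueError("need at least three baseline data points")
    level = mean(observations)
    # Trend: least-squares slope of the observations against session index 0..n-1.
    sessions = list(range(n))
    session_mean = mean(sessions)
    numerator = sum((s - session_mean) * (y - level)
                    for s, y in zip(sessions, observations))
    denominator = sum((s - session_mean) ** 2 for s in sessions)
    trend = numerator / denominator
    # Variability: sample standard deviation of the observations.
    variability = stdev(observations)
    return {"level": level, "trend": trend, "variability": variability}


def looks_stable(observations, max_trend=0.5, max_cv=0.25):
    """Rough stability check; both thresholds are hypothetical, not a convention."""
    summary = describe_baseline(observations)
    if summary["level"] == 0:
        return False
    # Coefficient of variation: spread relative to the overall level.
    cv = summary["variability"] / abs(summary["level"])
    return abs(summary["trend"]) <= max_trend and cv <= max_cv


# Hypothetical counts of hand raises across five baseline sessions.
flat_baseline = [4, 5, 4, 6, 5]
rising_baseline = [2, 4, 6, 8, 10]
print(describe_baseline(flat_baseline))
print(looks_stable(flat_baseline), looks_stable(rising_baseline))
```

Under these assumed thresholds, the first series counts as stable, while the second is rejected because it is already climbing steeply before any intervention begins.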
Baselines in Therapy and Clinical Work
Clinicians use baselines the same way researchers do, just in a more practical context. A therapist treating someone for anxiety might ask them to track their worry episodes, sleep quality, or avoidance behaviors for a week or two before starting treatment. Those initial numbers become the comparison point for everything that follows. If a client reported panic attacks four times per week at baseline and now reports one per week after eight sessions of therapy, that’s a concrete, measurable improvement rather than a vague sense that things are “better.”
In applied behavior analysis, which is commonly used with individuals on the autism spectrum, the baseline phase is a formal part of the treatment design. Practitioners collect repeated measurements of the target behavior under normal conditions, then introduce the intervention only after the baseline is stable enough to serve as a reliable comparison. If stability isn’t achieved in the initial sessions, additional measurements are taken until a clear pattern emerges. This discipline around the baseline phase is what allows practitioners to make confident claims about whether their specific intervention caused a change in behavior.
Concussion Testing and Cognitive Baselines
One of the most well-known real-world applications of baselines is in concussion management for athletes. Before the season starts, athletes complete a battery of cognitive tests measuring things like memory, reaction time, and processing speed. These pre-injury scores become their personal baseline. If they later sustain a concussion, clinicians compare post-injury test results against that stored baseline to gauge the severity of impairment and track recovery over time.
This personalized approach matters more than it might seem. Research published in the Journal of Athletic Training found that preexisting factors like learning disabilities, prior concussions, or attention disorders can significantly influence baseline cognitive scores. When those individual baselines aren’t available and clinicians rely instead on population-average norms, they risk making incorrect return-to-play decisions. An athlete whose “normal” cognitive performance is above average might appear recovered when compared to general norms, even though they haven’t actually returned to their own personal normal, while an athlete whose normal is below average might still look impaired after a full recovery. Symptom reporting alongside neurocognitive data makes the baseline picture more complete, since concussion symptoms are routinely used to track recovery and inform decisions about when it’s safe to return to activity.
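The difference between the two reference points can be illustrated with a small sketch. Everything here is hypothetical: the score scale (norm mean 100, standard deviation 15), the athlete's numbers, and the cutoff of 1.5 standard deviations are assumptions made for the example, and real concussion tools use their own scoring and decision rules.

```python
def z_score(score, reference_mean, reference_sd):
    """Distance of a score from a reference point, in standard deviation units."""
    return (score - reference_mean) / reference_sd


def flags_impairment(post_injury, reference_mean, reference_sd, cutoff=-1.5):
    """True if the post-injury score sits below the cutoff relative to the reference.

    The cutoff is a hypothetical threshold, not a clinical rule.
    """
    return z_score(post_injury, reference_mean, reference_sd) < cutoff


# Hypothetical athlete: above-average personal baseline, lower score after injury.
NORM_MEAN, NORM_SD = 100, 15
personal_baseline = 115
post_injury_score = 90

# Compared with population norms, the score looks unremarkable.
print(flags_impairment(post_injury_score, NORM_MEAN, NORM_SD))
# Compared with the athlete's own baseline, the same score is a large drop.
print(flags_impairment(post_injury_score, personal_baseline, NORM_SD))
```

The same post-injury score passes the norm-based check and fails the baseline-based one, which is the situation in which a norm-only comparison could clear an athlete too early.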
Physiological Baselines
Psychology doesn’t only measure behavior and cognition. Physiological baselines capture what’s happening in the body under resting or normal conditions, giving researchers a reference point for studying how people respond to stress, emotions, or environmental changes. Common physiological markers include heart rate, skin conductance (how much your palms sweat), and cortisol, a hormone your body releases in response to stress.
Cortisol is particularly useful because it can be measured through saliva, blood, urine, or even hair samples. Hair cortisol is especially valuable for establishing long-term baselines, since it reflects stress hormone levels accumulated over weeks or months rather than a single moment. Studies have validated hair cortisol concentration as a reliable biomarker for chronic stress, making it possible to distinguish between someone’s typical hormonal state and an acute stress response. Research on burnout patients, for instance, has shown elevated heart rate and cortisol levels during the first hour after waking compared to healthy individuals, a finding that only becomes meaningful when you have a healthy baseline for comparison.
Why Baseline Comparisons Can Go Wrong
Baselines seem simple in concept, but they introduce real methodological pitfalls when used carelessly. One common mistake in clinical trials is comparing each group’s outcome against its own baseline separately, rather than comparing the groups directly against each other. This sounds reasonable but is statistically invalid. The false-positive rate of this approach can climb as high as 50% for two groups and 75% for three, far above the 5% usually intended, meaning there’s a coin-flip chance (or worse) of concluding that one treatment outperforms another when it doesn’t.
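One way to see how figures like 50% and 75% can arise is the following sketch. It rests on assumptions that are not spelled out in the text: every group changes from baseline by the same true amount, each within-group test independently comes out significant with probability p_significant, and a difference between groups is (wrongly) declared whenever some groups are significant and others are not. The function names are hypothetical.

```python
def discordance_probability(p_significant, n_groups):
    """Chance that the within-group tests disagree although the groups do not differ.

    The tests agree only if all are significant or none are.
    """
    all_significant = p_significant ** n_groups
    none_significant = (1 - p_significant) ** n_groups
    return 1 - all_significant - none_significant


def worst_case(n_groups, steps=1000):
    """Largest discordance probability over a grid of p values from 0 to 1."""
    return max(discordance_probability(i / steps, n_groups)
               for i in range(steps + 1))


print(worst_case(2))  # two groups
print(worst_case(3))  # three groups
```

Under these assumptions the worst case occurs when each within-group test has a 50% chance of reaching significance, giving 0.5 for two groups and 0.75 for three.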
The problem comes down to two forces that can mimic real improvement. First, natural changes over time: people often get better on their own, and a before-and-after comparison within one group can’t distinguish genuine treatment effects from spontaneous recovery. Second, regression toward the mean: people who score unusually high or low at baseline tend to score closer to average the next time, purely by chance. A group selected because they had severe symptoms at baseline will almost always look improved at follow-up, treatment or not. The correct approach in a randomized trial is to compare groups directly, adjusting for baseline differences using statistical methods, rather than running separate before-and-after tests within each group.
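Regression toward the mean is easy to demonstrate with simulated data. The sketch below assumes a simple model that is not taken from any particular study: each person has a stable true symptom level, every measurement adds independent random noise, nobody receives any treatment, and only people who score at or above a hypothetical cutoff at baseline are enrolled.

```python
import random
from statistics import mean


def simulate_regression_to_mean(n_people=10_000, cutoff=65.0, seed=0):
    """Return (mean baseline, mean follow-up) for people selected as severe at baseline."""
    rng = random.Random(seed)
    baseline_scores, followup_scores = [], []
    for _ in range(n_people):
        true_severity = rng.gauss(50, 10)              # stable underlying symptom level
        baseline = true_severity + rng.gauss(0, 10)    # noisy first measurement
        followup = true_severity + rng.gauss(0, 10)    # noisy second measurement, no treatment
        if baseline >= cutoff:                         # enrolled because they looked severe
            baseline_scores.append(baseline)
            followup_scores.append(followup)
    return mean(baseline_scores), mean(followup_scores)


baseline_mean, followup_mean = simulate_regression_to_mean()
print(f"baseline mean:  {baseline_mean:.1f}")
print(f"follow-up mean: {followup_mean:.1f}")
```

Although nothing was done to anyone in the simulation, the selected group scores noticeably lower at follow-up than at baseline, which a before-and-after test within that group would misread as improvement.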
Baseline Drift in Long-Term Studies
In studies that track people over months or years, the baseline itself can shift in ways that complicate interpretation. This is sometimes called baseline drift. A person’s “normal” cognitive performance, emotional regulation, or stress response at age 10 isn’t the same as at age 14, simply because of development. Researchers studying early adolescence have found a common pattern: individuals who score low at baseline tend to show larger gains at follow-up, while those who start high tend to show smaller gains or even declines. This negative correlation between starting point and change isn’t necessarily a treatment effect. It may simply reflect the fact that people with lower initial scores have more room to improve.
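A negative correlation between starting point and change can appear even when nobody truly changes. The sketch below is an illustration under assumed conditions, not a model of any real dataset: each person has a fixed true score, both measurements carry independent noise of the same size, and the true change is exactly zero for everyone.

```python
import random
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def baseline_change_correlation(n_people=5_000, seed=1):
    """Correlation between baseline score and change score when true change is zero."""
    rng = random.Random(seed)
    baselines, changes = [], []
    for _ in range(n_people):
        true_score = rng.gauss(100, 10)
        baseline = true_score + rng.gauss(0, 10)   # noisy first measurement
        followup = true_score + rng.gauss(0, 10)   # noisy second measurement, no true change
        baselines.append(baseline)
        changes.append(followup - baseline)
    return pearson(baselines, changes)


print(round(baseline_change_correlation(), 2))
```

With these settings the expected correlation is about -0.5, because the noise in the baseline enters the change score with the opposite sign. Low scorers appear to gain and high scorers appear to decline without any real development or treatment effect.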
To handle this, researchers use statistical models that separate genuine change from artifacts of the baseline measurement. Latent change score models, for example, estimate how much of the observed change is predicted by where someone started versus how much represents true developmental or treatment-related shifts. These techniques are especially important in developmental psychology, where the baseline is a moving target rather than a fixed reference point.

