What Is a Dependent T-Test and How Does It Work?

A dependent t-test is a statistical test that compares two sets of measurements taken from the same group of people (or from matched pairs) to determine whether the difference between them is meaningful or just due to chance. You’ll also see it called a paired samples t-test, a paired t-test, or a repeated measures t-test. It’s one of the most common tools in research for answering a simple question: did something actually change?

When You Use a Dependent T-Test

The key feature of this test is that the two sets of scores are linked. Every data point in one group has a specific partner in the other group. The most common scenario is a before-and-after design: you measure the same people twice, once before some intervention and once after. For example, a dental hygiene study measured bacterial plaque levels in 32 subjects before and after treatment with different toothbrushes. Each participant’s “before” score pairs directly with their own “after” score, making a dependent t-test the right choice.

The pairing doesn’t always have to come from the same person measured twice. It can also come from matched pairs, where two different people are deliberately paired based on characteristics like age, sex, or baseline health before being assigned to different conditions. What matters is that every observation in one group has a meaningful, specific match in the other.
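In practice, running a dependent t-test takes only a few lines. Here is a minimal sketch using SciPy's `scipy.stats.ttest_rel`; the before/after values are invented for illustration, not taken from the study mentioned above.

```python
from scipy import stats

# Hypothetical before/after scores for 8 participants (illustrative values).
before = [2.1, 1.8, 2.5, 2.2, 1.9, 2.4, 2.0, 2.3]
after = [1.7, 1.6, 2.0, 1.9, 1.8, 2.1, 1.7, 2.0]

# ttest_rel pairs each "before" score with the same subject's "after" score,
# so the two lists must be in the same participant order.
result = stats.ttest_rel(before, after)
print(f"t = {result.statistic:.2f}, p = {result.pvalue:.4f}")
```

Note that the order of the two lists matters for the pairing, so the data must be kept in participant order rather than sorted.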

How It Differs From an Independent T-Test

The independent t-test compares two completely separate groups of people, like a treatment group versus a control group with no overlap. The dependent t-test compares two measurements that come from the same people. This distinction matters because measurements from the same person are correlated. Your blood pressure today is related to your blood pressure tomorrow in a way that your blood pressure and a stranger’s blood pressure are not.

The dependent t-test accounts for this correlation by working with the differences between each pair rather than the raw scores themselves. This approach removes a lot of the variability that comes from individual differences (some people naturally have higher blood pressure, some lower), which makes the test more sensitive at detecting a real change. It also uses a different formula for degrees of freedom: n minus 1 (where n is the number of pairs), rather than the larger degrees of freedom calculation used in an independent test.
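Because the test works on the pairwise differences, a dependent t-test is mathematically the same as a one-sample t-test asking whether those differences average to zero. A small sketch with hypothetical values (assuming NumPy and SciPy are available) makes the equivalence concrete:

```python
import numpy as np
from scipy import stats

# Hypothetical paired measurements (illustrative values).
before = np.array([120.0, 135.0, 118.0, 142.0, 128.0, 131.0])
after = np.array([115.0, 131.0, 117.0, 136.0, 125.0, 129.0])

# The paired test reduces to a one-sample t-test on the differences,
# testing whether their mean differs from zero.
diffs = before - after
paired = stats.ttest_rel(before, after)
one_sample = stats.ttest_1samp(diffs, popmean=0.0)

print(paired.statistic, one_sample.statistic)  # identical t-values
print(len(diffs) - 1)  # degrees of freedom: n - 1 = 5
```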

How the Test Works

The logic behind the dependent t-test is straightforward. For each pair, you calculate the difference between the two scores. Then you ask: is the average of all those differences far enough from zero that it’s unlikely to have happened by chance?

The test produces a t-value by dividing the mean of those differences by the standard error of the differences. A larger t-value means the observed change is bigger relative to the variability in the data, which makes it more likely that something real is going on. You then compare that t-value to a critical value from the t-distribution, using degrees of freedom equal to the number of pairs minus one.

The result also comes with a p-value, which tells you the probability of seeing a difference this large (or larger) if there were truly no effect. A p-value below 0.05 is the conventional threshold for calling a result statistically significant, meaning you’d reject the idea that the difference is zero. A p-value above 0.05 means you don’t have enough evidence to say the two measurements differ. The threshold of 0.05 is a convention, not a law of nature, and some fields use stricter cutoffs like 0.01.
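The calculation described above can be done by hand in a few lines. This sketch (with made-up difference scores) computes the t-value as the mean difference over its standard error, then looks up the two-sided p-value from the t-distribution with n − 1 degrees of freedom:

```python
import math
import numpy as np
from scipy import stats

# Hypothetical difference scores (after minus before, one per pair).
diffs = np.array([4.0, 2.0, 5.0, 3.0, 1.0, 3.0, 4.0, 2.0])
n = len(diffs)

# t = mean of the differences / standard error of the differences
mean_d = diffs.mean()
se_d = diffs.std(ddof=1) / math.sqrt(n)  # ddof=1 gives the sample SD
t = mean_d / se_d

# Two-sided p-value from the t-distribution with n - 1 degrees of freedom
p = 2 * stats.t.sf(abs(t), df=n - 1)
print(f"t = {t:.2f}, p = {p:.4f}")
```

The result agrees with what `scipy.stats.ttest_rel` would report for the same pairs.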

Assumptions the Data Must Meet

The dependent t-test relies on several conditions to produce valid results:

  • Paired observations. Each data point in one group must be uniquely matched to a data point in the other group.
  • Normal distribution of differences. The differences between paired scores (not the raw scores themselves) should be approximately normally distributed. You can check this with a histogram, a Q-Q plot, or a formal test like the Shapiro-Wilk test.
  • Interval or ratio data. The measurements need to be on a scale where differences between values are consistent and meaningful, like test scores, weights, or times. You can’t use it with ranked or categorical data.
  • Random sampling. The pairs should be randomly selected from the population you’re trying to generalize to.

The normality assumption becomes more important with smaller samples. With larger samples (generally 30 or more pairs), the Central Limit Theorem provides some flexibility, meaning the test can tolerate moderate departures from normality without producing misleading results.
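Checking the normality assumption is straightforward in code. This sketch generates hypothetical difference scores (here deliberately drawn from a normal distribution) and applies the Shapiro-Wilk test via `scipy.stats.shapiro`:

```python
import numpy as np
from scipy import stats

# Hypothetical difference scores; here simulated from a normal distribution.
rng = np.random.default_rng(42)
diffs = rng.normal(loc=2.0, scale=1.5, size=25)

# Shapiro-Wilk tests the null hypothesis that the data are normally
# distributed; a small p-value (e.g. below 0.05) suggests non-normality.
stat, p = stats.shapiro(diffs)
print(f"W = {stat:.3f}, p = {p:.3f}")
```

Remember to test the differences themselves, not the raw before and after scores.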

What to Do When Assumptions Are Violated

If your difference scores are clearly non-normal and your sample is small, the dependent t-test can produce unreliable p-values and lose statistical power, meaning it becomes less able to detect a real difference. In that case, the standard alternative is the Wilcoxon signed-rank test, a nonparametric test for paired differences that doesn’t require normality because it works with the ranks of the differences rather than their actual values.

The Wilcoxon test isn’t always less powerful than the t-test. When the assumptions of the t-test are seriously violated, the Wilcoxon test can actually outperform it. As a general rule, if your data look reasonably symmetric and bell-shaped, use the t-test. If the differences are heavily skewed or contain extreme outliers, the Wilcoxon signed-rank test is the safer choice.
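The outlier scenario is easy to demonstrate. In this sketch (with invented difference scores, one of them extreme), the t-test is dragged toward non-significance by the outlier while `scipy.stats.wilcoxon`, working on ranks, still detects the consistent positive shift:

```python
import numpy as np
from scipy import stats

# Hypothetical difference scores: small positive shifts plus one outlier.
diffs = np.array([0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 30.0])

# The t-test uses the raw values, so the outlier inflates the SD.
t_res = stats.ttest_1samp(diffs, 0.0)

# wilcoxon() uses the ranks of the differences, so the outlier counts
# only as "the largest value", not as a 30-point jump.
w_stat, w_p = stats.wilcoxon(diffs)

print(f"t-test p = {t_res.pvalue:.3f}, Wilcoxon p = {w_p:.4f}")
```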

Effect Size: How Big Is the Difference?

Statistical significance tells you whether a difference exists, but not whether it’s large enough to matter. That’s what effect size measures. The most common effect size for a dependent t-test is Cohen’s d, which expresses the size of the difference in standard deviation units.

One version of this calculation, sometimes written as dz, divides the mean difference by the standard deviation of the differences. Another version uses the average of the two groups’ standard deviations in the denominator. The choice between them depends on your research context, but both give you a number that’s easy to interpret: around 0.2 is considered a small effect, 0.5 is medium, and 0.8 or above is large. Reporting an effect size alongside your p-value gives a much fuller picture of what actually happened in the data.
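The dz version is simple to compute from the difference scores. A minimal sketch with hypothetical paired data:

```python
import numpy as np

# Hypothetical paired scores (illustrative values).
before = np.array([82.0, 75.0, 90.0, 68.0, 77.0, 85.0])
after = np.array([86.0, 80.0, 91.0, 74.0, 80.0, 89.0])

# Cohen's dz: mean of the differences divided by the SD of the differences.
diffs = after - before
dz = diffs.mean() / diffs.std(ddof=1)  # ddof=1 for the sample SD
print(f"dz = {dz:.2f}")
```

By the conventional benchmarks, a value above 0.8 like this one would count as a large effect.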

How to Report the Results

In academic writing, dependent t-test results follow a standard format. You report the t-value, the degrees of freedom in parentheses, and the exact p-value, with statistical symbols like t and p italicized. For example: t(31) = 2.45, p = .02. Exact p-values are reported to two or three decimal places, with one exception: if the p-value is less than .001, you simply write p < .001 rather than reporting the exact number.

Means and standard deviations are reported to two decimal places. You don’t need to define common statistical abbreviations like M, SD, t, df, or p, as these are considered universally understood in statistical writing. Including the effect size (Cohen’s d) alongside these numbers is increasingly expected and gives readers the practical significance of your findings, not just the statistical significance.
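If you run the test in code, the reporting conventions above are easy to automate. This sketch (hypothetical data; the formatting helper is mine, not part of any library) builds a report string, switching to "p < .001" when the p-value falls below that threshold:

```python
from scipy import stats

# Hypothetical before/after scores for 8 participants.
before = [12.0, 15.0, 11.0, 14.0, 13.0, 16.0, 12.0, 15.0]
after = [14.0, 16.0, 13.0, 17.0, 14.0, 18.0, 13.0, 17.0]

res = stats.ttest_rel(after, before)
df = len(before) - 1

# Report the exact p-value to three decimals, unless it is below .001.
p = res.pvalue
p_str = "p < .001" if p < 0.001 else f"p = {p:.3f}".replace("0.", ".")
report = f"t({df}) = {res.statistic:.2f}, {p_str}"
print(report)
```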