You use a paired t-test when you’re comparing two measurements that come from the same subjects or from subjects deliberately matched together. The classic scenario is a before-and-after design: you measure something, apply an intervention, then measure again. Because both measurements come from the same person, the data points are linked, and a paired t-test accounts for that link. If your two groups of measurements are independent of each other, you need a different test entirely.
What Makes Data “Paired”
The core question is whether each observation in one group has a specific partner in the other group. This pairing happens in three common situations:
- Repeated measures on the same subject. You weigh participants before a 12-week exercise program and again after it. Each person generates two data points that are naturally coupled.
- Matched subjects. You deliberately pair individuals based on characteristics like age, sex, or disease severity, then assign one from each pair to a treatment and the other to a control. A study on calciphylaxis in dialysis patients, for instance, matched each case with two controls based on age and hemodialysis duration from the same dialysis center.
- Naturally related pairs. You compare blood pressure in twins, cognitive scores between spouses, or grip strength between a person’s left and right hand. The biological or social relationship creates the dependency.
If the selection of one participant has no influence on who ends up in the other group, the samples are independent, and you’d use an independent (two-sample) t-test instead. Getting this distinction wrong is one of the most common mistakes in basic statistics, and it changes your results because a paired test works with the differences within each pair rather than comparing two separate pools of data.
Why Pairing Changes the Analysis
A paired t-test doesn’t compare two raw sets of scores. It calculates the difference between each pair first, then tests whether the average of those differences is meaningfully different from zero. This matters because it removes person-to-person variability from the equation.
Consider measuring blood pressure before and after a medication. People start at wildly different baselines. An independent t-test would treat all that baseline variation as noise, making it harder to detect a real drug effect. The paired t-test sidesteps the problem by focusing only on how much each person changed. If most people dropped by a similar amount, the test picks that up even when their starting points varied enormously. This is why paired designs are often more powerful: they need fewer participants to detect an effect of the same size.
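You can see this concretely with a small sketch (the blood-pressure numbers below are invented for illustration). Every patient drops about 6 mmHg, but baselines vary widely, so the paired test finds the effect easily while the independent test does not:

```python
import numpy as np
from scipy import stats

# Hypothetical systolic readings for 8 patients; each drops ~6 mmHg,
# but starting points range from the 120s to the 160s.
before = np.array([142.0, 128.0, 155.0, 131.0, 149.0, 160.0, 137.0, 151.0])
after = before - 6 + np.array([1.0, -1.0, 0.0, 2.0, -2.0, 1.0, -1.0, 0.0])

# Paired test: works with the within-person differences
t_paired, p_paired = stats.ttest_rel(before, after)

# Independent test (wrong for this design): compares the two pools,
# so the baseline spread swamps the consistent 6-point drop
t_indep, p_indep = stats.ttest_ind(before, after)

print(f"paired:      t = {t_paired:.2f}, p = {p_paired:.5f}")
print(f"independent: t = {t_indep:.2f}, p = {p_indep:.3f}")
```

Same data, same true effect: the paired test returns a tiny p-value while the independent test comes nowhere near significance, because only the paired test strips out the person-to-person baseline variability.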
Common Real-World Examples
In clinical and health research, paired t-tests show up constantly in pre-test/post-test designs. A study might put participants through 30 minutes of high-intensity training five days a week for three months, weighing them before and after the intervention. Another might test older adults’ memory before and after a month of using a brain-training app. In both cases, the same people are measured twice, so the data is paired.
Outside of before-and-after designs, paired t-tests apply whenever you’ve deliberately matched participants. Case-control studies in epidemiology frequently match on age (sometimes within a two-year window), sex, geographic area, and calendar date. A population-based study examining cardiovascular risk after a spouse’s ICU admission matched exposed spouses to four unexposed individuals each, pairing on age, sex, and insurance status. When matching is part of the design, ignoring it during analysis wastes the precision you built into the study.
Assumptions You Need to Check
The paired t-test has a few requirements. Your data should be measured on a continuous scale (things like weight, blood pressure, or test scores, not categories). The pairs should be drawn randomly or at least represent the population you care about. And the differences between pairs should follow an approximately normal distribution.
Notice that last point carefully: it’s the differences that need to be roughly normal, not the raw scores themselves. You can check this with a normality test like the Shapiro-Wilk test. A p-value above 0.05 on that test means it found no significant departure from normality, which is generally taken as license to proceed (it doesn’t prove normality, but it’s the standard screen). You can also simply plot the differences in a histogram or Q-Q plot and look for obvious skew or outliers.
With larger samples (generally 30 or more pairs), the paired t-test becomes robust to moderate departures from normality, thanks to the central limit theorem. With small samples and clearly non-normal differences, you’ll want an alternative.
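As a sketch of that check, scipy’s `shapiro` function runs the Shapiro-Wilk test directly on the difference scores (the weight changes below are made up):

```python
import numpy as np
from scipy import stats

# Hypothetical weight changes (kg) for 12 participants after an
# exercise program: difference scores, not raw before/after weights
diffs = np.array([-2.1, -1.4, -3.0, -0.8, -2.5, -1.9,
                  -2.8, -1.1, -2.2, -1.6, -2.9, -2.0])

stat, p = stats.shapiro(diffs)
print(f"Shapiro-Wilk: W = {stat:.3f}, p = {p:.3f}")

if p > 0.05:
    print("No detectable departure from normality; a paired t-test is reasonable.")
else:
    print("Differences look non-normal; consider the Wilcoxon signed-rank test.")
```

The point of feeding in `diffs` rather than the raw before and after arrays is exactly the distinction above: the test’s normality assumption applies to the differences.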
When to Use the Wilcoxon Signed-Rank Test Instead
The Wilcoxon signed-rank test is the nonparametric cousin of the paired t-test. It handles the same paired design but doesn’t assume the differences are normally distributed. Use it when your differences are heavily skewed, when you have extreme outliers that would distort a mean, or when your data is ordinal (like pain rated on a 1-to-10 scale where the intervals aren’t truly equal).
When the normality assumption holds, the paired t-test is slightly more powerful, meaning it’s better at detecting real effects. But when that assumption breaks down, the Wilcoxon test can actually outperform the t-test. So the choice isn’t about one being universally better. It’s about matching the test to your data’s actual shape.
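The swap is mechanical in practice. Here is a sketch with invented 1-to-10 pain ratings, where two patients improve dramatically and skew the difference scores:

```python
import numpy as np
from scipy import stats

# Hypothetical pain ratings (1-10) for 10 patients before and after
# treatment; two patients improve by 8-9 points, skewing the differences
before = np.array([6, 7, 5, 8, 6, 7, 9, 5, 6, 10])
after = np.array([5, 5, 4, 5, 5, 5, 1, 4, 4, 1])

# The Wilcoxon test works on signed ranks of the differences, so the
# two extreme improvements can't dominate the way they would in a mean
stat, p = stats.wilcoxon(before, after)
print(f"Wilcoxon signed-rank: statistic = {stat}, p = {p:.4f}")
```

Because every difference here is positive, the sum of the negative ranks (the test statistic) is zero, and the test flags a consistent improvement without letting the two outliers distort the result.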
Planning Your Sample Size
Before collecting data, you should estimate how many pairs you need. This is called a power analysis, and it requires four inputs: the expected effect size, your significance level (typically 0.05), your desired power (typically 0.80, meaning an 80% chance of detecting a real effect), and whether you’re testing in one direction or two.
Effect size for a paired t-test is expressed as the mean difference divided by the standard deviation of the differences. Standard benchmarks classify 0.2 as a small effect, 0.5 as medium, and 0.8 as large. If you expect a medium effect and use standard settings (alpha of 0.05, power of 0.80, two-tailed test), free software like G*Power will calculate the minimum number of pairs you need. Running this analysis before your study prevents the common problem of collecting too little data to detect anything meaningful.
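The same calculation G*Power performs can be sketched in a few lines: under the alternative hypothesis, the paired t-statistic follows a noncentral t-distribution, so you can search for the smallest number of pairs whose power reaches the target. This is an illustrative reimplementation, not the tool itself:

```python
import numpy as np
from scipy import stats

def pairs_needed(effect_size, alpha=0.05, power=0.80):
    """Smallest number of pairs for a two-sided paired t-test.

    Under the alternative, the test statistic on the difference scores
    follows a noncentral t-distribution with noncentrality d * sqrt(n);
    step n upward until the achieved power reaches the target.
    """
    n = 3
    while True:
        df = n - 1
        nc = effect_size * np.sqrt(n)             # noncentrality parameter
        t_crit = stats.t.ppf(1 - alpha / 2, df)   # two-sided critical value
        achieved = (1 - stats.nct.cdf(t_crit, df, nc)
                    + stats.nct.cdf(-t_crit, df, nc))
        if achieved >= power:
            return n
        n += 1

print(pairs_needed(0.5))  # medium effect, standard settings
```

With the standard settings above (medium effect, alpha 0.05, power 0.80, two-tailed), this lands on roughly the mid-30s of pairs, in line with what G*Power reports for the same inputs.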
Reporting Your Results Clearly
When you report a paired t-test, include the t-statistic, degrees of freedom (which equals the number of pairs minus one), and the exact p-value. Report p-values as continuous numbers rather than simply stating “p < 0.05” or labeling results as “significant” or “not significant.” A p-value of 0.037 is more informative than “p < 0.05,” and a p-value of 0.072 deserves to be reported as 0.072 rather than dismissed behind “p > 0.05.” The one exception: when p is very small, reporting p < 0.001 is fine.
Always pair your p-value with an effect size. The p-value tells you how surprising the result would be if there were no real difference, but it says nothing about how large the difference is. For paired designs, the standard effect size divides the mean difference by the standard deviation of the differences. You can also calculate it directly from your t-value by dividing it by the square root of your sample size. A confidence interval around the mean difference adds further context, giving readers a range of plausible values for the true effect.
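All three reportable quantities fall straight out of the difference scores. A sketch with invented memory-test scores, which also verifies the t-divided-by-root-n shortcut:

```python
import numpy as np
from scipy import stats

# Hypothetical memory scores for 10 participants before and after
# a month of brain training
before = np.array([21.0, 25.0, 19.0, 28.0, 22.0, 24.0, 20.0, 26.0, 23.0, 27.0])
after = np.array([24.0, 27.0, 20.0, 31.0, 25.0, 25.0, 23.0, 29.0, 24.0, 30.0])

diffs = after - before
n = len(diffs)

t_stat, p_value = stats.ttest_rel(after, before)

# Effect size: mean difference / SD of the differences
d = diffs.mean() / diffs.std(ddof=1)
# Equivalent shortcut: t divided by the square root of the sample size
d_from_t = t_stat / np.sqrt(n)

# 95% confidence interval for the mean difference
ci_low, ci_high = stats.t.interval(0.95, df=n - 1,
                                   loc=diffs.mean(), scale=stats.sem(diffs))

print(f"t({n - 1}) = {t_stat:.2f}, p = {p_value:.4f}, d = {d:.2f}, "
      f"95% CI [{ci_low:.2f}, {ci_high:.2f}]")
```

Note that the degrees of freedom printed here are `n - 1`, the number of pairs minus one, matching the reporting convention above.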
Quick Decision Guide
Use a paired t-test when all of the following are true:
- Two measurements per unit. Each subject, patient, or matched pair gives you exactly two data points to compare.
- Continuous outcome. You’re measuring something on a scale with meaningful numerical distances, not categories or ranks.
- Roughly normal differences. The difference scores don’t show extreme skew or heavy outliers, especially with small samples.
If your two groups are independent with no pairing or matching, use an independent samples t-test. If you have paired data but the differences aren’t close to normal, switch to the Wilcoxon signed-rank test. If you’re comparing more than two time points on the same subjects (say, measurements at baseline, 3 months, and 6 months), you’ve outgrown the paired t-test and need a repeated-measures ANOVA or similar approach.

