What Is a 1-Sample t-Test? Definition and How It Works

A one-sample t-test is a statistical method that compares the average of a single group of measurements to a specific known value. It answers a straightforward question: is the average I measured from my sample meaningfully different from a number I expected, or could the difference just be due to chance? It’s one of the most common statistical tests in science, medicine, and social research.

How It Works

Say a medical reference states that the average sodium level in adult blood is 140 mEq/L. You collect blood samples from 50 adults and measure their sodium levels. The average in your sample comes out to 138. Is that difference real, or just random variation from grabbing 50 people instead of testing every adult on earth? A one-sample t-test gives you a formal way to answer that.

The test compares three things: how far your sample average is from the expected value, how spread out your individual measurements are, and how many measurements you collected. A small gap between your sample average and the expected value is less convincing on its own, but if your data points are tightly clustered and your sample is large, even a small gap can be statistically meaningful.

The formula captures this logic in one calculation. You take the difference between your sample average and the expected value, then divide it by the standard error: your sample's standard deviation divided by the square root of your sample size. The result is called a t-value. A larger t-value (positive or negative) means a bigger gap between what you measured and what you expected, relative to the variability in your data.
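
As a concrete sketch, here is that calculation in Python for the sodium example. The standard deviation of 5.0 is an assumed value for illustration; the text only gives the two averages.

```python
import math

# Sodium example: expected mean 140 mEq/L, sample mean 138 from 50 adults.
# The standard deviation of 5.0 is assumed for illustration.
expected = 140.0
sample_mean = 138.0
sample_sd = 5.0
n = 50

standard_error = sample_sd / math.sqrt(n)  # variability of the sample mean
t_value = (sample_mean - expected) / standard_error
print(round(t_value, 2))  # -2.83
```

The negative sign just means the sample average came out below the expected value; the magnitude is what the p-value is based on.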

What the Results Tell You

After calculating the t-value, you look up (or let software calculate) a p-value. The p-value represents the probability of seeing a difference at least as large as yours if the true average really does equal the expected value. In other words, it tells you how surprising your result would be if there were no real difference.

Most fields use a threshold of 0.05. If your p-value falls below 0.05, you reject the assumption that the true average equals the expected value. If your p-value is above 0.05, you don’t have strong enough evidence to say the true average is different. In the sodium example, a p-value of 0.03 would suggest adult sodium levels genuinely differ from 140 mEq/L, while a p-value of 0.15 would mean the difference you observed could easily be random noise.
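
In practice, software converts the t-value into a p-value using the t distribution. A minimal sketch with SciPy, assuming for illustration a t-value of -2.83 from a sample of 50 (49 degrees of freedom):

```python
from scipy import stats

# Two-sided p-value: probability of a t-value at least this extreme
# in either direction, if the true mean equals the expected value.
t_value = -2.83   # illustrative figure
df = 49           # sample size of 50, minus one

p_value = 2 * stats.t.sf(abs(t_value), df)
print(round(p_value, 4))
```

The factor of 2 makes the test two-sided: it counts gaps in either direction, above or below the expected value.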

A common mistake is treating a non-significant result as proof that the average equals the expected value. It doesn’t prove that. It only means your data wasn’t strong enough to detect a difference, which could happen because the sample was too small or the measurements were too variable.

Statistical Significance vs. Practical Significance

A p-value only tells you whether a difference is likely real. It says nothing about whether the difference matters. With a large enough sample, even a trivially small difference will produce a significant p-value. That’s where effect size comes in.

The most common effect size measure for a t-test is Cohen’s d, which expresses the difference in terms of standard deviations. A d of 0.2 is generally considered a small effect, 0.5 is medium, and 0.8 or above is large. If you find that average blood pressure in your sample is statistically different from the reference value but Cohen’s d is 0.1, the difference is real but probably not clinically important.
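
Cohen's d is simple enough to compute by hand. A sketch using the sodium example, with a standard deviation of 5.0 assumed for illustration:

```python
# Cohen's d for a one-sample test: (sample mean - reference) / sample SD.
sample_mean = 138.0
expected = 140.0
sample_sd = 5.0   # assumed for illustration

d = (sample_mean - expected) / sample_sd
print(abs(d))  # 0.4, between "small" (0.2) and "medium" (0.5)
```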

When to Use It vs. Other Tests

The one-sample t-test fits a specific scenario: you have one group of measurements and one comparison value. If you’re comparing two groups against each other (say, a treatment group and a control group), you need a two-sample t-test instead. The one-sample version is strictly for testing whether your group’s average matches a predetermined number.

You might also wonder about the z-test, which does something similar. The difference is practical. A z-test requires you to already know the true standard deviation of the entire population, which almost never happens in real research. When you’re estimating variability from your sample data (which is the normal situation), the t-test accounts for that extra uncertainty. With sample sizes above roughly 30, the t-test and z-test give nearly identical results anyway. Below 30, the t-test is more conservative and more appropriate.
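
You can see the convergence directly by comparing critical values. A quick sketch with SciPy: the two-sided 95% cutoff for the t distribution starts well above the z cutoff of about 1.96 at small sample sizes and approaches it as the sample grows.

```python
from scipy import stats

# Two-sided 95% critical values: the t distribution demands a larger
# cutoff at small n, converging toward the normal (z) cutoff of ~1.96.
for n in (5, 10, 30, 100):
    t_crit = stats.t.ppf(0.975, df=n - 1)
    print(n, round(t_crit, 3))
```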

Assumptions Your Data Needs to Meet

The one-sample t-test requires several conditions to produce reliable results:

  • Continuous data. Your measurements need to be on a numeric scale where differences are meaningful, like weight, temperature, or test scores. Categories (like “yes/no” or “mild/moderate/severe”) won’t work.
  • Independence. Each measurement should be unrelated to the others. If you measure the same person twice, those two values aren’t independent, and you’d need a different test (a paired t-test).
  • Approximate normality. The data should follow a roughly bell-shaped distribution. With larger samples (above 30 or so), this matters less because the central limit theorem makes the sample average approximately normal even when individual values aren't. With small samples, skewed data or extreme outliers can distort results.
  • Random sampling. Your data should represent the population you’re drawing conclusions about. A convenience sample (testing only people who happened to walk into your clinic) can introduce bias no statistical test can fix.
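
When normality is in doubt, a quick check before running the test is reasonable. A sketch using the Shapiro-Wilk test, on data simulated here purely for illustration:

```python
import numpy as np
from scipy import stats

# Shapiro-Wilk normality check on a simulated sample of 50
# sodium-like values (randomly generated, for illustration only).
rng = np.random.default_rng(0)
sample = rng.normal(loc=140, scale=5, size=50)

stat, p = stats.shapiro(sample)
# A small p-value (below 0.05) would cast doubt on approximate normality.
print(round(p, 3))
```

Visual checks like a histogram or Q-Q plot are often just as informative as a formal test, especially with small samples.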

If your data is clearly non-normal and your sample is small, the standard alternative is the Wilcoxon signed-rank test. It works on the ranks of your data rather than the raw values, making it resistant to skewed distributions and outliers. The tradeoff is that it’s slightly less powerful when the normality assumption actually holds.
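
In SciPy, the Wilcoxon signed-rank test operates on the differences from the reference value, so you subtract it first. A sketch with made-up data containing a few extreme values:

```python
from scipy import stats

# Made-up sample of 10 sodium measurements, including some outliers.
reference = 140
sample = [132, 135, 136, 137, 138, 139, 141, 150, 128, 134]

# wilcoxon tests whether the differences are symmetric around zero,
# which here means: is the sample centered on the reference value?
diffs = [x - reference for x in sample]
stat, p_value = stats.wilcoxon(diffs)
print(round(p_value, 3))
```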

A Concrete Example

Suppose you want to know whether men with high blood pressure have a different average BMI than the general population value of 25. You measure BMI in 40 men who have been diagnosed with hypertension and get a sample average of 27.3 with a standard deviation of 4.5.

You plug those numbers into the formula: (27.3 minus 25) divided by (4.5 divided by the square root of 40). That gives you a t-value. The degrees of freedom for a one-sample t-test are simply your sample size minus one, so 39 in this case. Using those two numbers, software produces a p-value. If it comes back at, say, 0.01, you’d conclude that hypertensive men in this population have a meaningfully higher average BMI than 25. The effect size would help you judge whether the 2.3-point difference is large enough to be clinically relevant.
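
The whole example can be run end to end from the summary statistics. A sketch in Python, using SciPy only for the p-value:

```python
import math
from scipy import stats

# BMI example from the text: 40 men, mean 27.3, SD 4.5, reference 25.
n, mean, sd, reference = 40, 27.3, 4.5, 25.0

t_value = (mean - reference) / (sd / math.sqrt(n))
df = n - 1  # 39
p_value = 2 * stats.t.sf(abs(t_value), df)
cohens_d = (mean - reference) / sd

print(round(t_value, 2))   # about 3.23
print(round(cohens_d, 2))  # about 0.51, a medium effect
print(round(p_value, 4))
```

If you have the raw measurements rather than summary statistics, `scipy.stats.ttest_1samp` performs the same calculation in a single call.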

This same logic applies across fields. A factory might test whether its product's average weight matches the specification. An educator might check whether their students' average test score differs from a national benchmark. The mechanics are identical every time: one group, one number, one question about whether the gap is real.