The number 30 is not a mathematically proven minimum sample size. It’s a rule of thumb that caught on in statistics education because, for many common data distributions, a sample of around 30 produces a sampling distribution that looks roughly normal. This makes certain statistical tests work reasonably well. But the rule oversimplifies a more nuanced reality, and modern statisticians increasingly argue it should be retired.
The Central Limit Theorem Connection
The main justification for the “n ≥ 30” rule comes from the central limit theorem, one of the foundational ideas in statistics. The theorem says that if you take repeated random samples from almost any population (anything with finite variance) and calculate their averages, those averages will form a bell-shaped (normal) distribution as the sample size grows, regardless of what the original population’s data looks like. This is powerful because many statistical tests assume normality, and the central limit theorem essentially guarantees it if your sample is “large enough.”
The question is: how large is large enough? Textbooks settled on 30 as a convenient answer. For populations that are roughly symmetric and don’t have extreme outliers, 30 observations genuinely do produce a sampling distribution that’s close to normal. But there is little documented evidence that 30 works as a universal threshold. As one widely cited review from UMass Amherst put it, the only thing most statisticians agree on is that symmetric, short-tailed distributions need smaller samples, while skewed or heavy-tailed distributions need larger ones.
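You can watch this play out in a small simulation. The sketch below (function and variable names are illustrative) draws 10,000 samples of size 30 from a symmetric population and from a heavily right-skewed one, then measures how skewed the resulting sample means are. If the central limit theorem had fully "kicked in" at n = 30, both should be close to zero:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mean_skewness(draw, n=30, reps=10_000):
    """Skewness of the distribution of sample means for samples of size n."""
    means = np.array([draw(n).mean() for _ in range(reps)])
    centered = means - means.mean()
    return (centered**3).mean() / centered.std()**3

# Symmetric population: uniform on [0, 1]
sym = sample_mean_skewness(lambda n: rng.uniform(0, 1, n))

# Heavily right-skewed population: lognormal with sigma = 1.5
skewed = sample_mean_skewness(lambda n: rng.lognormal(0, 1.5, n))

print(f"skewness of sample means, uniform population:   {sym:+.3f}")
print(f"skewness of sample means, lognormal population: {skewed:+.3f}")
```

The uniform case lands near zero, as the rule promises; the lognormal case stays visibly skewed at n = 30, which is exactly the failure mode discussed below.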
Where 30 Actually Comes From
The number has murky origins, but one likely source is William Sealy Gosset, the statistician who developed the t-test under the pen name “Student” in 1908. Gosset initially published tables for very small samples (as few as 2 observations). By 1917, he extended his tables of the t-distribution up to a sample size of 30. This wasn’t because 30 had special mathematical properties. It was a practical cutoff for his printed tables.
Around the same sample size, something visually convenient happens: the t-distribution starts to closely resemble the standard normal distribution. At 30 degrees of freedom, the critical value for a 95% confidence interval is 2.042, compared to the normal distribution’s 1.960. That’s a difference of about 0.08, which is small enough that using the simpler normal distribution doesn’t change your conclusions much. At smaller sample sizes, the t-distribution has fatter tails, meaning you need to account for more uncertainty. So 30 became a rough line where statisticians felt comfortable switching from the t-distribution to the normal distribution in hand calculations. In the era before computers, that simplification mattered.
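The convergence of the t-distribution to the normal is easy to verify numerically. A quick check, assuming scipy is available:

```python
from scipy.stats import norm, t

# Two-sided 95% critical values: normal vs. t at various degrees of freedom
z = norm.ppf(0.975)  # the familiar 1.960
for df in (5, 10, 30, 100):
    tc = t.ppf(0.975, df)
    print(f"df={df:>3}: t critical = {tc:.3f}, normal = {z:.3f}, gap = {tc - z:.3f}")
```

At 5 degrees of freedom the gap is over half a unit; by 30 it has shrunk to the 0.08 mentioned above, which is why hand calculators of Gosset's era felt safe switching distributions there.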
When 30 Is Not Enough
The rule breaks down most obviously with skewed data, meaning data that clusters toward one end with a long tail stretching the other way. Income data is a classic example: most people earn moderate amounts, but a few earn enormously more, pulling the distribution to the right. A simulation study published in the Journal of Statistical Modeling and Analytics tested exactly how many observations different skewed distributions need before the central limit theorem kicks in. The results showed a nonlinear relationship between skewness and required sample size:
- Mildly skewed data (skewness below 0.5): a sample size of 20 often suffices.
- Moderately skewed data: 30 to 50 may work.
- Heavily skewed data (skewness of 2.5 or higher): sample sizes beyond 100 are needed.
So for some distributions, 30 is actually more than you need. For others, it’s far too few. The blanket rule papers over this variation entirely.
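One way to see the breakdown concretely is to check how often a nominal 95% confidence interval actually contains the true mean. The sketch below (an illustrative simulation, not from the cited study) compares coverage of the standard t-interval at n = 30 for a normal population versus a heavily skewed lognormal one:

```python
import numpy as np
from scipy.stats import t

rng = np.random.default_rng(1)

def t_interval_coverage(draw, true_mean, n, reps=5_000):
    """Fraction of nominal-95% t-intervals that actually contain the true mean."""
    hits = 0
    tc = t.ppf(0.975, n - 1)
    for _ in range(reps):
        x = draw(n)
        half = tc * x.std(ddof=1) / np.sqrt(n)
        hits += abs(x.mean() - true_mean) <= half
    return hits / reps

# Symmetric case: standard normal population, true mean 0
cov_normal = t_interval_coverage(lambda n: rng.normal(0, 1, n), 0.0, 30)

# Heavily skewed case: lognormal population, true mean e^0.5
cov_lognormal = t_interval_coverage(lambda n: rng.lognormal(0, 1, n), np.exp(0.5), 30)

print(f"coverage at n=30, normal population:    {cov_normal:.3f}")
print(f"coverage at n=30, lognormal population: {cov_lognormal:.3f}")
```

For the normal population, coverage sits right around the advertised 95%; for the skewed one it falls noticeably short, meaning the intervals are overconfident even though n = 30 was "enough" by the rule of thumb.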
The Statistical Power Problem
Even when 30 observations are enough to make the math of a statistical test valid, they often aren’t enough to detect what you’re looking for. Statistical power is the probability that your test will correctly identify a real effect when one exists. Researchers generally aim for 80% power, meaning an 80% chance of catching a true effect.
With 30 observations per group and a two-sided test at the standard 0.05 significance level, you only have 80% power to detect an effect size of about 0.74 (measured using a standardized metric called Cohen’s d). That’s a large effect. If the real difference you’re looking for is moderate, say around 0.5, a sample of 30 per group will miss it more often than not. If you want 90% power with a two-sided test, you’d need an effect size of 0.85 or larger for 30 observations to be sufficient. In practical terms, this means a sample of 30 is only reliable for spotting differences that are already pretty obvious.
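These power figures can be reproduced from the noncentral t-distribution. A sketch, assuming scipy is available:

```python
import numpy as np
from scipy.stats import nct, t

def two_sample_power(d, n, alpha=0.05):
    """Power of a two-sided, two-sample t-test with n observations per group,
    for a true standardized effect size d (Cohen's d)."""
    df = 2 * n - 2
    ncp = d * np.sqrt(n / 2)              # noncentrality parameter
    tcrit = t.ppf(1 - alpha / 2, df)
    return (1 - nct.cdf(tcrit, df, ncp)) + nct.cdf(-tcrit, df, ncp)

p_074 = two_sample_power(0.74, 30)
p_050 = two_sample_power(0.50, 30)
print(f"power at d=0.74, n=30 per group: {p_074:.2f}")
print(f"power at d=0.50, n=30 per group: {p_050:.2f}")
```

The first call comes out near 0.80, matching the claim above; the second falls below 0.50, which is what "will miss it more often than not" means in numbers.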
Why Modern Statisticians Want to Retire the Rule
A well-known paper from Google Research argued directly that it’s time to retire the “n ≥ 30” rule, calling it a relic of the pre-computer era. The paper demonstrated cases where even 1,664 observations weren’t enough for t-test inferences to be reasonably accurate, depending on the shape of the underlying data. If a sample that large can fail, then 30 clearly isn’t a safe universal floor.
The core problem is that the rule treats sample size as a fixed number when it should be a decision based on context. The right sample size depends on how skewed your data is, how large an effect you’re trying to detect, how much uncertainty you can tolerate, and what kind of analysis you’re running. Modern approaches use power analysis (calculating the sample size you need before collecting data) or computer-intensive methods like bootstrapping, which resample your actual data thousands of times to check whether your conclusions hold up. These tools make the old rule unnecessary.
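As a minimal sketch of the bootstrap idea (names are illustrative), you resample your observed data with replacement many times and read the confidence interval directly off the resampled statistics, rather than leaning on a normality threshold:

```python
import numpy as np

rng = np.random.default_rng(2)

def bootstrap_mean_ci(x, reps=10_000, alpha=0.05):
    """Percentile bootstrap CI for the mean: resample x with replacement,
    recompute the mean each time, and take the middle (1 - alpha) of results."""
    x = np.asarray(x)
    means = np.array([rng.choice(x, size=len(x), replace=True).mean()
                      for _ in range(reps)])
    return np.quantile(means, [alpha / 2, 1 - alpha / 2])

# A small, right-skewed sample (e.g. incomes in arbitrary units)
data = rng.lognormal(0, 1, 25)
lo, hi = bootstrap_mean_ci(data)
print(f"observed mean: {data.mean():.2f}, 95% bootstrap CI: [{lo:.2f}, {hi:.2f}]")
```

The percentile method shown here is the simplest bootstrap variant; refinements such as BCa intervals exist, but even this basic form adapts to the actual shape of the data instead of assuming a sample size has made it normal.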
Why the Rule Persists
Despite its limitations, the n ≥ 30 guideline survives because it’s simple, easy to teach, and works reasonably well in introductory statistics courses where examples tend to use well-behaved data. It gives students a concrete number to anchor to when they’re first learning about sampling. For symmetric, unskewed data with no extreme outliers, 30 genuinely is a reasonable starting point. The trouble comes when people carry this classroom heuristic into real-world research, where data is messy, effects are small, and the stakes of getting it wrong are high.
If you’re designing a study or evaluating someone else’s, the honest answer to “how many observations do I need?” is always “it depends.” The number 30 is a pedagogical shortcut, not a statistical law.