Normal approximation is a technique in statistics where you use the bell-shaped normal distribution (the familiar symmetric curve) to estimate probabilities that would otherwise require tedious calculations with discrete distributions like the binomial or Poisson. Instead of computing exact probabilities for every possible outcome, you swap in a normal curve with the same mean and spread and read probabilities from that curve. The trade-off is a small loss in precision for a massive gain in simplicity.
Why It Works
The theoretical backbone of normal approximation is the Central Limit Theorem. It states that when you add up or average a large number of independent random outcomes, the resulting distribution increasingly resembles a normal curve, regardless of what the original distribution looked like. As the number of observations grows, the match between the true distribution and the normal curve gets tighter and tighter.
This is why normal approximation isn’t accurate for every random variable. It works best when the quantity you’re looking at is effectively the sum or average of many independent events. Flipping a coin 200 times and counting heads, for instance, is the sum of 200 independent flips, so the normal approximation fits well. Flipping a coin 5 times doesn’t give the theorem enough room to work, and the approximation will be rough.
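A quick simulation illustrates the coin-flip case (a sketch in Python; the number of experiments and the seed are arbitrary choices, not part of any standard recipe):

```python
import random

random.seed(42)  # fixed seed so the run is reproducible

n_flips = 200           # flips per experiment
n_experiments = 10_000  # how many times we repeat the experiment

# Count heads in each experiment: each count is a sum of 200 independent flips,
# exactly the situation the Central Limit Theorem covers.
counts = [sum(random.random() < 0.5 for _ in range(n_flips))
          for _ in range(n_experiments)]

# For a normal distribution, about 68% of outcomes fall within one standard
# deviation of the mean. Check how close the simulated counts come.
mean = n_flips * 0.5                # 100
sd = (n_flips * 0.5 * 0.5) ** 0.5   # ~7.07
within_one_sd = sum(abs(c - mean) <= sd for c in counts) / n_experiments
print(within_one_sd)  # roughly 68%; a bit higher because counts are discrete
```

The fraction lands near the normal curve's 68% benchmark (slightly above it, since whole-number counts clump at the boundary), which is the CLT doing its work.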
Approximating the Binomial Distribution
The most common use of normal approximation is with the binomial distribution, which describes the number of successes in a fixed number of independent trials (like how many defective items appear in a batch of 500, or how many patients respond to a treatment out of 300). Computing exact binomial probabilities with large sample sizes involves enormous factorials that are slow or impractical to calculate by hand.
To use the normal approximation, you match the normal curve to the binomial’s center and spread. The mean is calculated as the number of trials (n) multiplied by the probability of success (p), and the standard deviation is the square root of n × p × (1 − p). Once you have those two numbers, you can convert any value to standard units and look up the probability on a standard normal table or calculator.
There’s a standard rule of thumb for when this approximation is reliable: both n × p and n × (1 − p) should be at least 5. If either value falls below 5, the binomial distribution is too lopsided for a symmetric normal curve to represent it well. When both conditions are met, the approximation is generally accurate, and it improves as n increases.
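Matching the curve and checking the rule of thumb takes only a few lines (a Python sketch; the function name is ours, not a library API):

```python
from math import sqrt

def binomial_normal_params(n, p):
    """Return (mean, sd) of the normal curve matched to Binomial(n, p),
    after checking the rule of thumb: n*p >= 5 and n*(1-p) >= 5."""
    if n * p < 5 or n * (1 - p) < 5:
        raise ValueError("binomial too lopsided for a normal approximation")
    return n * p, sqrt(n * p * (1 - p))

mean, sd = binomial_normal_params(200, 0.10)
print(mean, round(sd, 2))  # 20.0 4.24
```

With the mean and standard deviation in hand, any count can be converted to standard units with `(x - mean) / sd` and looked up on a standard normal table.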
The Continuity Correction
A binomial distribution only produces whole numbers: you can get 14 or 15 successes, but never 14.3. The normal distribution, by contrast, is continuous and assigns probability to every point along the number line. This mismatch creates a gap that can quietly reduce the accuracy of your approximation.
The fix is called a continuity correction, and it works by adding or subtracting 0.5 to bridge the gap between discrete and continuous values. If you want the probability of exactly 6 successes, you calculate the normal probability between 5.5 and 6.5 instead. If you want the probability of more than 6 successes, you use the area above 6.5. If you want 6 or more, you use the area above 5.5. The pattern adjusts depending on the direction of the inequality:
- Exactly x: use the range from x − 0.5 to x + 0.5
- Greater than x: use everything above x + 0.5
- Greater than or equal to x: use everything above x − 0.5
- Less than x: use everything below x − 0.5
- Less than or equal to x: use everything below x + 0.5
Skipping the continuity correction won’t ruin your answer, but including it consistently produces estimates closer to the exact binomial probability, especially at moderate sample sizes.
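Those rules translate directly into a small helper (a Python sketch; the function and label names are ours, and the standard normal CDF is built from `math.erf` so no external libraries are needed):

```python
from math import erf, sqrt

def phi(z):
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def corrected_prob(kind, x, mean, sd):
    """Normal-approximation probability for a discrete count,
    applying the continuity correction per the rules above."""
    if kind == "exactly":    # P(X = x): area from x-0.5 to x+0.5
        return phi((x + 0.5 - mean) / sd) - phi((x - 0.5 - mean) / sd)
    if kind == "greater":    # P(X > x): area above x+0.5
        return 1 - phi((x + 0.5 - mean) / sd)
    if kind == "at_least":   # P(X >= x): area above x-0.5
        return 1 - phi((x - 0.5 - mean) / sd)
    if kind == "less":       # P(X < x): area below x-0.5
        return phi((x - 0.5 - mean) / sd)
    if kind == "at_most":    # P(X <= x): area below x+0.5
        return phi((x + 0.5 - mean) / sd)
    raise ValueError(kind)

# P(X <= 25) for Binomial(200, 0.10): mean 20, sd ~4.24
print(round(corrected_prob("at_most", 25, 20, 4.2426), 3))
```

Centralizing the five cases in one place makes it hard to apply the correction in the wrong direction, which is the usual mistake.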
Approximating the Poisson Distribution
The Poisson distribution models the count of rare events over a fixed period or area, like the number of customer complaints per day or the number of typos on a page. When the average rate of events (lambda) is small, the Poisson distribution is noticeably skewed to the right, and a normal curve is a poor stand-in. Once lambda exceeds about 20, however, the Poisson looks symmetric enough that the normal approximation becomes practical. The approximating normal curve uses lambda as both its mean and the basis for its spread: the standard deviation is the square root of lambda. As the rate increases further, the fit continues to improve.
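To see how good the fit is, the sketch below compares the approximation against the exact Poisson CDF (the choice of lambda = 25 and k = 25 is arbitrary, just comfortably past the rule-of-thumb threshold):

```python
from math import erf, exp, factorial, sqrt

def phi(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1 + erf(z / sqrt(2)))

lam, k = 25, 25  # event rate, and the count we ask about: P(X <= 25)

# Exact Poisson CDF: sum of e^(-lam) * lam^i / i! for i = 0..k
exact = sum(exp(-lam) * lam**i / factorial(i) for i in range(k + 1))

# Normal approximation: mean lam, sd sqrt(lam), with continuity correction
approx = phi((k + 0.5 - lam) / sqrt(lam))

print(round(exact, 3), round(approx, 3))
```

At this rate the two answers already agree to within a couple of percentage points, and pushing lambda higher shrinks the gap further.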
How the Calculation Works in Practice
Suppose you’re checking a production line where 10% of items have a minor cosmetic flaw, and you inspect 200 items. You want to know the probability that 25 or fewer are flawed. Computing this exactly with the binomial formula would mean summing 26 separate terms, each involving factorials of numbers up to 200.
With normal approximation, you first check the conditions: n × p = 200 × 0.10 = 20, and n × (1 − p) = 200 × 0.90 = 180. Both exceed 5, so you’re in the clear. The mean is 20 and the standard deviation is the square root of 200 × 0.10 × 0.90, which is about 4.24. Applying the continuity correction, you look at the probability that a normal variable with mean 20 and standard deviation 4.24 falls below 25.5. Converting to standard units gives (25.5 − 20) / 4.24 ≈ 1.30. A standard normal table tells you the area below 1.30 is about 0.9032, so there’s roughly a 90% chance of finding 25 or fewer flawed items.
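The whole worked example fits in a few lines, including a check against the exact binomial sum it replaces (a Python sketch; `math.comb` handles the large binomial coefficients that would be impractical by hand):

```python
from math import comb, erf, sqrt

n, p, k = 200, 0.10, 25  # inspect 200 items, 10% flaw rate, want P(X <= 25)

# Normal approximation with continuity correction
mean, sd = n * p, sqrt(n * p * (1 - p))   # 20, ~4.24
z = (k + 0.5 - mean) / sd                 # ~1.30 in standard units
approx = 0.5 * (1 + erf(z / sqrt(2)))     # area below z under the normal curve

# Exact binomial: the 26-term sum the approximation lets us skip
exact = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

print(round(approx, 3), round(exact, 3))
```

Both routes land at roughly 90%, which is the point: the shortcut costs a fraction of a percentage point of accuracy here.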
Where Normal Approximation Shows Up
Outside of textbooks, normal approximation is embedded in many fields where people work with proportions or counts at scale. Quality control in manufacturing relies on it to set acceptable defect thresholds without computing exact binomial probabilities for every batch size. Polling and survey analysis use it to build confidence intervals around percentages, since the number of people who hold a particular opinion in a sample follows a binomial pattern. In finance, portfolio returns that aggregate many independent small gains and losses tend to follow a shape well described by the normal curve.
Many naturally occurring measurements, like adult heights or standardized test scores, already follow an approximately normal distribution on their own. In those cases you’re not approximating a different distribution so much as recognizing that the normal curve fits the data directly. NBA player heights, SAT scores, and similar large datasets often pass normality checks closely enough that probabilities drawn from the normal curve match observed frequencies well.
When the Approximation Falls Short
Normal approximation struggles in a few predictable situations. When sample sizes are small and the probability of success is far from 0.5, the true distribution is lopsided in a way the symmetric normal curve can’t capture. If p is 0.01 and n is 50, for example, n × p is only 0.5, well below the threshold of 5. The resulting binomial distribution is heavily concentrated near zero with a long right tail, and the normal curve would give misleading probabilities.
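That failure is easy to demonstrate numerically (a Python sketch comparing the approximation to the exact probability of zero successes in the n = 50, p = 0.01 case):

```python
from math import erf, sqrt

n, p = 50, 0.01  # n*p = 0.5, well below the threshold of 5

# Exact probability of zero successes: every trial must fail
exact = (1 - p) ** n                      # ~0.605

# Normal "approximation" of P(X <= 0), with continuity correction:
# the corrected bound 0 + 0.5 equals the mean, so z = 0 and phi(0) = 0.5
mean, sd = n * p, sqrt(n * p * (1 - p))   # 0.5, ~0.70
approx = 0.5 * (1 + erf((0.5 - mean) / sd / sqrt(2)))

print(round(exact, 3), round(approx, 3))  # the normal curve misses badly
```

The exact probability is about 0.6 while the normal curve says 0.5, an error of more than ten percentage points on a single value, exactly the kind of distortion the rule of thumb exists to catch.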
Distributions with heavy tails or strong skewness also resist normal approximation. Poker winnings over a short period, for instance, tend to have extreme outliers that a normal curve underestimates. The key question is always whether the data you’re working with looks roughly symmetric and bell-shaped. If it does, the approximation is a powerful shortcut. If the shape is visibly lopsided or has frequent extreme values, exact methods or other approximations will serve you better.

