What Is Chebyshev’s Inequality? Formula & Uses

Chebyshev’s inequality is a rule in probability and statistics that puts a limit on how spread out data can be from the average. Specifically, it tells you that no matter what shape your data follows, at least 1 − 1/k² of all values must fall within k standard deviations of the mean. The power of this result is its universality: it works for any dataset or probability distribution, as long as the mean and variance are finite.

What the Inequality Actually Says

The core idea is simple. Pick any number k greater than 1, and Chebyshev’s inequality guarantees that the probability of a value landing more than k standard deviations away from the mean is at most 1/k². Flip that around, and it means at least 1 − 1/k² of all values sit within that range.

In more concrete terms:

  • Within 2 standard deviations: at least 75% (3/4) of all data points
  • Within 3 standard deviations: at least 88.9% (8/9) of all data points
  • Within 4 standard deviations: at least 93.75% (15/16) of all data points

These are minimum guarantees. In practice, most real distributions concentrate data much more tightly than this. But Chebyshev’s inequality gives you a floor you can always count on.
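Those floor values come straight from the 1 − 1/k² formula. A minimal Python sketch (the function name is my own, not a standard library call):

```python
def chebyshev_lower_bound(k: float) -> float:
    """Minimum fraction of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("the bound is only informative for k > 1")
    return 1 - 1 / k**2

for k in (2, 3, 4):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.4f}")
# k = 2: at least 0.7500
# k = 3: at least 0.8889
# k = 4: at least 0.9375
```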

Why It Works for Any Distribution

Most statistical rules you encounter come with strings attached. The famous 68-95-99.7 rule, for instance, only applies to data that follows a normal (bell-shaped) distribution. That rule says 95% of values fall within 2 standard deviations and 99.7% within 3. Those numbers are tighter and more useful, but they’re only valid when your data is normally distributed.

Chebyshev’s inequality makes no assumption about the shape of the distribution. Your data could be skewed, bimodal, uniform, or something bizarre with no name. As long as the data has a finite mean and a finite variance (meaning it doesn’t spread out infinitely), the inequality holds. This makes it especially valuable when you’re working with data and you don’t know, or can’t verify, what distribution it follows.

The tradeoff is that the bounds are conservative. Where the empirical rule promises 95% of normally distributed data within 2 standard deviations, Chebyshev only guarantees 75%. That gap is the price you pay for a result that applies everywhere.
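One way to see both the guarantee and its conservatism is to test the bound against a heavily skewed distribution. The sketch below draws from an exponential distribution (chosen here purely as an arbitrary non-normal example) and compares the observed within-2-standard-deviation fraction to Chebyshev's 75% floor:

```python
import random

random.seed(0)
# Exponential(1) is strongly right-skewed: mean = std dev = 1, nothing like a bell curve
samples = [random.expovariate(1.0) for _ in range(100_000)]

n = len(samples)
mean = sum(samples) / n
std = (sum((x - mean) ** 2 for x in samples) / n) ** 0.5

k = 2
within = sum(abs(x - mean) < k * std for x in samples) / n
print(f"within {k} std devs: {within:.3f}  (Chebyshev floor: {1 - 1 / k**2})")
```

For this distribution the true within-2σ probability is 1 − e⁻³ ≈ 0.95, so the observed fraction lands well above the 0.75 floor: the guarantee holds, it is just loose.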

Breaking Down the Formula

The formal statement uses a few standard statistical terms. For a random variable X with mean μ and standard deviation σ, Chebyshev’s inequality states:

P(|X − μ| ≥ kσ) ≤ 1/k²

Here’s what each piece means. The expression |X − μ| is the distance between any particular value and the mean. The term kσ is the threshold you’re checking, expressed as k times the standard deviation. The left side of the inequality is the probability that a value falls at least that far from the mean, and the right side caps that probability at 1/k².

The value of k must be greater than 1 for the result to be useful. At k = 1, the formula gives a bound of 1 (meaning 100% probability), which is trivially true and tells you nothing. At k = 0.5, it would give a bound greater than 1, which is meaningless for a probability. The inequality becomes informative starting just above k = 1, and the bound gets tighter as k grows.

There’s also an equivalent version written without the standard deviation substitution: P(|X − μ| ≥ a) ≤ Var(X)/a², where a is any positive number and Var(X) is the variance. This form is sometimes more convenient when you’re working directly with variance rather than standard deviation.
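In code, the variance form is essentially a one-liner, capped at 1 since no probability can exceed it (the function name is illustrative):

```python
def chebyshev_tail_bound(variance: float, a: float) -> float:
    """Upper bound on P(|X - mean| >= a) for any positive threshold a."""
    return min(1.0, variance / a**2)

# Variance 25 (std dev 5): chance of landing 10 or more units from the mean
print(chebyshev_tail_bound(25, 10))  # 0.25 -- at most a 25% chance
# A threshold smaller than one std dev yields only the trivial bound
print(chebyshev_tail_bound(25, 4))   # 1.0
```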

A Concrete Example

Suppose a factory produces batteries with an average lifespan of 500 hours and a standard deviation of 50 hours. You don’t know the exact distribution of battery lifespans, so the empirical rule is off the table. But Chebyshev’s inequality still works.

If you want to know how many batteries last between 400 and 600 hours (that’s within 2 standard deviations of the mean), the inequality guarantees at least 75%. If you widen the range to 350 to 650 hours (3 standard deviations), at least 88.9% (8/9) of batteries fall in that window. These aren’t precise predictions, but they’re reliable lower bounds that hold regardless of how the lifespans are actually distributed.
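The battery calculation takes only a few lines, using the numbers from the example above:

```python
mu, sigma = 500, 50  # mean lifespan and std dev, in hours

floors = {}
for low, high in [(400, 600), (350, 650)]:
    k = (mu - low) / sigma  # intervals are symmetric, so (high - mu) / sigma is the same
    floors[(low, high)] = 1 - 1 / k**2
    print(f"{low}-{high} hours (k = {k:.0f}): at least {floors[(low, high)]:.1%}")
# 400-600 hours (k = 2): at least 75.0%
# 350-650 hours (k = 3): at least 88.9%
```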

Where Chebyshev’s Inequality Gets Used

The inequality shows up across many fields, often behind the scenes. In finance, it provides worst-case bounds on how far an asset’s return might deviate from its expected value, which is useful when return distributions are heavy-tailed and far from normal. In quality control, it helps set tolerance ranges when the exact distribution of a manufacturing process isn’t well characterized.

In theoretical statistics and probability, its role is foundational. Chebyshev’s inequality is a key step in proving the weak law of large numbers, one of the most important results in probability theory. That law says that as you take more and more samples, the average of those samples converges to the true mean. Chebyshev’s inequality provides the mathematical machinery to make that argument rigorous.
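To make that connection concrete: the average of n independent samples has variance σ²/n, so applying Chebyshev to the sample mean gives P(|X̄ − μ| ≥ ε) ≤ σ²/(nε²), a bound that shrinks to zero as n grows. A small sketch with illustrative numbers:

```python
sigma2, eps = 4.0, 0.1  # population variance and tolerance (illustrative values)

ns = (100, 1_000, 10_000, 100_000)
# Chebyshev applied to the sample mean: variance shrinks by a factor of n
bounds = [min(1.0, sigma2 / (n * eps**2)) for n in ns]
for n, b in zip(ns, bounds):
    print(f"n = {n:>7,}: P(|sample mean - mu| >= {eps}) <= {b:.4f}")
```

At n = 100 the bound is still vacuous (1.0), but by n = 100,000 it has dropped to 0.004 — the probability of the sample mean straying is squeezed toward zero, which is exactly the weak law's conclusion.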

Data scientists and engineers reach for it when they need quick, assumption-free bounds. If you can compute a mean and variance, Chebyshev gives you an immediate sense of how concentrated the data is, without fitting a distribution or running simulations.

The One-Sided Version: Cantelli’s Inequality

Standard Chebyshev’s inequality is two-sided: it bounds the probability of being far from the mean in either direction. Sometimes you only care about one direction. For example, you might want to know the probability that a value exceeds the mean by some amount, without worrying about values far below the mean.

Cantelli’s inequality handles this case. For a positive threshold b, it states that P(X − μ ≥ b) ≤ σ²/(b² + σ²). This is always tighter than the two-sided Chebyshev bound at the same threshold, and it’s useful in risk analysis where you’re concerned about exceeding a threshold rather than deviating in both directions.
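A side-by-side comparison shows the one-sided gain. With variance 25 and threshold b = 10, Cantelli gives 25/125 = 0.2 versus 0.25 from two-sided Chebyshev (the function names below are my own):

```python
def cantelli_bound(variance: float, b: float) -> float:
    """Upper bound on P(X - mean >= b) for b > 0 (one-sided)."""
    return variance / (b**2 + variance)

def chebyshev_bound(variance: float, b: float) -> float:
    """Upper bound on P(|X - mean| >= b) (two-sided)."""
    return min(1.0, variance / b**2)

var, b = 25.0, 10.0
print(cantelli_bound(var, b))   # 0.2
print(chebyshev_bound(var, b))  # 0.25
```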

Limitations Worth Knowing

Chebyshev’s inequality is a worst-case tool. For any specific, known distribution, you can almost always get sharper probability bounds using that distribution’s own properties. The inequality is at its most useful when you know little about the data’s shape, or when you need a guarantee that holds universally.

It also requires finite variance. Certain exotic distributions (like the Cauchy distribution) have undefined variance, and Chebyshev’s inequality simply doesn’t apply to them. For the vast majority of real-world data, though, this isn’t a practical concern.

Finally, the bounds can feel loose. Saying “at least 75% of data is within 2 standard deviations” is underwhelming when, for most well-behaved distributions, the true figure is north of 95%. That conservatism is the cost of generality. When you need precision, use a distribution-specific tool. When you need a guarantee that works everywhere, Chebyshev’s inequality is the one to reach for.