What Is a Cumulative Distribution Function: CDF Explained

A cumulative distribution function (CDF) tells you the probability that a random variable will take a value less than or equal to a specific number. Written as F(x) = P(X ≤ x), it answers a simple question: “What’s the chance that my outcome will be this value or lower?” The CDF always starts at 0 on the left and climbs to 1 on the right, capturing the entire range of possible outcomes as a single, rising curve.

How the CDF Works

Imagine you’re tracking daily temperatures in your city. The CDF at 75°F tells you the probability that any given day’s temperature will be 75°F or below. At very low temperatures (say, -50°F), the CDF is essentially 0 because it’s almost impossible to be that cold. At very high temperatures (say, 150°F), the CDF reaches 1 because virtually every day falls at or below that mark. Between those extremes, the function steadily rises.

This “always rising” behavior isn’t a coincidence. It’s a fundamental property. As you move to higher values, you’re including more possible outcomes, so the probability can only stay the same or increase. Mathematically, the CDF is monotonically non-decreasing: it can stay flat over a stretch, but it never dips back down. It also has fixed limits: F(x) approaches 0 as x goes to negative infinity, and it approaches 1 as x goes to positive infinity.

Discrete vs. Continuous Variables

The CDF works for any type of random variable, but it looks different depending on whether the variable is discrete or continuous.

For a discrete variable (like the number of heads in 10 coin flips), the CDF is calculated by summing up individual probabilities. You add the probability of each possible outcome from the smallest value up to x. On a graph, this creates a staircase pattern. The function stays flat between possible values, then jumps up at each value the variable can actually take. These are sometimes called step plots: as soon as you hit a point on the x-axis where an outcome exists, you “step” up to the next probability level.
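
The summation described above can be sketched in a few lines of Python for the coin-flip example. This is a minimal illustration that assumes a fair coin by default. The function names binomial_pmf and binomial_cdf are made up for this example, and only the standard library is used.

```python
from math import comb


def binomial_pmf(k: int, n: int = 10, p: float = 0.5) -> float:
    """P(X = k): probability of exactly k heads in n flips with P(heads) = p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def binomial_cdf(x: float, n: int = 10, p: float = 0.5) -> float:
    """F(x) = P(X <= x): add up the PMF over every outcome k <= x."""
    if x < 0:
        return 0.0
    top = min(int(x), n)  # largest attainable outcome that is <= x
    return sum(binomial_pmf(k, n, p) for k in range(top + 1))


print(binomial_cdf(3))    # P(at most 3 heads) = 176/1024 = 0.171875
print(binomial_cdf(3.7))  # same value: the CDF is flat between integers
print(binomial_cdf(10))   # every outcome is included, so this is 1.0
```

Evaluating the function at 3 and at 3.7 gives the same result, which is the flat part of the staircase. The jump happens only when x reaches the next whole number of heads.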

For a continuous variable (like height, weight, or temperature), the CDF is calculated by integrating the probability density function from negative infinity up to x. Instead of a staircase, you get a continuous curve with no jumps, often S-shaped for bell-shaped distributions such as the normal. This smoothness reflects the fact that a continuous variable can take on any real number within its range, not just specific values.
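
To make the integration concrete, here is a rough sketch that approximates a CDF by numerically integrating a density. It assumes the standard normal distribution as the example and replaces negative infinity with a lower bound of -10, where the density is negligible. The helper names are illustrative, not part of any library.

```python
import math


def normal_pdf(x: float) -> float:
    """Density of the standard normal distribution (mean 0, std dev 1)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def cdf_by_integration(x: float, lower: float = -10.0, steps: int = 20_000) -> float:
    """Approximate F(x) by integrating the PDF with the trapezoidal rule.

    Assumption: the integral starts at `lower` instead of negative infinity.
    """
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    total = 0.5 * (normal_pdf(lower) + normal_pdf(x))
    for i in range(1, steps):
        total += normal_pdf(lower + i * h)
    return total * h


def normal_cdf_reference(x: float) -> float:
    """Reference value computed from the error function in the math module."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


for x in (-1.0, 0.0, 1.0, 2.0):
    print(x, cdf_by_integration(x), normal_cdf_reference(x))
```

The two columns should agree to several decimal places. In practice a library routine is the better choice; the loop is only there to show that the CDF is the accumulated area under the density.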

The Relationship Between CDF and PDF

The CDF and the probability density function (PDF) are two sides of the same coin, connected by calculus. If you have the PDF and want the CDF, you integrate. If you have the CDF and want the PDF, you differentiate. In plain terms: the PDF tells you how probability is spread across values (where outcomes are more or less concentrated), while the CDF tells you the running total of probability as you move from left to right.

Where the PDF is tall, the CDF climbs steeply, because a lot of probability is packed into that range. Where the PDF is nearly flat and close to zero, the CDF barely rises. This relationship makes it easy to move between the two representations depending on what question you’re trying to answer.
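
The reverse direction, going from CDF to PDF by differentiating, can be checked numerically. The sketch below assumes the standard normal distribution and uses a central finite difference as a stand-in for the derivative. The function names and the step size h are choices made for this example.

```python
import math
from typing import Callable


def normal_cdf(x: float) -> float:
    """Standard normal CDF written in terms of the error function."""
    return 0.5 * (1 + math.erf(x / math.sqrt(2)))


def normal_pdf(x: float) -> float:
    """Standard normal density, used here only for comparison."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def pdf_from_cdf(cdf: Callable[[float], float], x: float, h: float = 1e-5) -> float:
    """Estimate the slope of the CDF at x with a central difference."""
    return (cdf(x + h) - cdf(x - h)) / (2 * h)


for x in (0.0, 1.0, 3.0):
    print(x, pdf_from_cdf(normal_cdf, x), normal_pdf(x))
```

The estimated slope is largest at 0, where the density peaks, and much smaller at 3, out in the tail. That is the "tall PDF, steep CDF" relationship in numbers.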

Finding Percentiles With the Inverse CDF

One of the most practical uses of the CDF is calculating percentiles. The CDF takes a value and returns a probability. The inverse CDF (called the quantile function) does the opposite: you give it a probability, and it returns the corresponding value.

To find the median of a distribution, you evaluate the inverse CDF at 0.5, which gives you the value with half the probability at or below it. Quartiles work the same way: the 25th percentile comes from plugging in 0.25, and the 75th percentile from 0.75. Any percentile you need follows the same pattern. If you want the 90th percentile, you evaluate the inverse CDF at 0.90. Percentile-based tools such as standardized test score reports, growth charts, and income brackets all rest on this idea.
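
Python's standard library exposes a quantile function for the normal distribution as NormalDist.inv_cdf, which makes the pattern easy to try. The distribution below (mean 100, standard deviation 15) is a hypothetical choice made only for illustration.

```python
from statistics import NormalDist

# Hypothetical score distribution, assumed normal for this example.
scores = NormalDist(mu=100, sigma=15)

median = scores.inv_cdf(0.50)  # 50th percentile
q1 = scores.inv_cdf(0.25)      # 25th percentile (first quartile)
q3 = scores.inv_cdf(0.75)      # 75th percentile (third quartile)
p90 = scores.inv_cdf(0.90)     # 90th percentile

print(median, q1, q3, p90)

# Round trip: feeding a quantile back into the CDF recovers the probability.
print(scores.cdf(p90))  # approximately 0.90
```

For a symmetric distribution like this one, the median equals the mean, and the two quartiles sit the same distance on either side of it.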

The Normal Distribution CDF

The most widely recognized CDF belongs to the normal (bell curve) distribution. For the standard normal distribution (mean of 0, standard deviation of 1), the CDF has no simple closed-form formula. It’s defined as an integral that must be computed numerically, which is why statistics textbooks traditionally included lookup tables for it and why software tools handle the calculation today.

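Despite requiring numerical computation, the normal CDF is used constantly. When someone says a test score is “in the 95th percentile,” they’re referencing a point on the CDF of the score distribution, which is the normal CDF whenever the scores are modeled as normally distributed. The familiar rule that about 68% of data falls within one standard deviation of the mean, 95% within two, and 99.7% within three is really a set of differences between CDF values: the CDF at k standard deviations above the mean minus the CDF at k standard deviations below it.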
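
The 68-95-99.7 rule can be reproduced directly from the standard normal CDF. The short sketch below uses NormalDist from Python's standard library; the helper name probability_within is made up for this example.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1


def probability_within(k: float) -> float:
    """P(-k <= Z <= k) = F(k) - F(-k) for a standard normal variable Z."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)


for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {probability_within(k):.4f}")
```

The printed values come out to roughly 0.6827, 0.9545, and 0.9973, which round to the familiar 68%, 95%, and 99.7%.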

Real-World Applications

CDFs show up wherever probability drives decisions. In reliability engineering, the exponential distribution’s CDF, F(t) = 1 – e^(-λt) for t ≥ 0, where λ is the constant failure rate, models the probability that a component will fail by time t. Engineers use this to set maintenance schedules and warranty periods. If the CDF tells you there’s a 10% chance of failure within the first year, you can plan inspections and spare parts accordingly.
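
As a worked example of the 10% figure, the sketch below solves the exponential CDF for the failure rate and then projects the failure probability out to later years. It assumes a constant failure rate and time measured in years. The function names are illustrative.

```python
import math


def exponential_cdf(t: float, rate: float) -> float:
    """F(t) = 1 - exp(-rate * t): probability of failure by time t (t >= 0)."""
    if t < 0:
        return 0.0
    return 1.0 - math.exp(-rate * t)


def rate_for_failure_probability(p: float, t: float) -> float:
    """Solve 1 - exp(-rate * t) = p for the rate."""
    return -math.log(1.0 - p) / t


# A 10% chance of failure within the first year implies this rate per year.
rate = rate_for_failure_probability(0.10, 1.0)
print(rate)  # about 0.105 failures per year

for years in (1, 2, 5):
    print(years, exponential_cdf(years, rate))
```

Under this model the five-year failure probability is 1 minus 0.9 to the fifth power, or about 41%, which is the kind of number a warranty or spare-parts plan would start from.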

In quality control, CDFs help manufacturers determine what fraction of products fall outside acceptable tolerances. If a machine produces bolts with diameters that follow a normal distribution, the CDF instantly tells you what percentage of bolts will be too small or too large for the specification.
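
Here is how the bolt calculation might look in code. The numbers are hypothetical: a mean diameter of 10.00 mm, a standard deviation of 0.02 mm, and a specification of 9.95 mm to 10.05 mm, all chosen only to illustrate the method.

```python
from statistics import NormalDist

# Hypothetical process and specification limits (millimetres).
diameters = NormalDist(mu=10.00, sigma=0.02)
lower_spec = 9.95
upper_spec = 10.05

too_small = diameters.cdf(lower_spec)        # P(diameter <= lower limit)
too_large = 1.0 - diameters.cdf(upper_spec)  # P(diameter > upper limit)
out_of_spec = too_small + too_large

print(f"too small:   {too_small:.4%}")
print(f"too large:   {too_large:.4%}")
print(f"out of spec: {out_of_spec:.4%}")
```

With these assumed values the limits sit 2.5 standard deviations from the mean, so roughly 1.24% of bolts would fall outside the specification, split evenly between the two tails.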

Survival analysis in medicine uses the complement of the CDF, often called the survival function, S(t) = 1 – F(t). Instead of asking “what’s the probability of failure by time t,” it asks “what’s the probability of surviving past time t.” Clinical trials use this framework to compare how long patients remain disease-free under different treatments.
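
The complement relationship is simple enough to express as a one-line wrapper around any CDF. The sketch below assumes exponential time-to-event models with made-up rates for two hypothetical treatments. It illustrates S(t) = 1 – F(t) only and is not a description of how trial data is actually analyzed.

```python
import math
from typing import Callable


def exponential_cdf(rate: float) -> Callable[[float], float]:
    """Return F(t) = 1 - exp(-rate * t) for a fixed, assumed event rate."""
    def cdf(t: float) -> float:
        return 1.0 - math.exp(-rate * t) if t >= 0 else 0.0
    return cdf


def survival(cdf: Callable[[float], float], t: float) -> float:
    """S(t) = 1 - F(t): probability of surviving past time t."""
    return 1.0 - cdf(t)


# Hypothetical event rates per year for two treatments.
treatment_a = exponential_cdf(0.20)
treatment_b = exponential_cdf(0.12)

for years in (1, 3, 5):
    print(years, survival(treatment_a, years), survival(treatment_b, years))
```

Because treatment B has the lower assumed rate, its survival curve stays above treatment A's at every time point shown.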

Risk management in finance relies on CDFs to estimate the probability that losses will exceed a given threshold. The “value at risk” metric, widely used by banks and investment firms, is essentially a percentile read from the CDF of a portfolio’s return distribution.
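
A simplified version of the value-at-risk calculation is sketched below. It assumes daily returns follow a normal distribution, which is a strong simplification, and every number in it (portfolio value, mean, standard deviation, confidence level) is hypothetical.

```python
from statistics import NormalDist

# Hypothetical inputs, chosen only for illustration.
portfolio_value = 1_000_000.0
daily_returns = NormalDist(mu=0.0005, sigma=0.01)  # assumed normal returns
confidence = 0.95

# The 5th percentile of returns: the return that is undercut 5% of the time.
worst_return = daily_returns.inv_cdf(1 - confidence)

# Value at risk is reported as a positive loss amount.
value_at_risk = -worst_return * portfolio_value

print(f"5th percentile daily return: {worst_return:.4%}")
print(f"1-day 95% VaR: {value_at_risk:,.0f}")
```

Under these assumptions the one-day 95% VaR is a little under 16,000, meaning losses larger than that would be expected on about 5% of days.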

Key Properties at a Glance

  • Range: The CDF always outputs a value between 0 and 1, inclusive.
  • Monotonically non-decreasing: It never goes down. Moving to a higher x value always gives you the same or higher probability.
  • Left limit is 0: As x approaches negative infinity, F(x) approaches 0.
  • Right limit is 1: As x approaches positive infinity, F(x) approaches 1.
  • Right-continuous: At any jump point (in the discrete case), the CDF includes the probability of that exact value, meaning it takes the value at the top of the step rather than the bottom.

These properties hold universally, whether you’re working with a simple coin flip, a normal distribution, or a complex real-world model. Any function that satisfies all of them qualifies as a valid CDF, and any valid CDF uniquely defines a probability distribution.
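
The properties above can be spot-checked numerically for any candidate function. The sketch below tests the range and non-decreasing properties on a grid of points, then looks at right-continuity for a single fair coin flip coded as 0 or 1. It is a sanity check on a finite grid rather than a proof, and the helper names are invented for this example.

```python
from statistics import NormalDist
from typing import Callable, Sequence


def check_cdf_properties(cdf: Callable[[float], float],
                         grid: Sequence[float],
                         tol: float = 1e-12) -> bool:
    """Check on an increasing grid that values stay in [0, 1] and never decrease."""
    values = [cdf(x) for x in grid]
    in_range = all(-tol <= v <= 1 + tol for v in values)
    non_decreasing = all(b >= a - tol for a, b in zip(values, values[1:]))
    return in_range and non_decreasing


def coin_cdf(x: float) -> float:
    """CDF of one fair coin flip coded as 0 (tails) or 1 (heads)."""
    if x < 0:
        return 0.0
    if x < 1:
        return 0.5
    return 1.0


grid = [i / 10 for i in range(-60, 61)]  # -6.0 to 6.0 in steps of 0.1

print(check_cdf_properties(NormalDist().cdf, grid))  # True
print(check_cdf_properties(coin_cdf, grid))          # True

# Limits: far to the left the CDF is near 0, far to the right it is near 1.
print(NormalDist().cdf(-40), NormalDist().cdf(40))

# Right-continuity: at the jump point 0, the CDF already has the upper value.
print(coin_cdf(-1e-9), coin_cdf(0))  # 0.0 just left of the jump, 0.5 at it
```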