What Is a CDF? Definition, Graph, and PDF Comparison

A CDF, or cumulative distribution function, is a way of describing the probability that a variable will take on a value less than or equal to a specific number. If you’re looking at test scores and ask “what’s the chance someone scored 80 or below?”, the CDF gives you that answer. It’s one of the most fundamental tools in statistics and probability, used everywhere from medical research to quality control to finance.

How a CDF Works

The core idea is simple. For any value you pick on the x-axis, the CDF tells you the probability that your variable falls at or below that value. The result always lands between 0 and 1 (or 0% and 100%). At the far left, where x is extremely small, the CDF approaches 0, meaning there’s essentially no chance the variable is that low. At the far right, as x grows very large, the CDF approaches 1, meaning you’re virtually certain the variable falls somewhere at or below that point.

A CDF always increases or stays flat as you move left to right. It never decreases. This makes intuitive sense: the probability of being “at or below 50” can’t be less than the probability of being “at or below 40,” because the first category includes everything the second one does, plus more.

What a CDF Graph Looks Like

On a CDF plot, the x-axis shows the possible values of your variable (heights, temperatures, exam scores, whatever you’re measuring). The y-axis shows cumulative probability, running from 0 at the bottom to 1 at the top. The line or curve rises from left to right, starting near 0 and ending near 1.

Reading it is straightforward. Pick any point on the x-axis, trace straight up to the curve, then trace left to the y-axis. That y-value is your cumulative probability. For example, if you’re looking at property sizes and the CDF reads 0.6 at 8,000 square feet, that means roughly 60% of properties in your dataset are 8,000 square feet or smaller.
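That tracing operation is exactly what an empirical CDF computes: the fraction of observations at or below a threshold. A minimal sketch, using a made-up sample of property sizes (the numbers are illustrative, not the article's data):

```python
import numpy as np

# Hypothetical sample of property sizes in square feet (illustrative data).
sizes = np.array([3200, 4500, 5100, 6000, 6800, 7500, 7900, 9200, 11000, 14000])

# The empirical CDF at 8,000 sq ft is the fraction of observations
# at or below that threshold.
cdf_at_8000 = (sizes <= 8000).mean()
print(cdf_at_8000)  # 0.7 -- 7 of these 10 properties are 8,000 sq ft or smaller
```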

You can also read it in reverse to find percentiles. The median is wherever the curve crosses 0.5 on the y-axis. The 25th percentile is where it crosses 0.25, and the 75th percentile is where it crosses 0.75. This makes the CDF a quick visual tool for understanding how data is distributed.
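Reading the curve in reverse is the inverse CDF, which SciPy calls the percent-point function (.ppf()). A quick sketch for the standard normal:

```python
from scipy.stats import norm

# The inverse CDF (percent-point function) maps a cumulative probability
# back to the x-value where the curve crosses it.
median = norm.ppf(0.5)   # where the standard normal CDF crosses 0.5
q25 = norm.ppf(0.25)     # 25th percentile
q75 = norm.ppf(0.75)     # 75th percentile
print(median, q25, q75)  # 0.0, about -0.674, about 0.674
```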

Discrete vs. Continuous Variables

The shape of a CDF depends on what kind of variable you’re dealing with. For continuous variables like height or temperature, the CDF is a smooth curve. A classic example is the standard normal (bell curve) distribution, whose CDF is the familiar S-shaped curve known as a sigmoid. For the standard normal, the CDF is Φ(x) = ½[1 + erf(x/√2)], where erf is the error function. In practice, nobody calculates this by hand. Software handles it.
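If you're curious, the error-function formula is easy to check against SciPy's built-in normal CDF; a small sketch:

```python
import math
from scipy.stats import norm

def std_normal_cdf(x):
    # Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# The hand-rolled formula agrees with scipy.stats.norm.cdf.
for x in (-1.0, 0.0, 1.96):
    assert abs(std_normal_cdf(x) - norm.cdf(x)) < 1e-12
print(std_normal_cdf(0.0))  # 0.5 -- half the mass lies below the mean
```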

For discrete variables, where only certain values are possible (like the number of heads when flipping three coins), the CDF looks like a staircase. It’s flat between possible values and jumps up at each one. Flip three coins and count heads: the CDF jumps to 1/8 at zero heads, to 1/2 at one head, to 7/8 at two heads, and to 1 at three heads. Between those values, the probability stays flat because you can’t get 1.5 heads.
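The three-coin example is a binomial distribution with n = 3 and p = 0.5, so the staircase values above can be reproduced directly:

```python
from scipy.stats import binom

# Number of heads in three fair coin flips: Binomial(n=3, p=0.5).
coin = binom(n=3, p=0.5)
print(coin.cdf(0))    # 0.125  (1/8)
print(coin.cdf(1))    # 0.5
print(coin.cdf(1.5))  # 0.5 -- flat between jumps: 1.5 heads is impossible
print(coin.cdf(2))    # 0.875  (7/8)
print(coin.cdf(3))    # 1.0
```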

CDF vs. PDF

If you’ve encountered the term PDF (probability density function), it helps to understand how the two relate. A PDF shows the relative likelihood of each specific value for a continuous variable. It’s the familiar bell-shaped curve for normally distributed data. The CDF is the running total of that likelihood as you move from left to right.

Mathematically, the CDF is the integral (the accumulated area under the curve) of the PDF. If you integrate the PDF from negative infinity up to some value x, you get the CDF at that point. Going the other direction, if you take the derivative of the CDF, you get the PDF back. This two-way relationship means you can always convert between the two. In practical terms, the PDF tells you where values are concentrated, while the CDF tells you the probability of falling below any given threshold.
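Both directions of that relationship can be verified numerically; a sketch for the standard normal, using numerical integration and a finite-difference derivative:

```python
import numpy as np
from scipy.stats import norm
from scipy.integrate import quad

# Integrating the PDF from -infinity up to x recovers the CDF at x...
area, _ = quad(norm.pdf, -np.inf, 1.0)
assert abs(area - norm.cdf(1.0)) < 1e-8

# ...and differentiating the CDF (here via a finite difference)
# recovers the PDF.
h = 1e-5
deriv = (norm.cdf(1.0 + h) - norm.cdf(1.0 - h)) / (2 * h)
assert abs(deriv - norm.pdf(1.0)) < 1e-6
print(area)  # about 0.8413
```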

How CDFs Are Used in Practice

Hypothesis Testing and P-Values

Every time a researcher calculates a p-value, they’re using a CDF. The process works like this: you compute a test statistic (like a z-score) from your data, then use the CDF of the appropriate distribution to find the probability of seeing a result that extreme under the assumption that nothing interesting is happening (the null hypothesis). For a one-tailed test, the p-value is simply 1 minus the CDF evaluated at your test statistic. For a two-tailed test, you double that value to account for both directions. A z-score of 2.58, for instance, corresponds to a CDF value of about 0.9951, leaving a p-value of roughly 0.49% in the upper tail.
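The z = 2.58 example can be reproduced in a couple of lines:

```python
from scipy.stats import norm

z = 2.58
one_tailed = 1 - norm.cdf(z)   # upper-tail p-value (norm.sf(z) is more precise)
two_tailed = 2 * one_tailed    # double it to cover both directions
print(round(one_tailed, 4))    # 0.0049 -- roughly 0.49%
print(round(two_tailed, 4))    # 0.0099
```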

Survival Analysis in Medicine

In clinical trials, researchers often want to know the probability that a patient survives beyond a certain time point. The survival function is essentially 1 minus the CDF of the time-to-event variable. Kaplan-Meier curves, which are standard in cancer research and drug trials, are step-function estimates of this survival function. Each vertical drop on the curve represents an event (like a death or disease recurrence), and the curve gives a running estimate of the probability of surviving past any given time.
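The survival-equals-one-minus-CDF relationship is easy to sketch with a toy parametric model. This is a hypothetical exponential time-to-event model chosen purely for illustration; a real trial would estimate the curve nonparametrically, e.g. with Kaplan-Meier:

```python
from scipy.stats import expon

# Hypothetical time-to-event model: exponential with a mean of 24 months
# (the distribution and its parameter are illustrative assumptions).
time_to_event = expon(scale=24)

def survival(t):
    # S(t) = 1 - F(t): probability of surviving beyond time t.
    return 1 - time_to_event.cdf(t)

print(survival(0))             # 1.0 -- everyone is event-free at the start
print(round(survival(24), 3))  # about 0.368 (1/e) at the mean survival time
```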

Dose-Response Curves

Pharmacology relies on cumulative dose-response curves to understand how increasing doses of a drug affect a biological response. These follow the same S-shaped pattern as a CDF. At low doses, there’s little effect. At some middle range, the effect increases steeply. At high doses, the response plateaus. This framework applies to drugs ranging from stimulants to sedatives to pain medications, and it gives researchers a compact way to characterize how a compound behaves across a wide range of doses.

Calculating CDFs in Software

You’ll rarely compute a CDF by hand. In Python, the SciPy library provides CDF functions for virtually every common distribution. For a normal distribution, you’d create a distribution object and call its .cdf() method. For example, creating a uniform distribution between -0.5 and 0.5 and evaluating its CDF at 0.25 returns 0.75, meaning 75% of the distribution falls at or below that point. In R, the equivalent functions follow a naming pattern: pnorm() for the normal CDF, pbinom() for the binomial, and so on, where the “p” stands for probability. Spreadsheet tools like Excel use functions like NORM.S.DIST(z, TRUE), where setting the second argument to TRUE returns the cumulative distribution function rather than the density.
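The SciPy examples above look like this in code. One detail worth noting: SciPy’s uniform takes a loc (left edge) and scale (width), so the interval [-0.5, 0.5] is written uniform(loc=-0.5, scale=1):

```python
from scipy.stats import norm, uniform

# Standard normal CDF at z = 0: half the distribution lies below the mean.
print(norm.cdf(0))  # 0.5

# Uniform distribution on [-0.5, 0.5]: loc is the left edge, scale the width.
u = uniform(loc=-0.5, scale=1)
print(u.cdf(0.25))  # 0.75 -- 75% of the mass lies at or below 0.25
```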

In R, you can also build an empirical CDF directly from data using the ecdf() function, which constructs a step function from your actual observations rather than assuming any theoretical distribution. This is useful for exploratory analysis when you want to see how your data is distributed without making assumptions about its shape.
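The same idea works in Python without any distributional assumptions. A minimal NumPy sketch of an empirical CDF, analogous to R's ecdf() (the helper name ecdf here is just illustrative):

```python
import numpy as np

def ecdf(data):
    """Return a step function giving the fraction of observations <= x."""
    xs = np.sort(np.asarray(data))
    n = xs.size
    def f(x):
        # searchsorted with side="right" counts how many values are <= x
        return np.searchsorted(xs, x, side="right") / n
    return f

f = ecdf([1, 2, 2, 3, 5])
print(f(2))  # 0.6 -- three of the five observations are at or below 2
print(f(4))  # 0.8
```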