What Is a Negative Binomial Distribution and When to Use It

The negative binomial distribution is a probability distribution that counts the number of failures you experience before reaching a set number of successes in repeated, independent trials. If you flip a coin and want to know how many tails you’ll see before getting 5 heads, the negative binomial distribution gives you the probability of each possible answer. It shows up across statistics, data science, epidemiology, and genomics as a flexible tool for modeling count data, especially when the data is more spread out than simpler models can handle.

How the Distribution Works

Imagine you’re running an experiment where each attempt either succeeds (with probability p) or fails (with probability 1 − p). You keep going until you’ve racked up exactly r successes. The negative binomial distribution tells you the probability of seeing exactly x failures along the way.

The two parameters that define the distribution are:

  • r: the number of successes you’re waiting for (any positive integer: 1, 2, 3, …)
  • p: the probability of success on any single trial (between 0 and 1)

A few assumptions must hold for the model to apply. Each trial is independent, meaning the outcome of one doesn’t influence the next. The probability of success stays constant across all trials. And each trial has only two possible outcomes: success or failure.

When r equals 1, the negative binomial reduces to the geometric distribution, which counts failures before a single success. So the geometric distribution is really just a special case.
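Both of these facts are easy to check numerically with SciPy, using the coin example from the introduction plus the r = 1 geometric case:

```python
from scipy.stats import nbinom, geom

# Coin example: probability of exactly 3 tails before the 5th head
r, p = 5, 0.5
print(nbinom.pmf(3, r, p))   # C(7,3) * 0.5**8 = 35/256 ≈ 0.1367

# With r = 1 the negative binomial matches the geometric distribution.
# SciPy's geom counts trials rather than failures, so shift by one.
p1 = 0.3
print(nbinom.pmf(4, 1, p1))  # 4 failures before the first success
print(geom.pmf(5, p1))       # first success on trial 5: the same event
```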

Mean and Variance

The expected number of failures before reaching r successes is r(1 − p) / p. If you have a 50% chance of success on each trial and need 4 successes, you’d expect to see 4 failures on average. Lower values of p push the expected failure count higher, which makes intuitive sense: the harder each trial is, the more failures you’ll accumulate.

The variance, which measures how spread out the failure counts tend to be, is r(1 − p) / p². Notice the variance formula has p² in the denominator while the mean has just p. This means the variance is always larger than the mean (since p is less than 1, dividing by p² produces a bigger number than dividing by p). That built-in property is one of the main reasons statisticians reach for this distribution.
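These closed-form expressions are easy to verify against SciPy, which can report the mean and variance directly:

```python
from scipy.stats import nbinom

r, p = 4, 0.5
mean = r * (1 - p) / p        # expected failures: 4.0
var = r * (1 - p) / p**2      # variance: 8.0

# SciPy agrees with the closed-form expressions
m, v = nbinom.stats(r, p, moments='mv')
print(mean, var)   # 4.0 8.0
print(m, v)
```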

Why It Outperforms the Poisson Distribution

The Poisson distribution is the go-to model for count data: things like the number of emails you get per hour or the number of accidents at an intersection per month. But the Poisson has a strict constraint: its mean and variance are equal. Real-world count data frequently violates this. Hospital readmissions, insurance claims, disease cases, and website visits often show more variability than the Poisson allows. Statisticians call this overdispersion.

When overdispersion is present but you force a Poisson model onto the data, the standard errors on your estimates shrink artificially. This can make you conclude that a factor is statistically significant when it actually isn’t. The negative binomial distribution includes an extra parameter (the dispersion parameter) that absorbs that additional variability, producing more honest estimates and more reliable conclusions. A study in the epidemiological literature found that overdispersion is frequent in recurrent-type phenomena, and when it occurs, the negative binomial is more appropriate than the Poisson.

There’s also an elegant mathematical connection between the two. If the rate of a Poisson process itself varies randomly according to a gamma distribution (a common continuous distribution), the resulting counts follow a negative binomial distribution. This “gamma-Poisson mixture” interpretation is why the negative binomial is sometimes called a Poisson distribution with a random rate. It naturally captures situations where the underlying rate of events isn’t fixed but fluctuates from one observation to the next.
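A small simulation illustrates the mixture: draw each observation's rate from a gamma distribution, then draw a Poisson count at that rate, and the resulting counts reproduce the negative binomial's moments. The shape and scale below are chosen so the mixture corresponds to r = 3, p = 0.4.

```python
import numpy as np
from scipy.stats import nbinom

rng = np.random.default_rng(42)
r, p = 3, 0.4

# A gamma-distributed rate with shape r and scale (1 - p)/p, fed into
# a Poisson draw, yields negative binomial(r, p) counts.
rates = rng.gamma(shape=r, scale=(1 - p) / p, size=200_000)
counts = rng.poisson(rates)

print(counts.mean(), counts.var())       # close to the theoretical values
print(nbinom.stats(r, p, moments='mv'))  # theoretical mean 4.5, variance 11.25
```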

Applications in Genomics

One of the most prominent modern uses of the negative binomial distribution is in analyzing gene expression data from RNA sequencing (RNA-seq). When researchers measure how active a gene is across different biological samples, they count the number of short DNA reads mapping to each gene. The technical noise in these counts is close to Poisson, but samples from different biological replicates introduce extra variability. Genes are more active in some individuals than others, creating overdispersion that the Poisson can’t accommodate.

The negative binomial model handles this naturally. Its dispersion parameter captures the biological variability on top of the technical noise. Several widely used software packages for detecting differences in gene activity between experimental conditions, including edgeR, DESeq, and NBPSeq, use the negative binomial as their core statistical model. It has become the standard framework for this type of analysis in bioinformatics.

Applications in Epidemiology

Epidemiologists use the negative binomial distribution to model how infectious diseases spread, particularly when transmission is uneven. During an outbreak, most infected people may pass the disease to very few others, while a small number of “superspreaders” generate large clusters. This pattern creates overdispersed transmission data.

Researchers have applied the negative binomial to outbreaks of SARS and COVID-19, fitting it to data on secondary cases (how many people each infected person goes on to infect). The dispersion parameter, often called k in this context, quantifies how concentrated the transmission is. A small k means a few individuals drive most of the spread. During COVID-19, researchers used negative binomial tail probabilities with small k values to estimate the likelihood of superspreading events, information that guided public health responses around gatherings and indoor settings.
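The tail calculation can be sketched as follows. The R0 and k values here are illustrative, not taken from any particular study; with mean R0 and dispersion k, the standard conversion to SciPy's parameterization is n = k and p = k / (k + R0).

```python
from scipy.stats import nbinom

# Illustrative values, not from any specific study
R0 = 2.5   # mean number of secondary cases per infected person
k = 0.1    # dispersion: small k means highly concentrated transmission

# Mean/dispersion form converted to SciPy's (n, p) parameterization
n, p = k, k / (k + R0)

# Probability that one infected person infects 10 or more others
tail = nbinom.sf(9, n, p)   # sf(x) = P(X > x)
print(tail)
```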

Working With It in Code

If you need to calculate negative binomial probabilities or generate random samples, the major scientific computing libraries have built-in support. In Python’s SciPy library, the distribution object scipy.stats.nbinom takes two shape parameters: n (the number of successes, equivalent to r) and p (the probability of success). From there you can compute the probability of any specific outcome with nbinom.pmf(k, n, p), cumulative probabilities with nbinom.cdf(k, n, p), or generate random samples with nbinom.rvs(n, p, size=...).
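Putting those calls together:

```python
from scipy.stats import nbinom

n, p = 4, 0.3   # waiting for 4 successes, each trial succeeds with probability 0.3

print(nbinom.pmf(2, n, p))    # P(exactly 2 failures)
print(nbinom.cdf(3, n, p))    # P(at most 3 failures)

samples = nbinom.rvs(n, p, size=10, random_state=0)
print(samples)                # ten simulated failure counts
```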

In R, the equivalent functions are dnbinom for the probability mass function, pnbinom for cumulative probabilities, and rnbinom for random samples. Both languages parameterize the distribution the same way, counting failures before the rth success, so translating between them is straightforward. SciPy’s nbinom.stats method can return the mean and variance directly, which is useful for quick sanity checks on your parameters.

When to Use It

The negative binomial is the right choice when your data consists of non-negative integers (counts) and shows more spread than the mean alone would predict. Common signals include a long right tail, where most values cluster near zero but a few are much larger, and a variance-to-mean ratio noticeably above 1. Count data where the majority of observations are concentrated toward lower values, with occasional high counts pulling the distribution rightward, is a classic candidate.
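A quick diagnostic is to compare the sample variance to the sample mean. The counts below are made-up illustrative data, not from any real dataset:

```python
import numpy as np

# Made-up counts: mostly small values with a few large ones
counts = np.array([0, 0, 1, 0, 2, 0, 0, 7, 1, 0, 3, 0, 0, 12, 1])

ratio = counts.var(ddof=1) / counts.mean()
print(ratio)   # well above 1: overdispersed, a negative binomial candidate
```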

It’s also worth considering when your counts arise from a process where the underlying rate plausibly varies. Customer purchase counts vary because different customers shop at different frequencies. Insurance claim counts vary because different policyholders carry different risk levels. In both cases, the gamma-Poisson mixture interpretation provides a natural justification for the model, not just a statistical fix for overdispersion but a reflection of genuine heterogeneity in the data-generating process.