The gamma distribution is a continuous probability distribution that models the time or amount you need to wait until a certain number of events happen. It only produces positive values, making it a natural fit for things like rainfall totals, insurance claims, and wait times. If you’ve encountered the exponential distribution (which models the time until one event), the gamma distribution is its generalization: it handles the time until two, three, or any number of events.
How the Shape and Scale Parameters Work
The gamma distribution is controlled by two parameters. The shape parameter (often written as α or k) determines the basic form of the curve. The scale parameter (often written as β or θ) stretches or compresses the distribution along the horizontal axis.
When the shape parameter is less than 1, the distribution drops steeply from a high point at zero, resembling a sharp decay. When it equals 1, you get the exponential distribution exactly. As the shape parameter increases past 1, a hump appears and gradually moves to the right, and the distribution starts looking more symmetric. By the time α reaches about 10 or higher, the curve closely resembles a normal (bell-shaped) distribution.
The scale parameter controls how spread out the values are. A larger scale stretches the distribution toward higher values without changing its basic shape. Think of shape as deciding whether the curve looks like a sharp spike or a gentle hill, and scale as deciding how far along the number line that hill sits.
One common source of confusion: there are two standard ways to write the gamma distribution. Some textbooks use a “rate” parameter (β), which is the inverse of the scale. So if you see two different-looking formulas for the gamma distribution, they’re usually describing the same thing with the rate flipped. The shape-scale version divides by the scale in the exponent, while the shape-rate version multiplies by the rate.
Mean, Variance, and Skewness
The summary statistics for a gamma distribution follow simple formulas. Using the shape (α) and scale (θ) version:
- Mean: α × θ. If you’re waiting for 3 events and each takes an average of 5 minutes, the expected total wait is 15 minutes.
- Variance: α × θ². The spread grows with both more events and a larger scale.
- Skewness: 2 / √α. This is the key insight: the distribution is always right-skewed (its tail stretches toward large values), but that skew shrinks as the shape parameter grows. With α = 1, skewness is 2. With α = 100, skewness drops to 0.2, nearly symmetric.
Connection to Other Distributions
The gamma distribution sits at the center of a family of related distributions. When the shape parameter equals 1, the gamma distribution is identical to the exponential distribution. This makes sense: the exponential models the wait for a single event, and a gamma with shape 1 does the same thing.
The chi-squared distribution, widely used in hypothesis testing, is a gamma distribution where the shape is half the degrees of freedom and the scale is 2. So a chi-squared distribution with 6 degrees of freedom is simply a gamma with shape 3 and scale 2.
There’s also a useful addition rule. If you add two independent gamma random variables that share the same scale parameter, the result is another gamma distribution whose shape parameter is the sum of the two individual shapes. This property makes the gamma distribution especially convenient in modeling, because combining processes that each follow a gamma distribution gives you another gamma distribution.
The Gamma Function Behind the Name
The distribution gets its name from the gamma function, which appears in its formula as a normalizing constant (ensuring the total probability equals 1). The gamma function extends the concept of factorials to non-integer numbers. For any positive integer n, the gamma function of n equals (n-1) factorial. So the gamma function of 5 is 4 × 3 × 2 × 1 = 24.
What makes the gamma function powerful is that it also works for values like 2.5 or 0.7, where a regular factorial doesn’t apply. This allows the gamma distribution to use any positive real number as its shape parameter, not just whole numbers.
Where the Gamma Distribution Shows Up
The gamma distribution appears wherever you’re measuring a positive, continuous quantity that tends to cluster near a lower value with occasional large outliers. Rainfall modeling is one of the most common applications: daily rainfall amounts fit a gamma distribution well because most days with rain see modest amounts, while a few days bring heavy downpours. Hydrologists use gamma-based models to forecast floods and manage reservoir operations.
In insurance, claim sizes often follow a gamma distribution. Most claims are small to moderate, but the occasional large claim pulls the tail to the right. Reliability engineering uses it to model the lifetime of components, especially when failure requires the accumulation of multiple small damage events. In healthcare, the time a patient spends in a hospital or the cost of treatment often fits a gamma distribution for similar reasons: most stays are short, but a few are very long.
In Bayesian statistics, the gamma distribution plays a special structural role. When you’re modeling count data with a Poisson distribution and need to express prior beliefs about the event rate, a gamma prior is the “conjugate” choice. This means the math works out cleanly: your updated belief (the posterior distribution) after seeing data is also a gamma distribution, just with adjusted parameters. This makes calculations tractable and is one reason the gamma distribution is so prominent in Bayesian modeling.
Why Not Just Use a Normal Distribution
A normal distribution allows negative values and is perfectly symmetric. For many real-world quantities, both of those properties are wrong. You can’t have negative rainfall, negative wait times, or negative insurance claims. And these quantities are rarely symmetric: they pile up near the low end with a long right tail.
The gamma distribution handles both issues naturally. It’s defined only for positive values, and its built-in right skew matches how these quantities actually behave. As the shape parameter grows large, the gamma distribution converges toward a normal distribution anyway, so you lose nothing by starting with the more flexible model.

