What Is a Bimodal Distribution and Why Does It Matter?

A bimodal distribution is a pattern in data that has two distinct peaks instead of one. Picture the classic bell curve you’ve seen in textbooks: that’s a unimodal distribution, with a single hump in the middle. A bimodal distribution has two of those humps, separated by a valley, indicating that two different groups or patterns exist within the same dataset.

How It Looks and What It Means

On a graph, a bimodal distribution shows two high points (called modes) with a dip between them. The taller peak is called the major mode, and the shorter one is the minor mode, though the two peaks can also be roughly equal in height. The valley between them is called the antimode. Each peak represents a cluster of values that show up frequently in the data, while the antimode represents values that are relatively uncommon.

The key insight behind bimodality is that your data likely contains two overlapping subgroups with different characteristics. Instead of one population spread evenly around a single average, you’re looking at two populations stacked on top of each other. Rush-hour traffic is an intuitive example: if you plot the number of cars on a highway by time of day, you’ll see one peak during the morning commute and another during the evening commute, with a dip in between. That pattern emerges because two distinct behaviors (going to work, coming home from work) are mixed into the same dataset.

Why the Average Can Mislead You

Bimodal distributions create a trap for anyone relying on simple summary statistics. In a bimodal dataset with peaks of similar size, the mean and median tend to land in the valley between the two peaks, right where the fewest data points actually sit. Reporting that average as “typical” would be actively misleading, because almost nobody in the dataset falls near that value. The mode lands at the tallest peak, but that only captures one of the two clusters.

Imagine a company where half the employees earn around $40,000 and the other half earn around $90,000. The average salary would be roughly $65,000, a number that describes essentially no one. If you only looked at the mean, you’d miss the entire story. This is why identifying bimodality matters: it tells you that a single number can’t summarize what’s going on, and that you need to look at each subgroup separately.
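The salary example can be checked with a few lines of Python. This is a minimal sketch, not real payroll data: the 500-person group sizes, the $5,000 spread within each group, and the helper name share_within are all assumptions made for illustration.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical payroll: two clusters of 500 people each,
# with an assumed $5,000 standard deviation inside each cluster.
low = [random.gauss(40_000, 5_000) for _ in range(500)]
high = [random.gauss(90_000, 5_000) for _ in range(500)]
salaries = low + high

mean = statistics.mean(salaries)
median = statistics.median(salaries)


def share_within(values, center, width):
    """Fraction of values lying within +/- width of center."""
    return sum(abs(v - center) <= width for v in values) / len(values)


near_mean = share_within(salaries, mean, 5_000)
near_low_peak = share_within(salaries, 40_000, 5_000)

print(f"mean={mean:,.0f}  median={median:,.0f}")
print(f"share within $5,000 of the mean:    {near_mean:.1%}")
print(f"share within $5,000 of $40,000:     {near_low_peak:.1%}")
```

Under these assumptions the mean sits near $65,000, yet almost no simulated employee earns within $5,000 of it, while roughly a third of the workforce sits within $5,000 of the lower peak alone.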

What Causes Two Peaks to Form

Bimodal distributions almost always arise because two distinct processes, populations, or conditions are being measured together. Statisticians call these “latent subpopulations,” meaning hidden groups within the data that become visible only when you plot the full distribution. Mixture models are designed specifically for this situation. They assume the overall sample is composed of several underlying subpopulations, each with its own center and spread.
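To make the mixture-model idea concrete, here is a rough sketch of fitting two bell curves to one dataset with a basic expectation-maximization (EM) loop. It assumes exactly two normal components and well-separated clusters. The function names, the starting guesses, and the simulated data (values in thousands, echoing the salary example) are illustrative choices, not a reference implementation.

```python
import math
import random


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def fit_two_gaussians(data, iterations=100):
    """Fit a two-component 1-D Gaussian mixture with a basic EM loop."""
    lo, hi = min(data), max(data)
    # Crude starting guesses: centers at the quarter points of the range.
    mu = [lo + 0.25 * (hi - lo), lo + 0.75 * (hi - lo)]
    sigma = [(hi - lo) / 4.0, (hi - lo) / 4.0]
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: probability that each point belongs to component 0.
        resp0 = []
        for x in data:
            p0 = weight[0] * normal_pdf(x, mu[0], sigma[0])
            p1 = weight[1] * normal_pdf(x, mu[1], sigma[1])
            total = p0 + p1
            resp0.append(p0 / total if total > 0 else 0.5)
        # M-step: re-estimate weight, center, and spread of each component.
        for k in (0, 1):
            r = resp0 if k == 0 else [1.0 - v for v in resp0]
            n_k = max(sum(r), 1e-12)  # guard against an empty component
            mu[k] = sum(ri * x for ri, x in zip(r, data)) / n_k
            var = sum(ri * (x - mu[k]) ** 2 for ri, x in zip(r, data)) / n_k
            sigma[k] = max(math.sqrt(var), 1e-6)  # guard against collapse
            weight[k] = n_k / len(data)
    return weight, mu, sigma


random.seed(1)
# Simulated data: two hidden subpopulations measured together.
data = [random.gauss(40, 5) for _ in range(300)]
data += [random.gauss(90, 5) for _ in range(300)]

weight, mu, sigma = fit_two_gaussians(data)
for k in (0, 1):
    print(f"component {k}: weight={weight[k]:.2f} "
          f"center={mu[k]:.1f} spread={sigma[k]:.1f}")
```

On this simulated data the loop should recover centers close to 40 and 90 with roughly equal weights. Real data with heavier overlap or unequal groups usually needs better starting values and a convergence check.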

Sometimes the subgroups are obvious in hindsight. Sometimes they reveal something unexpected. In substance use research, for instance, survey data on how often people drink or use drugs frequently shows a bimodal pattern: one cluster of people with low or no use, and another cluster with high use. Treating that data as a single bell curve would obscure the very distinction researchers are trying to understand.

The Height Example (and Why It’s Tricky)

Human height is often cited as a textbook example of bimodality: men average about 69.3 inches and women about 64.1 inches, so combining them should produce two peaks, right? It turns out to be more complicated than that. A study using national health survey data found that the 5.2-inch gap between male and female averages isn’t quite large enough to create a visibly bimodal curve when you mix the two groups together. The overlap between the two bell curves is too great.

For two normal distributions with similar spread to become clearly bimodal when combined, the gap between their averages needs to exceed roughly 1.12 times the sum of their individual standard deviations. With male and female heights, that threshold is about 6.05 inches, slightly more than the actual gap. If you artificially added just one extra inch of separation, the mixture would technically become bimodal, but the dip between peaks would be only 0.16% below the lower peak, essentially invisible. Human height works as a conceptual example of how mixed groups can create two peaks, but in practice, the two humps blur into a single wide mound.
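You can see the blurring effect by counting the peaks of a mixed curve directly. The sketch below is a simplification and does not reproduce the study's calculation: it assumes the two groups are equally large and share a standard deviation of 2.8 inches, a value chosen for illustration rather than taken from the survey data. In that special case, two peaks appear only when the gap exceeds twice the shared standard deviation, so the exact threshold here differs from the 6.05 inches quoted above.

```python
import math


def normal_pdf(x, mu, sigma):
    """Density of a normal distribution at x."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2 * math.pi))


def mixture_pdf(x, mu1, mu2, sigma, w=0.5):
    """Density of a two-group mixture with a shared spread (assumption)."""
    return w * normal_pdf(x, mu1, sigma) + (1 - w) * normal_pdf(x, mu2, sigma)


def count_peaks(mu1, mu2, sigma, steps=4000):
    """Count local maxima of the mixture density on a fine grid."""
    lo = min(mu1, mu2) - 4 * sigma
    hi = max(mu1, mu2) + 4 * sigma
    xs = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    ys = [mixture_pdf(x, mu1, mu2, sigma) for x in xs]
    return sum(1 for i in range(1, steps) if ys[i - 1] < ys[i] > ys[i + 1])


ASSUMED_SD = 2.8  # illustrative shared standard deviation, in inches

# Averages from the article: 64.1 and 69.3 inches (a 5.2-inch gap).
peaks_actual_gap = count_peaks(64.1, 69.3, ASSUMED_SD)

# Hypothetical groups 10 inches apart, for contrast.
peaks_wide_gap = count_peaks(64.1, 74.1, ASSUMED_SD)

print("peaks with a 5.2-inch gap:", peaks_actual_gap)
print("peaks with a 10-inch gap: ", peaks_wide_gap)
```

With these assumed values the 5.2-inch gap yields a single peak, while the hypothetical 10-inch gap yields two.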

Bimodal Patterns in Health and Disease

In medicine, bimodal distributions show up in the age at which certain diseases strike, and recognizing them has real diagnostic value. Hodgkin lymphoma is a well-known example. It has an early peak around age 20 and a second peak after age 70. These peaks aren’t random. The early peak is thought to result from delayed exposure to common childhood infections, which disrupts normal immune development and leaves young adults vulnerable. The later peak appears connected to the immune system losing control of a latent viral infection (Epstein-Barr virus) as immunity weakens with age. Older patients with Hodgkin lymphoma test positive for the virus at much higher rates, supporting this theory.

Other cancers with bimodal age patterns include acute lymphoblastic leukemia, osteosarcoma, and craniopharyngioma. In each case, the two peaks point to fundamentally different disease mechanisms operating at different life stages.

High-risk HPV infection follows a similar bimodal curve. In unvaccinated populations, infection rates peak in young adulthood at 25 to 40%, drop to a baseline of 5 to 15% during middle age, and then show a smaller second peak of 15 to 20% after age 50. The second wave of infections corresponds to a later peak in precancerous cervical changes around age 61, roughly six years after the second infection spike. This pattern is one reason screening programs are being reconsidered for older women rather than stopping at a fixed age cutoff.

How Bimodality Is Detected

Eyeballing a histogram can suggest bimodality, but visual impressions are unreliable, especially with small samples or noisy data. Several formal tests exist. The Hartigan dip test measures how far the data’s cumulative distribution departs from the closest unimodal distribution, and it judges significance against a uniform reference distribution. A significant result rejects unimodality in favor of two or more peaks. It’s one of the most commonly used tests for bimodality.

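Another approach uses Sarle’s bimodality coefficient, which combines two properties of the distribution: its skewness (how lopsided it is) and its kurtosis (how heavy its tails are). The formula produces a single number between 0 and 1, and distributions scoring above about 0.555 (5/9, the value for a uniform distribution) are considered bimodal. Scores below that threshold suggest the data is unimodal. Because strong skew also raises the score, a heavily lopsided single-peaked distribution can cross the threshold, so treat a high value as a prompt for closer inspection rather than proof. This coefficient is easy to calculate from basic summary statistics, making it a practical first check before running more complex analyses.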
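As a rough illustration, the coefficient can be computed from the data's central moments. This sketch uses the simple moment form, (skewness squared + 1) divided by kurtosis, where kurtosis is the plain (non-excess) value that equals 3 for a normal curve. It deliberately omits the small-sample correction that some published versions include. The function name and the simulated samples are assumptions for the example.

```python
import random


def bimodality_coefficient(data):
    """Moment-based form of Sarle's coefficient (no small-sample correction)."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n  # variance
    m3 = sum((x - mean) ** 3 for x in data) / n
    m4 = sum((x - mean) ** 4 for x in data) / n
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2  # plain kurtosis: about 3 for a normal curve
    return (skewness ** 2 + 1) / kurtosis


random.seed(2)

# One bell curve versus two well-separated clusters (simulated data).
one_peak = [random.gauss(65, 10) for _ in range(2000)]
two_peaks = [random.gauss(40, 5) for _ in range(1000)]
two_peaks += [random.gauss(90, 5) for _ in range(1000)]

bc_one = bimodality_coefficient(one_peak)
bc_two = bimodality_coefficient(two_peaks)

THRESHOLD = 5 / 9  # about 0.555, the value for a uniform distribution
print(f"single bell curve: {bc_one:.3f} (bimodal? {bc_one > THRESHOLD})")
print(f"two clusters:      {bc_two:.3f} (bimodal? {bc_two > THRESHOLD})")
```

The single bell curve should score near 1/3, well under the threshold, while the two-cluster sample should score well above it.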

Bimodal vs. Multimodal

Bimodal is a specific case of the broader category “multimodal,” which simply means more than one peak. A distribution with three peaks is trimodal; four or more peaks is just called multimodal. The logic is the same in every case: multiple peaks suggest multiple underlying groups or processes. Bimodal distributions get special attention because they’re the most common multimodal pattern and often the easiest to interpret, since two subgroups are simpler to reason about than three or four.

When you encounter a bimodal distribution in your own data, the most productive next step is usually to figure out what variable splits the data into those two groups. Once you identify it (gender, age bracket, treatment status, geographic region), you can analyze each subgroup on its own terms rather than forcing a single summary onto a dataset that clearly contains two different stories.
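In code, that last step often amounts to grouping the records by the candidate variable and summarizing each group separately. The sketch below uses a tiny made-up payroll in which a hypothetical department field plays the role of the splitting variable. The records and field names are invented for illustration.

```python
import statistics
from collections import defaultdict

# Hypothetical records: (department, salary in dollars).
records = [
    ("support", 38_000), ("support", 41_000),
    ("support", 40_500), ("support", 42_000),
    ("engineering", 88_000), ("engineering", 91_500),
    ("engineering", 90_000), ("engineering", 93_000),
]

# Split the salaries by the candidate grouping variable.
by_group = defaultdict(list)
for department, salary in records:
    by_group[department].append(salary)

overall_mean = statistics.mean([salary for _, salary in records])
group_means = {dept: statistics.mean(vals) for dept, vals in by_group.items()}

print(f"overall mean: {overall_mean:,.0f}")
for dept, value in group_means.items():
    print(f"{dept:>12}: {value:,.0f}")
```

Here the overall mean of $65,500 describes no one, while the per-department means of $40,375 and $90,625 each sit inside an actual cluster. If a candidate variable does not separate the peaks this cleanly, it is probably not the one driving the bimodality.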