How to Normalize a Histogram: Density vs. Frequency

Normalizing a histogram means adjusting the vertical axis so that bar heights represent proportions or probability density instead of raw counts. The most common approach divides each bin’s count by the total number of observations and the bin width, making the total area under the histogram equal to 1. This lets you compare datasets of different sizes and overlay probability distributions directly onto your histogram.

What Normalization Actually Does

A standard histogram shows how many data points fall into each bin. That’s useful on its own, but it creates problems when you want to compare two datasets with different sample sizes, or when you want to overlay a theoretical distribution curve. A dataset with 10,000 points will have much taller bars than one with 500, even if they follow the same pattern.

Normalization rescales those bar heights so the histogram represents relative frequency or probability density. Once normalized, two histograms with wildly different sample sizes become directly comparable because they’re both on the same scale.

Two Common Ways to Normalize

Relative Frequency

The simplest method divides each bin’s count by the total number of observations. If you have 1,000 data points and 150 fall in one bin, that bin’s height becomes 0.15 (or 15%). The heights of all bars sum to 1. This gives you a quick sense of what proportion of your data lands in each range, and it’s sometimes called a relative frequency histogram.
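As a quick sketch of this in NumPy (the data here is randomly generated for illustration, matching the 1,000-point example):

```python
import numpy as np

# Illustrative data: 1,000 draws from a normal distribution
rng = np.random.default_rng(0)
data = rng.normal(size=1000)

counts, edges = np.histogram(data, bins=10)
relative = counts / counts.sum()  # each bar becomes a proportion of the data

# The relative frequencies sum to 1
print(relative.sum())
```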

Probability Density

The more rigorous method divides each bin’s count by both the total number of observations and the bin width. The formula for a given bin is:

normalized height = (count in bin) / (total observations × bin width)

With this approach, the area of each bar (height × width) represents the proportion of data in that bin, and the total area under the histogram equals 1. This is important because it makes the histogram behave like a probability density function, which means you can directly overlay a smooth distribution curve (like a normal distribution) on top of it and visually compare them.

Why does bin width matter here? If your bins are wide, they naturally capture more data points. Dividing by the bin width corrects for this, so the shape of the histogram reflects the true density of data rather than an artifact of how you chose to bin it. This is especially critical when you use unequal bin widths, where simply dividing by total count would distort the picture.
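To make the formula concrete, here is a sketch that computes the density by hand with unequal bin widths and checks it against NumPy’s built-in normalization (the data and bin edges are invented for the example):

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=2000)

# Unequal bin widths: narrow near the center, wide in the tails
edges = np.array([-4.0, -2.0, -1.0, -0.5, 0.0, 0.5, 1.0, 2.0, 4.0])

counts, _ = np.histogram(data, bins=edges)
widths = np.diff(edges)

# Manual density: count / (total observations × bin width)
manual_density = counts / (counts.sum() * widths)

# NumPy's density=True applies the same formula
density, _ = np.histogram(data, bins=edges, density=True)
assert np.allclose(manual_density, density)

# The total area under the histogram is 1
print((manual_density * widths).sum())
```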

Density vs. Relative Frequency: Which to Use

Use relative frequency when you want a quick, intuitive summary. The bar heights are easy to interpret: “20% of values fall between 5 and 10.” The bars sum to 1, which feels natural.

Use probability density when you’re fitting a distribution to your data or comparing your histogram against a theoretical curve. The bars won’t sum to 1 (unless every bin happens to be exactly 1 unit wide), but the total area will. This is the standard choice in statistics and data science when the goal is estimating the underlying probability distribution.

A common point of confusion: with density normalization, individual bar heights can exceed 1. That’s perfectly normal. If your data is tightly clustered and your bins are narrow (say, 0.1 units wide), a bar might have a height of 3 or 4. What matters is that height × width for each bar gives the correct proportion, and all those areas add up to 1.
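You can see this with tightly clustered synthetic data (the spread of 0.1 is chosen deliberately so the peak density exceeds 1):

```python
import numpy as np

rng = np.random.default_rng(2)
data = rng.normal(loc=0.0, scale=0.1, size=5000)  # tightly clustered

density, edges = np.histogram(data, bins=20, density=True)
widths = np.diff(edges)

# Individual bar heights are well above 1...
print(density.max())
# ...but the bar areas still sum to 1
print((density * widths).sum())
```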

How to Normalize in Python

Both NumPy and Matplotlib make density normalization straightforward with a single parameter.

In NumPy, pass density=True to the histogram function:

import numpy
counts, bin_edges = numpy.histogram(data, bins=30, density=True)

This returns the probability density for each bin, normalized so the integral over all bins equals 1. NumPy’s documentation explicitly notes that the sum of the returned values will not equal 1 unless you happen to use bins of unit width. That’s the density vs. relative frequency distinction in action.
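A small sketch makes that documented behavior visible, using randomly generated data:

```python
import numpy as np

rng = np.random.default_rng(3)
data = rng.normal(size=1000)

density, bin_edges = np.histogram(data, bins=30, density=True)

# The density values themselves do not sum to 1...
print(density.sum())

# ...but density × bin width does: the integral over all bins is 1
area = (density * np.diff(bin_edges)).sum()
print(area)
```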

In Matplotlib, the same parameter works inside plt.hist:

import matplotlib.pyplot as plt
plt.hist(data, bins=30, density=True)

If you want relative frequency instead (bar heights summing to 1), you can normalize manually. Compute the histogram with raw counts, then divide each count by the total number of data points:

counts, bin_edges = numpy.histogram(data, bins=30)
relative = counts / counts.sum()

By default, NumPy uses 10 bins. You can set the number explicitly, pass an array of custom bin edges for non-uniform widths, or use a string like 'auto' to let NumPy pick an optimal bin width algorithmically.
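Those three options look like this in practice (the data and custom edges are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=500)

counts_default, edges_default = np.histogram(data)         # 10 bins by default
counts_auto, edges_auto = np.histogram(data, bins='auto')  # algorithmic choice

custom = [-4, -1, 0, 1, 4]                                 # non-uniform widths
counts_custom, _ = np.histogram(data, bins=custom, density=True)

print(len(edges_default))  # 10 bins means 11 edges
```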

How to Normalize in R

In base R, the hist() function accepts a freq argument. Setting freq = FALSE (or equivalently, probability = TRUE) switches the y-axis from counts to density:

hist(data, freq = FALSE)

In ggplot2, you map the y aesthetic to density using after_stat(density):

ggplot(df, aes(x = value, y = after_stat(density))) + geom_histogram(binwidth = 1)

This scales the histogram so the bars integrate to 1, which is useful when comparing distributions with very different sample sizes. You can then add a density curve on top with geom_density() and the scales will match perfectly.

Don’t Confuse This With Data Normalization

The term “normalization” gets used differently depending on context. Normalizing a histogram is about rescaling bar heights to show density or proportions. Data normalization (sometimes called feature normalization in machine learning) is about rescaling individual data values before building the histogram at all.

The most common data normalization technique is min-max scaling, which transforms every value into a range between 0 and 1. Standardization is a related but distinct process: you subtract the mean and divide by the standard deviation to produce z-scores, giving your data a mean of 0 and a standard deviation of 1.

These transformations change the shape and position of your data on the x-axis. Histogram normalization only changes the y-axis. You can apply both: standardize your data first, then plot a density-normalized histogram of the standardized values. They answer different questions and operate independently.
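Here is a sketch of that separation, with invented data: min-max scaling and standardization transform the values themselves, while density normalization only rescales the histogram’s bars:

```python
import numpy as np

rng = np.random.default_rng(5)
data = rng.normal(loc=50.0, scale=10.0, size=1000)

# Data normalization: rescales the values (the x-axis)
minmax = (data - data.min()) / (data.max() - data.min())  # into [0, 1]
zscores = (data - data.mean()) / data.std()               # mean 0, sd 1

# Histogram normalization: rescales bar heights (the y-axis)
density, edges = np.histogram(zscores, bins=30, density=True)

print(minmax.min(), minmax.max())             # 0.0 and 1.0
print((density * np.diff(edges)).sum())       # total area is 1
```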

Choosing the Right Bin Width

Normalization corrects for bin width, but choosing good bins still matters for readability. Too few bins and you lose the shape of the distribution. Too many and the histogram becomes noisy, with erratic spikes from bin to bin.

A practical starting point is the square root of your sample size. For 400 data points, try 20 bins. For more principled approaches, NumPy and R both offer automatic bin selection methods (Sturges’ rule, Freedman-Diaconis, and others) that balance detail against noise based on your data’s spread and sample size. When your bins have unequal widths, density normalization becomes essential rather than optional, because it’s the only method that prevents wider bins from visually dominating the plot.
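NumPy exposes these bin-selection rules through np.histogram_bin_edges, so you can compare the square-root rule against the automatic estimators (the 400-point dataset is generated for illustration):

```python
import numpy as np

rng = np.random.default_rng(6)
data = rng.normal(size=400)

# Square-root rule: about 20 bins for 400 points
sqrt_bins = int(np.sqrt(len(data)))

edges_sqrt = np.histogram_bin_edges(data, bins=sqrt_bins)
edges_sturges = np.histogram_bin_edges(data, bins='sturges')
edges_fd = np.histogram_bin_edges(data, bins='fd')  # Freedman-Diaconis

print(sqrt_bins)             # 20
print(len(edges_sturges) - 1)  # bins chosen by Sturges' rule
```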