Benford’s Law is a mathematical observation that in many real-world data sets, the leading digit is far more likely to be a 1 than a 9. Specifically, the number 1 appears as the first digit about 30% of the time, while 9 shows up as the first digit only about 4.6% of the time. This feels deeply counterintuitive. Most people assume each digit from 1 through 9 should appear roughly equally, at about 11% each. But across everything from city populations to earthquake depths to financial records, this lopsided pattern shows up again and again.
The Expected Frequencies
The probability of each leading digit follows a precise logarithmic formula. For any first digit d, the expected frequency is log₁₀(1 + 1/d). That yields these percentages:
- 1: 30.1%
- 2: 17.6%
- 3: 12.5%
- 4: 9.7%
- 5: 7.9%
- 6: 6.7%
- 7: 5.8%
- 8: 5.1%
- 9: 4.6%
The drop-off is steep. Digits 1 and 2 together account for nearly half of all leading digits. By the time you reach 7, 8, and 9, each one appears less than 6% of the time.
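The formula is easy to check directly. Here is a minimal sketch in Python: computing log₁₀(1 + 1/d) for each digit reproduces the table above, and the nine probabilities sum to exactly 1, because the product of the (1 + 1/d) terms telescopes to 10.

```python
import math

def benford_expected(d: int) -> float:
    """Expected frequency of leading digit d (1 through 9) under Benford's Law."""
    return math.log10(1 + 1 / d)

# Print the expected percentage for each leading digit.
for d in range(1, 10):
    print(f"{d}: {benford_expected(d):.1%}")
```

Running this reproduces the list above: 30.1% for digit 1 down to 4.6% for digit 9.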
Why It Happens
The key insight is that numbers don’t grow in equal steps. They grow proportionally. Think about a stock price climbing from $100 to $200. It has to pass through every value starting with 1: $100, $110, $120, all the way to $199. That’s a 100% increase. But going from $900 to $1,000 (where the leading digit changes from 9 back to 1) requires only about an 11% increase. The number spends far more “time” with a leading digit of 1 than a leading digit of 9 simply because the range of values starting with 1 is proportionally wider.
This is why the formula uses logarithms. Logarithmic scales compress large ranges and stretch small ones, which mirrors how naturally growing quantities behave. Any data set that spans several orders of magnitude (ones, tens, hundreds, thousands) and grows or accumulates through multiplication rather than addition tends to follow this pattern. The data also needs to be free of artificial constraints. If numbers are generated by a natural or social process rather than assigned by human decision, they’re more likely to conform.
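The proportional-growth intuition is easy to simulate. The sketch below assumes an arbitrary 3% growth rate (any rate whose base-10 logarithm is irrational behaves the same way): a quantity compounds for thousands of steps, and tallying its leading digits recovers the Benford frequencies.

```python
import math
from collections import Counter

def leading_digit(x: float) -> int:
    """First significant digit of a positive number."""
    return int(x / 10 ** math.floor(math.log10(x)))

# A quantity growing 3% per step spans many orders of magnitude.
value, steps = 1.0, 10_000
counts = Counter()
for _ in range(steps):
    counts[leading_digit(value)] += 1
    value *= 1.03

# Compare observed leading-digit shares to the Benford expectations.
for d in range(1, 10):
    observed = counts[d] / steps
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {observed:.1%}, expected {expected:.1%}")
```

The observed shares land within a fraction of a percentage point of the expected ones, and the ordering (1 most common, 9 least) holds regardless of the growth rate chosen.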
Where It Shows Up
The range of data sets that follow Benford’s Law is remarkably broad. Distances from Earth to known stars and exoplanets follow it. So do earthquake depths, crime statistics, internet traffic data, financial variables, and even the number of daily-recorded religious activities across communities. One study confirmed that cancer incidence rates (cases per 100,000 people, broken down by age and cancer type) obey the law because they span a wide range from single digits to thousands and arise from natural biological processes rather than human design.
City populations are a classic example. A study using the 5,000 most populated U.S. cities found the expected Benford distribution in their leading digits. Country-level tuberculosis incidence data follows the same pattern. Even collections of mathematical constants conform to the law.
Where It Does Not Apply
Benford’s Law breaks down when data is constrained to a narrow range. Human heights are a common example: adults mostly fall between about 150 and 200 centimeters, so the leading digit is almost always 1. Lottery numbers, drawn uniformly at random, have no reason to follow the pattern. Phone numbers, zip codes, and other assigned identifiers don’t follow it either, because they’re chosen by convention rather than generated by a natural process.
The law also fails when data clusters around a small band of values. Election precinct data illustrates this well. If most precincts have between 1,000 and 2,000 voters, with similar turnout and similar candidate support, the vote totals end up bunched in a narrow range. With turnout around 70% to 80%, a candidate getting 75% to 85% of the vote in precincts of that size would see totals between roughly 525 (1,000 × 0.70 × 0.75) and 1,360 (2,000 × 0.80 × 0.85), never producing a leading digit of 2, 3, or 4. The conditions for Benford’s Law simply aren’t met. A data set generally needs to span multiple orders of magnitude, arise from a multiplicative or organic process, and not be bounded by human-imposed limits.
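A quick simulation makes the collapse visible. The specific ranges below (precinct sizes of 1,000 to 2,000, turnout of 70% to 80%, candidate support of 75% to 85%) are illustrative assumptions, not real election data:

```python
import random
from collections import Counter

random.seed(0)
counts = Counter()
for _ in range(5_000):
    voters = random.uniform(1_000, 2_000)  # precinct sizes in a narrow band (assumed)
    turnout = random.uniform(0.70, 0.80)   # similar turnout everywhere (assumed)
    share = random.uniform(0.75, 0.85)     # similar candidate support (assumed)
    total = int(voters * turnout * share)  # vote totals land between 525 and 1,360
    counts[int(str(total)[0])] += 1

# Only the digits 5 through 9 and 1 can ever lead; 2, 3, and 4 never appear.
print(sorted(counts))
```

No amount of sampling produces a leading 2, 3, or 4 here, because no total in the band 525 to 1,360 starts with one, so any Benford-based test is doomed from the start on data like this.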
A Brief History
The pattern was first noticed in 1881 by the astronomer Simon Newcomb, who published a short two-page note in the American Journal of Mathematics after observing that the early pages of logarithm tables (which covered numbers starting with 1) were far more worn than later pages. He proposed the logarithmic formula but offered no data analysis to back it up.
About six decades later, in 1938, physicist and engineer Frank Benford independently made the same observation, apparently unaware of Newcomb’s earlier work. Benford went further, publishing a 22-page paper with extensive data from 20 different sources, including river areas, population figures, and street addresses. He called it the “law of anomalous numbers.” Despite Newcomb’s priority, the law carries Benford’s name, largely because his paper was far more detailed and widely read.
Catching Fraud and Fabricated Data
One of the most practical applications of Benford’s Law is spotting data that has been manipulated or invented. When people fabricate numbers, they tend to distribute leading digits more evenly than nature would, or they over-rely on certain “random-looking” digits. The resulting data set deviates from the expected Benford distribution in ways that statistical tests can flag.
Forensic accountants have used this approach for decades to screen financial statements and tax returns for irregularities. In scientific research, it works as an early-warning system. One study used data from papers retracted by Royal Society Publishing to test whether Benford’s Law could have caught problems earlier. The retracted papers, which contained inexplicable data duplications, deviated significantly from the expected digit distribution in both first and second digit positions. Meanwhile, a set of non-retracted papers from the same journals showed no significant deviation. The technique doesn’t prove fraud on its own, but it reliably flags data sets that warrant closer inspection.
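A minimal version of such a screen needs nothing beyond the standard library. The sketch below is one common approach (not the specific method of any study named above): compare observed first-digit counts against the Benford expectations with Pearson's chi-squared statistic, using the 5% critical value for 8 degrees of freedom, about 15.51, as the flagging threshold.

```python
import math
from collections import Counter

def first_digit(x: float) -> int:
    """First significant digit of a nonzero number."""
    x = abs(x)
    return int(x / 10 ** math.floor(math.log10(x)))

def benford_chi2(values) -> float:
    """Pearson chi-squared statistic of observed first-digit counts
    against the Benford expectations."""
    n = len(values)
    observed = Counter(first_digit(v) for v in values)
    return sum(
        (observed[d] - n * math.log10(1 + 1 / d)) ** 2
        / (n * math.log10(1 + 1 / d))
        for d in range(1, 10)
    )

# Powers of 2 are known to follow Benford's Law closely; numbers spread
# evenly across 100-999 are not. A statistic above ~15.51 (the 5% critical
# value for 8 degrees of freedom) flags the data for closer inspection.
print(benford_chi2([2.0 ** k for k in range(500)]))  # small
print(benford_chi2(list(range(100, 1000))))          # far above the threshold
```

As the text stresses, a large statistic is a reason to look closer, not proof of fraud: plenty of honest data sets (like the narrow-range examples above) fail the test for benign reasons.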
Clinical trial data is another area where the approach has shown promise. Researchers can test whether summary statistics from two supposedly randomized groups show the natural variation you’d expect, or whether the numbers look suspiciously uniform. It works as a targeted check rather than a blanket detector.
The Election Controversy
After the 2020 U.S. presidential election, claims circulated that Benford’s Law proved vote totals had been manipulated. Mathematicians with decades of experience in Benford analysis pushed back firmly. The core problem is that precinct-level vote counts generally should not follow Benford’s Law in the first place. Precincts tend to be similar in size, voter turnout falls within a predictable range, and candidate support doesn’t vary wildly from one precinct to the next. This clustering of values violates the basic conditions the law requires.
Researchers have noted that even if precinct data happened to match Benford’s distribution at a broad scale, the test wouldn’t catch the kind of fraud people worry about. Adding a small fixed number of votes across precincts, or modifying just a few precincts, would almost never change leading digits enough to show up. Some analysts have proposed looking at second digits or converting vote totals to a smaller number base (like base 3 instead of base 10) to spread the data across more digit positions, but these are specialized techniques still being developed, not the simple “gotcha” test that social media suggested.
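The second-digit distribution mentioned above follows from the same logarithm: the probability of second digit d₂ is the sum of log₁₀(1 + 1/(10·d₁ + d₂)) over all nine possible first digits. A short sketch:

```python
import math

def second_digit_freq(d2: int) -> float:
    """Expected frequency of second digit d2 (0 through 9) under Benford's Law:
    sum log10(1 + 1/(10*d1 + d2)) over every possible first digit d1."""
    return sum(math.log10(1 + 1 / (10 * d1 + d2)) for d1 in range(1, 10))

# The second-digit distribution runs from 0 (most common) to 9 (least).
for d in range(10):
    print(f"{d}: {second_digit_freq(d):.1%}")
```

The spread is much flatter than the first-digit table, running from about 12.0% for a second digit of 0 down to about 8.5% for 9, which is one reason second-digit analysis behaves differently from the familiar first-digit test.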

