How to Find the Frequency Distribution in Statistics

Finding a frequency distribution means counting how many times each value (or range of values) appears in your dataset, then organizing those counts into a table or chart. It’s one of the first things you do when summarizing raw data, and the process is straightforward once you know the steps. Whether you’re working with a small dataset by hand or thousands of rows in a spreadsheet, the core logic is the same.

What a Frequency Distribution Actually Shows

A frequency distribution is a count of how many data values fall into each category or interval. If you surveyed 50 people about their favorite color, the frequency distribution would tell you how many picked blue, how many picked red, and so on. If you recorded 200 test scores, it would tell you how many students scored in the 60s, 70s, 80s, and 90s.

The type of data you have determines how you set up the distribution. For discrete or categorical data (shoe sizes, letter grades, yes/no responses), each unique value gets its own row. For continuous data (heights, incomes, temperatures), you group values into ranges called class intervals or “bins” because listing every unique measurement would make the table unreadable. The grouped version is what most people picture when they think of a frequency distribution.

How to Build an Ungrouped Frequency Table

Start here if your data has a manageable number of distinct values, like survey responses or dice rolls.

  • List every unique value. Create a table with two columns. Label the first column with your variable name and the second “Frequency.” Enter each unique value in the first column.
  • Tally the occurrences. Go through your raw data and make a tick mark next to the matching value each time it appears. For large datasets, adding a separate “Tally” column keeps things organized.
  • Count and record. Convert your tally marks into numbers and enter them in the Frequency column. The sum of all frequencies should equal your total number of observations. If it doesn’t, you missed something.

That’s the entire process for ungrouped data. A dataset of 30 students’ letter grades might produce a five-row table: A appeared 6 times, B appeared 11 times, C appeared 8 times, D appeared 4 times, F appeared 1 time.
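The tally-and-count steps above can be sketched in a few lines of Python; the grade counts below are the hypothetical ones from the example:

```python
from collections import Counter

# Hypothetical raw data: 30 students' letter grades, matching the example counts
grades = ["A"] * 6 + ["B"] * 11 + ["C"] * 8 + ["D"] * 4 + ["F"] * 1

freq = Counter(grades)                      # tally each unique grade
assert sum(freq.values()) == len(grades)    # sanity check: frequencies sum to n

for grade in sorted(freq):
    print(grade, freq[grade])
```

The final assertion is the same check described in step three: if the frequencies don't sum to the number of observations, something was missed.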

How to Build a Grouped Frequency Table

When your data is continuous or has dozens of unique values, you need to group it into intervals. This requires a few extra decisions.

Choose the Number of Bins

A common guideline is to use between 5 and 20 bins. Too few and you lose the pattern in your data; too many and the table becomes noisy. For a more precise starting point, Sturges’ Rule gives you the number of bins as 1 + log₂(n), where n is the number of data points. For 100 observations, that’s about 7.6, so you’d round to 8 bins. For 1,000 observations, it’s roughly 11.

Sturges’ Rule works well for data that’s roughly bell-shaped. If your data is heavily skewed or has multiple peaks, you may need more bins to capture the shape accurately.
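Sturges' Rule is easy to compute directly; a minimal sketch:

```python
import math

def sturges_bins(n):
    """Suggested bin count from Sturges' Rule: 1 + log2(n), rounded up."""
    return math.ceil(1 + math.log2(n))

print(sturges_bins(100))    # 1 + log2(100) ≈ 7.64 → 8
print(sturges_bins(1000))   # 1 + log2(1000) ≈ 10.97 → 11
```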

Calculate the Bin Width

Find your data’s range (highest value minus lowest value), then divide by the number of bins. If test scores range from 52 to 98 and you want 8 bins, the width is (98 – 52) / 8 = 5.75. Round up to 6 for clean intervals. Your bins would be 52–57, 58–63, 64–69, and so on.
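The same bin-width arithmetic, scripted with the values from the worked example:

```python
import math

low, high, n_bins = 52, 98, 8           # range and bin count from the example
raw_width = (high - low) / n_bins       # 46 / 8 = 5.75
width = math.ceil(raw_width)            # round up to 6 for clean intervals

# Bin edges; the last edge (100) comfortably covers the maximum value (98)
edges = [low + i * width for i in range(n_bins + 1)]
print(edges)
```

Rounding the width up rather than to the nearest integer guarantees the bins cover the full range of the data.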

Count the Frequencies

Go through each data point and place it into the correct bin. Every value must land in exactly one bin, with no overlaps and no gaps. When you’re done, the frequency column shows how your data is distributed across the range.
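Assigning each value to exactly one bin can be done with the standard-library bisect module. The scores below are hypothetical, the edges come from the worked example, and the bins are half-open intervals, with the top edge closed so the maximum value isn't dropped:

```python
import bisect

edges = [52, 58, 64, 70, 76, 82, 88, 94, 100]   # edges from the worked example
scores = [52, 67, 67, 71, 88, 93, 98]           # hypothetical scores, all in range

counts = [0] * (len(edges) - 1)
for s in scores:
    # bisect_right picks the half-open bin [edges[i], edges[i+1]) containing s
    i = bisect.bisect_right(edges, s) - 1
    i = min(i, len(counts) - 1)     # a value equal to the top edge stays in the last bin
    counts[i] += 1

assert sum(counts) == len(scores)   # no overlaps, no gaps
```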

Relative and Cumulative Frequencies

A basic frequency table tells you raw counts, but two additional columns can make the distribution far more useful.

Relative frequency converts each count into a proportion. Divide each frequency by the total number of observations. If 15 out of 100 scores fall in the 80–89 bin, the relative frequency is 0.15, or 15%. This lets you compare distributions from datasets of different sizes, something raw counts can’t do.

Cumulative frequency tells you how many observations fall at or below a given value. You calculate it by adding each row’s frequency to the running total of all previous rows. If the first three bins have frequencies of 4, 12, and 23, the cumulative frequencies are 4, 16, and 39. This is especially useful for answering questions like “what percentage of students scored below 70?” You just read the cumulative relative frequency at the 70 mark.
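Both derived columns follow mechanically from the raw counts. In the sketch below, the first three frequencies match the worked example and the rest are hypothetical:

```python
freqs = [4, 12, 23, 31, 18, 12]         # bin counts; first three from the example
total = sum(freqs)                       # 100 observations

relative = [f / total for f in freqs]    # proportions, which sum to 1

cumulative, running = [], 0
for f in freqs:
    running += f                         # running total of all rows so far
    cumulative.append(running)

print(cumulative[:3])                    # [4, 16, 39], as in the example
```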

What the Shape of the Distribution Tells You

Once you’ve built your frequency distribution, its shape reveals important characteristics of your data. A symmetric distribution, where the left and right sides mirror each other, has a skewness value near zero. Most values cluster around the center, and the tails taper evenly.

A positive skew means the right tail is longer. The bulk of your values sit to the left (lower end), with a few unusually high values stretching the distribution to the right. Income data is a classic example. A negative skew is the opposite: most values are high, with a long tail reaching toward lower values. Think of an easy exam where most students scored well but a few scored very low.

A bimodal distribution has two distinct peaks, which often signals that your data contains two separate groups. If you plotted the heights of a mixed adult population without separating by sex, you’d likely see two humps rather than one.
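If you want a number rather than an eyeball judgment, pandas reports sample skewness directly; the income-like values below are made up to show a right-skewed case:

```python
import pandas as pd

# Hypothetical income-like data: most values cluster low,
# while two large values stretch the right tail
values = pd.Series([30, 32, 35, 36, 38, 40, 42, 45, 120, 250])

skew = values.skew()    # positive for a right (positive) skew, negative for left
print(skew)
```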

How to Visualize It

The three most common charts for frequency distributions each serve a different purpose.

A histogram uses vertical bars to represent the frequency of each bin. The bars touch each other, signaling that the data is continuous, with each bar’s height showing how many observations fall in that range. This is the default visualization for grouped frequency distributions.

A frequency polygon connects points plotted at the midpoint of each bin. It’s useful when you want to compare two distributions on the same graph, since overlapping lines are easier to read than overlapping bars.

An ogive (pronounced “oh-jive”) plots cumulative frequencies. The line always rises from left to right, and you can read off the percentage of data below any given value. For categorical data, bar charts with gaps between the bars or pie charts are more appropriate, since there’s no continuous scale connecting the categories.
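An ogive is just the cumulative frequencies plotted against each bin's upper boundary; a minimal matplotlib sketch with hypothetical edges and counts:

```python
import matplotlib
matplotlib.use("Agg")              # headless backend: no display needed
import matplotlib.pyplot as plt

edges = [52, 58, 64, 70, 76, 82, 88, 94, 100]   # bin edges (hypothetical)
freqs = [3, 5, 8, 12, 9, 7, 4, 2]               # counts per bin (hypothetical)

cum, running = [], 0
for f in freqs:
    running += f
    cum.append(running)

# The ogive always rises: cumulative count at each bin's upper edge
plt.plot(edges[1:], cum, marker="o")
plt.xlabel("Score")
plt.ylabel("Cumulative frequency")
```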

Finding Frequency Distributions in Excel

Excel has a built-in FREQUENCY function designed for this. It takes two inputs: your data range and a set of bin boundaries you define.

First, decide on your bins and list the upper boundary of each bin in a column. If your bins are 0–10, 11–20, 21–30, enter 10, 20, 30 in separate cells. Then, in an adjacent column, type =FREQUENCY(data_array, bins_array), where data_array is the range containing your raw numbers and bins_array is the range with your bin boundaries. In Microsoft 365, press Enter and the results spill automatically into multiple cells. In older versions, you need to select the output range first (one cell more than the number of bins), type the formula, and press Ctrl+Shift+Enter to enter it as an array formula.

The function returns one more value than the number of bins you specified. That extra value counts anything above your highest bin boundary, so you always know if data points fell outside your expected range.

For ungrouped counts, a simpler approach is the COUNTIF function. Use =COUNTIF(range, value) for each unique value, and you’ve got a frequency table without dealing with array formulas.

Finding Frequency Distributions in Python

The pandas library makes this quick. For ungrouped data, call .value_counts() on any column (a pandas Series), and it returns every unique value paired with its count, sorted from most to least frequent.

To get relative frequencies instead of raw counts, add normalize=True. The output will show proportions that sum to 1.
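A minimal sketch of both calls on a toy favorite-color survey (the data is hypothetical):

```python
import pandas as pd

# Hypothetical survey responses
colors = pd.Series(["blue", "red", "blue", "green", "blue", "red"])

counts = colors.value_counts()                  # raw counts, most frequent first
props = colors.value_counts(normalize=True)     # relative frequencies, summing to 1

print(counts)
print(props)
```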

For grouped distributions, pass the bins parameter to .value_counts(bins=8), and pandas will automatically divide the range into 8 equal-width intervals and count observations in each. Under the hood this uses pd.cut(), which you can also call directly for more control over bin edges. If you want bins based on percentiles rather than equal widths, pd.qcut() divides data so each bin contains roughly the same number of observations.
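A short sketch of both approaches, using hypothetical test scores:

```python
import pandas as pd

scores = pd.Series([52, 58, 61, 64, 67, 70, 73, 77, 81, 85, 88, 92, 95, 98])

# Equal-width bins chosen automatically; sort=False keeps interval order
grouped = scores.value_counts(bins=4, sort=False)

# Explicit edges via pd.cut for full control; include_lowest keeps the minimum value
binned = pd.cut(scores, bins=[52, 64, 76, 88, 100], include_lowest=True)
table = binned.value_counts(sort=False)

print(table)
```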

For visualization, df['column'].plot.hist(bins=10) produces a histogram in one line using matplotlib. You can adjust the number of bins, add axis labels, and pass density=True to scale the bars so the total area equals 1; to see the distribution's shape as a smooth curve instead, plot a kernel density estimate with .plot.kde().
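Putting that one-liner into a runnable script (the scores are hypothetical, and the Agg backend just lets it run without a display):

```python
import matplotlib
matplotlib.use("Agg")              # headless backend: no display needed
import pandas as pd

scores = pd.Series([52, 58, 61, 64, 67, 70, 73, 77, 81, 85, 88, 92, 95, 98])

ax = scores.plot.hist(bins=8, edgecolor="black")   # one bar per bin
ax.set_xlabel("Score")
ax.set_ylabel("Frequency")
```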