How to Find Expected Frequency: Formula & Steps

Expected frequency is calculated by multiplying the total number of trials (or observations) by the probability of each outcome. The core formula is: Expected Frequency = n × p, where n is your total sample size and p is the probability of the outcome you’re interested in. This single idea underpins several common applications in statistics, from testing whether a die is fair to analyzing survey data in a contingency table.

The Basic Formula

The foundation is straightforward. If you know how likely something is to happen (its probability) and how many times the experiment or observation takes place (your sample size), you multiply the two together to get the expected frequency.

Expected Frequency (E) = n × p

Say you roll a die 500 times and want to know how often you’d expect to see a 3. A fair die gives each face a probability of 1/6, so the expected frequency for rolling a 3 is 500 × (1/6), which is about 83.33. The same calculation applies to each face of the die, and since all six probabilities are equal, every expected frequency comes out to roughly 83.33. An expected frequency doesn’t have to be a whole number: it’s a long-run average, not a count you could actually observe. Your actual (observed) results will almost certainly differ from it, and measuring that difference is the whole point of comparing expected and observed frequencies in a statistical test.

The probability you plug in can come from two places: either from a theoretical model (like the 1/6 chance for each side of a fair die) or from prior data you already have about a population. If historical records show that 30% of customers choose a particular product, your expected frequency for that product in a sample of 200 customers would be 200 × 0.30 = 60.
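As a minimal sketch, the two calculations above can be reproduced in a few lines of Python. The function name expected_frequency is just an illustrative choice, and exact fractions are used only to avoid floating-point rounding.

```python
from fractions import Fraction


def expected_frequency(n, p):
    """Return the expected frequency n * p for a single outcome."""
    return n * p


# Theoretical model: a fair die rolled 500 times, each face has p = 1/6.
die_expected = expected_frequency(500, Fraction(1, 6))
print(float(die_expected))  # about 83.33

# Prior data: 30% of customers choose the product, sample of 200.
product_expected = expected_frequency(200, Fraction(30, 100))
print(product_expected)  # 60
```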

Expected Frequency in a Contingency Table

When you’re working with a two-way table (also called a contingency table), the formula looks slightly different because you’re calculating expected frequencies for each cell rather than for a single category. The rule is:

Expected Frequency = (Row Total × Column Total) / Grand Total

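This formula assumes the two variables in your table are independent of each other. It essentially asks: if there were no relationship between the row variable and the column variable, how many observations would you expect in this cell? It is still n × p in disguise: under independence, the probability of landing in a given cell is (Row Total / Grand Total) × (Column Total / Grand Total), and multiplying that probability by the Grand Total gives the formula above.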

Here’s a concrete example. Suppose researchers are studying whether body weight is associated with kidney stones. They collect data on 6,055 patients and organize it into a table with two rows (obese and non-obese) and two columns (kidney stones: yes or no). The row total for obese patients is 2,119, the column total for “yes” kidney stones is 469, and the grand total is 6,055. The expected frequency for the “obese + kidney stones” cell would be (2,119 × 469) / 6,055, which equals about 164.1. You’d repeat this for every cell in the table, then compare each expected value to the observed count.
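Here is a small sketch of that calculation for all four cells. The helper name expected_table is hypothetical, and the non-obese and “no” totals are not quoted in the example; they are obtained here by subtracting the stated totals from the grand total.

```python
def expected_table(row_totals, col_totals):
    """Expected cell counts under independence, from marginal totals."""
    grand_total = sum(row_totals)
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must share one grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


grand_total = 6055
row_totals = [2119, grand_total - 2119]  # obese, non-obese (derived)
col_totals = [469, grand_total - 469]    # kidney stones: yes, no (derived)

expected = expected_table(row_totals, col_totals)
print(round(expected[0][0], 1))  # obese + kidney stones: 164.1
```

Each row of the result sums back to its row total, and the four cells together sum to 6,055.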

Working Through a Larger Table

The same logic scales to bigger tables. If you have a 3×4 contingency table, you calculate the expected frequency for all 12 cells using the same row-total-times-column-total-divided-by-grand-total approach. Each cell gets its own expected value, computed separately from that cell’s row total, its column total, and the grand total.
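To illustrate, the sketch below starts from a table of observed counts, derives the marginal totals, and fills in all 12 expected values. The counts are invented purely for illustration and the function name expected_from_observed is an assumption, not part of any library.

```python
def expected_from_observed(observed):
    """Expected counts for every cell of a two-way table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Hypothetical 3x4 table of observed counts (made-up numbers).
observed = [
    [10, 20, 30, 40],
    [20, 20, 20, 40],
    [30, 20, 10, 40],
]

expected_3x4 = expected_from_observed(observed)
for row in expected_3x4:
    print(row)
```

In this made-up table every row total is 100, so all three rows end up with the same expected values even though the observed rows differ.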

Expected Frequency in a Goodness-of-Fit Test

A goodness-of-fit test checks whether your observed data are consistent with a specific hypothesized distribution. You start with a hypothesis about what the probabilities should be, then calculate expected frequencies using E = n × p for each category.

For instance, imagine a company claims its candy bags contain 20% red, 30% blue, 25% green, and 25% yellow candies. You buy a bag with 200 candies and want to test that claim. Your expected frequencies would be:

  • Red: 200 × 0.20 = 40
  • Blue: 200 × 0.30 = 60
  • Green: 200 × 0.25 = 50
  • Yellow: 200 × 0.25 = 50

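You then count the actual candies of each color and compare those observed counts to the expected ones. The chi-square statistic measures how far the observed values are from the expected ones: for each category, square the difference between observed and expected, divide by the expected frequency, and add up the results, giving the sum of (O - E)² / E. A large value suggests the company’s claimed distribution doesn’t match reality.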
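The sketch below runs the candy example end to end, using the usual chi-square sum of (observed - expected)² / expected. The claimed proportions and the bag size come from the example; the observed counts are hypothetical, invented only so there is something to compare against.

```python
claimed = {"red": 0.20, "blue": 0.30, "green": 0.25, "yellow": 0.25}
n = 200  # candies in the bag

# Expected frequency for each color: E = n * p
expected_counts = {color: n * p for color, p in claimed.items()}

# Hypothetical observed counts (made up for illustration; they sum to 200).
observed_counts = {"red": 50, "blue": 55, "green": 45, "yellow": 50}

# Chi-square statistic: sum over categories of (O - E)^2 / E
chi_square = sum(
    (observed_counts[c] - expected_counts[c]) ** 2 / expected_counts[c]
    for c in claimed
)
print(round(chi_square, 3))
```

Turning the statistic into a decision requires a chi-square distribution with (number of categories - 1) degrees of freedom, which is 3 here; that step is left out of this sketch.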

Expected Frequency in Genetics

In population genetics, expected frequencies work the same way but use a specific model called Hardy-Weinberg equilibrium. If a gene has two versions (alleles), A and a, with frequencies p and q in the population (so p + q = 1), the expected genotype frequencies are p² for AA, 2pq for Aa, and q² for aa. These three values always add up to 1, because p² + 2pq + q² = (p + q)² = 1, just like the probabilities in any complete distribution.

So if allele A has a frequency of 0.6 (p = 0.6) and allele a has a frequency of 0.4 (q = 0.4), you’d expect 36% of the population to carry the AA genotype (0.6² = 0.36), 48% to carry Aa (2 × 0.6 × 0.4 = 0.48), and 16% to carry aa (0.4² = 0.16). To turn these proportions into expected counts, multiply each by the number of individuals sampled, exactly as in E = n × p. Researchers compare the expected values to observed genotype counts to check whether the population departs from equilibrium, which can happen when a force such as natural selection or non-random mating is at work.
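A short sketch of the Hardy-Weinberg calculation follows. The allele frequency 0.6 comes from the example above; the sample of 500 individuals is a hypothetical figure added only to show how proportions become expected counts.

```python
def hardy_weinberg(p):
    """Expected genotype frequencies for allele frequencies p and q = 1 - p."""
    q = 1 - p
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


genotype_freqs = hardy_weinberg(0.6)
for genotype, freq in genotype_freqs.items():
    print(genotype, round(freq, 2))

# Hypothetical sample size, used only to convert proportions to counts.
n_individuals = 500
expected_genotype_counts = {
    genotype: n_individuals * freq for genotype, freq in genotype_freqs.items()
}
```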

Step-by-Step Process

Regardless of the context, finding expected frequency follows a consistent set of steps:

  • Identify your total sample size (n). This is the total number of observations, trials, or individuals in your dataset.
  • Determine the probability for each category (p). This comes from your hypothesis, a theoretical model, or your table’s marginal totals.
  • Multiply n by p. For a simple one-way test, that’s all there is to it. For a contingency table, use (row total × column total) / grand total instead.
  • Repeat for every category or cell. You need an expected frequency for each one. When you’re done, the sum of all expected frequencies should equal your grand total.

That last point is a useful check on your math. If your expected frequencies don’t add up to the same total as your observed frequencies, something went wrong in the calculation.
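That check is easy to automate. The sketch below uses a hypothetical helper, totals_match, with a small tolerance because expected frequencies are often non-integer floating-point values; the second call shows what happens when the probabilities mistakenly sum to 0.95 instead of 1.

```python
import math


def totals_match(expected, observed_total, tol=1e-9):
    """Return True when the expected frequencies sum to the observed total."""
    return math.isclose(sum(expected), observed_total, rel_tol=0.0, abs_tol=tol)


# Candy example: probabilities sum to 1, so the totals agree.
candy_expected = [200 * p for p in (0.20, 0.30, 0.25, 0.25)]
print(totals_match(candy_expected, 200))  # True

# A slip in one probability (0.20 typed instead of 0.25) is caught.
faulty_expected = [200 * p for p in (0.20, 0.30, 0.25, 0.20)]
print(totals_match(faulty_expected, 200))  # False
```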

Minimum Expected Frequency Rule

One important practical detail: if any of your expected frequencies are too low, the chi-square test becomes unreliable. The traditional guideline is that every cell should have an expected frequency of at least 5. Some statisticians allow a small number of cells to dip below that; one widely cited relaxation permits up to 20% of cells to have expected values under 5, provided none is below 1. Once expected values of 1 or 2 become common in your table, though, the chi-square approximation breaks down. In those cases, you’d use an alternative like Fisher’s exact test, which doesn’t rely on the same large-sample assumption.

This rule applies only to expected frequencies, not observed ones. You could observe a count of zero in a cell and still run a valid chi-square test, as long as the expected count for that cell is large enough. The distinction matters because people sometimes confuse the two when checking whether their test is appropriate.
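As a rough sketch, you can screen expected frequencies against the threshold of 5 before running a test. The helper name low_expected_cells is an assumption, and the sample of 20 candies is a hypothetical small bag, reusing the claimed color percentages from the goodness-of-fit example.

```python
def low_expected_cells(expected, threshold=5):
    """Return the categories whose expected frequency is below the threshold."""
    return {name: e for name, e in expected.items() if e < threshold}


claimed_percent = {"red": 20, "blue": 30, "green": 25, "yellow": 25}
small_n = 20  # hypothetical small sample

small_expected = {
    color: small_n * pct / 100 for color, pct in claimed_percent.items()
}
flagged = low_expected_cells(small_expected)
print(flagged)  # {'red': 4.0}
```

Note that the function looks only at expected values; observed counts play no part in the check.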