How to Find Expected Frequency in Chi-Square Tests

Expected frequency in a chi-square test is calculated differently depending on which type of chi-square test you’re running. For a test of independence using a contingency table, you multiply the row total by the column total for each cell, then divide by the grand total. For a goodness-of-fit test, you multiply the total number of observations by the expected proportion for each category. Both methods give you the frequencies you’d expect to see if there were no real effect, which you then compare against your observed (actual) data.

The Formula for Contingency Tables

When you have a two-way table comparing two categorical variables, the expected frequency for any cell follows this formula:

Expected frequency = (Row total × Column total) / Grand total

You apply this to every cell in the table. For example, say you have a 2×2 table with a grand total of 1,092 participants. Row 1 has 85 people, and Column 1 has 218 people. The expected frequency for the cell where Row 1 and Column 1 intersect is (85 × 218) / 1,092 = 16.97. You’d repeat this calculation for every remaining cell using the corresponding row and column totals each time.
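The same arithmetic can be sketched in Python to show how one rule fills in every cell of the table. This is a minimal illustration, not a library routine. The function name `expected_from_totals` is hypothetical. The second row and column totals (1,007 and 874) are not given above; they are simply the grand total minus the Row 1 and Column 1 totals.

```python
def expected_from_totals(row_totals, col_totals):
    """Return the expected-frequency table: row total x column total / grand total."""
    grand_total = sum(row_totals)
    # Row and column totals must describe the same grand total.
    if grand_total != sum(col_totals):
        raise ValueError("row and column totals must sum to the same grand total")
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# 2x2 example: Row 1 = 85, Column 1 = 218, grand total = 1,092.
rows = [85, 1092 - 85]    # [85, 1007]
cols = [218, 1092 - 218]  # [218, 874]
expected = expected_from_totals(rows, cols)
print(round(expected[0][0], 2))  # 16.97
```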

The logic behind this formula is straightforward: if the two variables are truly independent of each other (no relationship between them), the probability of an observation landing in a given cell is its row proportion (row total / grand total) multiplied by its column proportion (column total / grand total). Multiply that probability by the grand total and you get the expected frequency for the cell. The expected frequency is what that “no relationship” scenario looks like in numbers.
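In symbols, with R_i the total for row i, C_j the total for column j, and N the grand total (notation chosen for this sketch), independence means the cell probability is estimated as the product of the row and column proportions, and one factor of N cancels:

```latex
\hat{P}(\text{row } i,\ \text{column } j) = \frac{R_i}{N} \cdot \frac{C_j}{N}
\qquad\Longrightarrow\qquad
E_{ij} = N \cdot \frac{R_i}{N} \cdot \frac{C_j}{N} = \frac{R_i \, C_j}{N}
```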

The Formula for Goodness-of-Fit Tests

A goodness-of-fit test works differently because you’re not comparing two variables against each other. Instead, you’re checking whether a single variable’s distribution matches some expected pattern, like equal proportions or a known probability.

Here, the expected frequency for each category is simply:

Expected frequency = Total observations × Expected proportion

If you flip a coin 87 times and want to test whether it’s fair, you expect heads 50% of the time. So the expected frequency for heads is 0.5 × 87 = 43.5, and the same for tails. If you were testing whether a die is fair across six sides, each expected frequency would be (1/6) × total rolls.

The expected proportions don’t have to be equal. If a genetic model predicts a 3:1 ratio of two traits, you’d use 0.75 and 0.25 as your proportions and multiply each by the total number of organisms observed.
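A short Python sketch of the goodness-of-fit calculation follows. The function name `expected_goodness_of_fit` is hypothetical, and the sample of 400 organisms for the 3:1 ratio is an illustrative number, not data from a real study.

```python
def expected_goodness_of_fit(total, proportions):
    """Expected frequency for each category: total observations x expected proportion."""
    # The expected proportions across all categories must sum to 1.
    if abs(sum(proportions) - 1) > 1e-9:
        raise ValueError("expected proportions must sum to 1")
    return [total * p for p in proportions]


# Fair coin flipped 87 times: 50% heads, 50% tails.
print(expected_goodness_of_fit(87, [0.5, 0.5]))  # [43.5, 43.5]

# 3:1 genetic ratio; 400 organisms is a hypothetical sample size.
print(expected_goodness_of_fit(400, [0.75, 0.25]))  # [300.0, 100.0]
```

Unequal proportions need no special handling: the same multiplication applies to each category.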

A Worked Example

Suppose a researcher surveys 200 people about whether they prefer coffee or tea, split by age group (under 40 and 40+). The observed data looks like this:

Under 40, Coffee: 70 observed
Under 40, Tea: 50 observed
40+, Coffee: 40 observed
40+, Tea: 40 observed

Row totals: Under 40 = 120, 40+ = 80. Column totals: Coffee = 110, Tea = 90. Grand total = 200.

Now calculate each expected frequency:

Under 40, Coffee: (120 × 110) / 200 = 66
Under 40, Tea: (120 × 90) / 200 = 54
40+, Coffee: (80 × 110) / 200 = 44
40+, Tea: (80 × 90) / 200 = 36

Notice the expected frequencies add up to the same row and column totals as the observed data. That’s a good way to check your math. Once you have both observed and expected values, you plug them into the chi-square formula, which sums up (observed – expected)² / expected across all cells.
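The whole worked example can be checked in a few lines of plain Python. This is a sketch, not a reference implementation: `expected_frequencies` and `chi_square_statistic` are hypothetical helper names, and the table layout (rows are age groups, columns are drinks) is chosen for illustration. SciPy’s `scipy.stats.chi2_contingency` also returns expected frequencies, but it applies a continuity correction to 2×2 tables by default, so its statistic will not match the uncorrected value here.

```python
def expected_frequencies(observed):
    """Expected count for each cell: row total x column total / grand total."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over every cell."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: under 40, 40+.  Columns: coffee, tea.
observed = [[70, 50],
            [40, 40]]
expected = expected_frequencies(observed)
print(expected)  # [[66.0, 54.0], [44.0, 36.0]]

# Sanity check: expected margins match the observed margins.
assert [sum(r) for r in expected] == [sum(r) for r in observed]
assert [sum(c) for c in zip(*expected)] == [sum(c) for c in zip(*observed)]

print(round(chi_square_statistic(observed, expected), 3))  # 1.347
```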

What the Expected Frequencies Tell You

Expected frequencies represent what your data would look like if there were no meaningful pattern, no association between variables, or no deviation from the hypothesized distribution. The chi-square statistic measures how far your observed data strays from that baseline. Small differences between observed and expected values produce a small chi-square statistic, suggesting your data is consistent with the “no effect” scenario. Large differences produce a large statistic and a small p-value, which is evidence that your observed data doesn’t fit the expected pattern.

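A p-value below your significance threshold (commonly 0.05) means that differences between observed and expected counts at least this large would be unlikely if the expected distribution (the null hypothesis) were true, so you reject that hypothesis. A p-value above the threshold means there’s no strong evidence of a difference, not that the data have been shown to match the expected pattern.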
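Both examples in this article have one degree of freedom: a 2×2 table has (2 − 1) × (2 − 1) = 1, and a two-category goodness-of-fit test has 2 − 1 = 1. For that case only, the p-value can be computed with the standard library, because a chi-square variable with 1 df is the square of a standard normal variable. The sketch below assumes 1 df; the function name is hypothetical, and for other degrees of freedom a library function such as SciPy’s `scipy.stats.chi2.sf` is the usual choice.

```python
import math


def chi_square_p_value_df1(statistic):
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    With 1 df, the statistic is Z squared for a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2))


ALPHA = 0.05  # significance threshold

# Coffee/tea statistic from the worked example: 16/66 + 16/54 + 16/44 + 16/36.
statistic = 16 / 66 + 16 / 54 + 16 / 44 + 16 / 36
p = chi_square_p_value_df1(statistic)
print(round(p, 3))  # 0.246
print("reject" if p < ALPHA else "no strong evidence of a difference")
```

With p ≈ 0.246, the coffee/tea data show no strong evidence of an association between age group and drink preference at the 0.05 level.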

Minimum Expected Frequency Rules

Chi-square tests have an important assumption about how large your expected frequencies need to be. The standard guideline is that at least 80% of cells should have an expected frequency of 5 or more, and no cell should have an expected frequency below 1. When expected values are too small, the chi-square approximation becomes unreliable and can produce misleading results.
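The rule of thumb translates directly into a check you can run on an expected-frequency table. This is a sketch with a hypothetical function name. The two small tables below are made-up expected counts for illustration, not results from real data.

```python
def meets_expected_count_guideline(expected, min_count=5, min_share=0.8, floor=1):
    """Rule of thumb: at least 80% of cells >= 5 and no cell below 1."""
    cells = [e for row in expected for e in row]
    share_large_enough = sum(e >= min_count for e in cells) / len(cells)
    return share_large_enough >= min_share and min(cells) >= floor


# Expected counts from the coffee/tea example: every cell is well above 5.
print(meets_expected_count_guideline([[66, 54], [44, 36]]))  # True

# Hypothetical 2x2 table with one cell at 4: only 3 of 4 cells (75%) reach 5.
print(meets_expected_count_guideline([[4, 8], [6, 12]]))  # False

# Hypothetical 2x2 table with a cell below 1.
print(meets_expected_count_guideline([[0.8, 4.2], [3.2, 16.8]]))  # False
```

In a 2×2 table, 80% of four cells rounds up to all four, so a single expected count below 5 is enough to fail the guideline.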

If your data violates these thresholds, you have a few options. For 2×2 tables, especially those with a total sample size under about 40, Yates’s continuity correction can be applied. This adjustment subtracts 0.5 from the absolute difference between observed and expected values in each cell before squaring, producing a slightly smaller, more conservative chi-square statistic. It applies only to tests with one degree of freedom, such as a 2×2 table, and makes little difference with larger sample sizes.
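Here is a sketch of the corrected statistic applied to the coffee/tea table. The function name is hypothetical. One assumption goes beyond the description above: the adjusted difference is not allowed to drop below zero, which is a common convention when an observed count is within 0.5 of its expected value.

```python
def yates_chi_square(observed, expected):
    """Chi-square statistic with Yates's continuity correction (1 df tables only).

    Each |observed - expected| is reduced by 0.5 before squaring; the
    reduced difference is clamped at zero rather than going negative.
    """
    total = 0.0
    for obs_row, exp_row in zip(observed, expected):
        for o, e in zip(obs_row, exp_row):
            diff = max(abs(o - e) - 0.5, 0.0)
            total += diff ** 2 / e
    return total


observed = [[70, 50], [40, 40]]
expected = [[66, 54], [44, 36]]
print(round(yates_chi_square(observed, expected), 3))  # 1.031
```

Each |O − E| of 4 becomes 3.5, so the statistic falls from about 1.347 without the correction to about 1.031 with it.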

When more than 20% of your cells have expected frequencies below 5, the better option is usually Fisher’s exact test, which is most commonly applied to 2×2 tables. This test calculates exact probabilities rather than relying on the large-sample approximation that the chi-square test uses, so it remains valid even with very small samples. Many statistical packages warn you when expected frequencies are this low, and some report Fisher’s exact test alongside the chi-square result for 2×2 tables.
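To show what “exact probabilities” means, here is a sketch of the two-sided Fisher’s exact test for a 2×2 table. With the row and column totals held fixed, the top-left count follows a hypergeometric distribution, and the two-sided p-value sums the probabilities of every possible table that is no more likely than the observed one. The function name is hypothetical, and the tables below are small made-up examples. Exact fractions are used so that the “no more likely” comparison needs no floating-point tolerance.

```python
from fractions import Fraction
from math import comb


def fisher_exact_two_sided(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x,
        # given the fixed row and column totals.
        return Fraction(comb(row1, x) * comb(row2, col1 - x), comb(n, col1))

    # Every feasible value of the top-left cell under the fixed margins.
    lo, hi = max(0, col1 - row2), min(row1, col1)
    probs = [prob(x) for x in range(lo, hi + 1)]
    p_observed = prob(a)
    return float(sum(p for p in probs if p <= p_observed))


# Hypothetical small table: every expected count is 2, far below 5.
print(round(fisher_exact_two_sided([[3, 1], [1, 3]]), 4))  # 0.4857
```

For this table the possible top-left counts 0 through 4 have probabilities 1/70, 16/70, 36/70, 16/70 and 1/70, so the two-sided p-value is (1 + 16 + 16 + 1) / 70 ≈ 0.4857.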

Common Mistakes to Avoid

The most frequent error is using observed frequencies where expected frequencies belong. The row and column totals you use in the formula come from the actual data, but the cell values you’re calculating are theoretical. Don’t confuse the two when plugging numbers into the chi-square formula.

Another common mistake is rounding expected frequencies to whole numbers. Expected values of 16.97 or 43.5 are perfectly valid. Keep the decimals through your calculations for accuracy, even though you’d never observe half a person in real life. These are theoretical values, not counts.

Finally, make sure you’re using the right method for your test type. If you’re working with a contingency table, use row total × column total / grand total. If you’re testing against a known distribution, use total × proportion. Mixing up the two approaches will give you the wrong expected values and an incorrect chi-square result.