How to Calculate Expected Counts for Chi-Square Tests

Expected counts in a chi-square test are calculated by multiplying the row total by the column total and dividing by the grand total for each cell. The formula is: Expected count = (Row total × Column total) / Grand total. This applies to the chi-square test of independence, which is the most common version. A slightly different approach applies to goodness-of-fit tests, where you multiply the total sample size by each category’s theoretical probability.

The Formula for a Contingency Table

When you have a contingency table (rows and columns of observed counts), the expected count for any single cell uses three numbers: the total for that cell’s row, the total for that cell’s column, and the grand total of all observations. You plug them in like this:

Expected count = (Row total × Column total) / Grand total

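The logic behind this formula is straightforward. Expected counts represent what the data would look like if there were no relationship between the two variables. Under that assumption, the chance of an observation landing in a given cell is the row’s share of the sample (Row total / Grand total) multiplied by the column’s share (Column total / Grand total). Multiplying that probability by the grand total gives (Row total × Column total) / Grand total. The result distributes the totals proportionally across every cell, so each cell gets exactly its “fair share” based on how common its row category and column category are in the overall sample.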

Step-by-Step Example With Real Numbers

Suppose you have survey data on whether people voted in the last election, broken down by age group. Your observed counts look like this:

  • Youngest (18–24), Did not vote: 35
  • Youngest (18–24), Voted: 50
  • Older ages (25+), Did not vote: 183
  • Older ages (25+), Voted: 824

The row totals are 85 (youngest) and 1,007 (older). The column totals are 218 (did not vote) and 874 (voted). The grand total is 1,092.

Now calculate the expected count for each cell. For the youngest group who did not vote: (85 × 218) / 1,092 = 16.97. For the youngest group who voted: (85 × 874) / 1,092 = 68.03. For the older group who did not vote: (1,007 × 218) / 1,092 = 201.03. For the older group who voted: (1,007 × 874) / 1,092 = 805.97.

Notice that the expected counts don’t need to be whole numbers. They rarely are. Also notice that the expected counts in each row still add up to the row total, and each column still adds up to the column total. That’s a useful way to check your math.
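The calculation above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library; the function name expected_counts and the nested-list layout (rows are age groups, columns are "did not vote" and "voted") are assumptions made for the example, not part of any particular statistics package.

```python
def expected_counts(observed):
    """Return expected counts for a table of observed counts (list of rows)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Each cell: (row total x column total) / grand total
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: youngest (18-24), older (25+). Columns: did not vote, voted.
observed = [[35, 50], [183, 824]]
expected = expected_counts(observed)

for row in expected:
    print([round(value, 2) for value in row])

# Sanity check: expected counts in each row add back up to the row total.
print([round(sum(row), 2) for row in expected])
```

The last line repeats the row-total check described above: the two sums should match the observed row totals of 85 and 1,007.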

Expected Counts for Goodness-of-Fit Tests

The goodness-of-fit test works differently because you’re not comparing two variables against each other. Instead, you’re checking whether a single variable matches a specific distribution you’d expect in theory. The formula simplifies to:

Expected count = n × p

Here, n is your total sample size and p is the probability you’d expect for that category. For example, if you roll a die 500 times and want to test whether it’s fair, each side has an expected probability of 1/6. So the expected count for each side is 500 × (1/6) = 83.33. You’d compare your observed counts for each side against 83.33 to see if the die is biased.

The probabilities don’t have to be equal. If you’re testing whether birth months in a sample match the national distribution, each month would have a different expected probability based on real birth rate data.
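As a rough sketch, the n × p calculation looks like this in Python. The helper name gof_expected is an assumption for illustration, and the unequal probabilities in the second call are hypothetical values chosen only to show that categories need not share the same probability.

```python
def gof_expected(n, probabilities):
    """Expected count for each category: sample size times its probability."""
    if abs(sum(probabilities) - 1) > 1e-9:
        raise ValueError("probabilities should sum to 1")
    return [n * p for p in probabilities]


# Fair six-sided die rolled 500 times: every side has probability 1/6.
fair_die = gof_expected(500, [1 / 6] * 6)
print([round(value, 2) for value in fair_die])

# Hypothetical unequal distribution (illustrative values, not real data).
loaded_die = gof_expected(500, [0.1, 0.1, 0.1, 0.1, 0.1, 0.5])
print([round(value, 2) for value in loaded_die])
```

The check that the probabilities sum to 1 is a design choice for the sketch: it catches a common data-entry mistake before the expected counts are used in a test.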

How Expected Counts Feed Into the Test Statistic

Once you have expected counts, you use them alongside your observed counts to calculate the chi-square statistic. For each cell, you subtract the expected count from the observed count, square the result, and divide by the expected count. Then you add up all those values across every cell.

Chi-square statistic = Σ [ (Observed − Expected)² / Expected ], summed over all cells

Cells where observed and expected counts are close contribute very little to the statistic. Cells where they differ sharply push the statistic higher, making it more likely you’ll find a statistically significant result. This is why getting the expected counts right matters: they’re the baseline your entire test is measured against.
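To make the arithmetic concrete, here is a small Python sketch that applies the formula to the voting example from earlier. The function name chi_square_statistic and the flattened cell order are assumptions for illustration; the expected counts are recomputed from the row, column, and grand totals rather than taken from rounded values.

```python
def chi_square_statistic(observed, expected):
    """Sum of (observed - expected)^2 / expected over all cells."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Voting example, cells in the order:
# youngest/did not vote, youngest/voted, older/did not vote, older/voted
observed = [35, 50, 183, 824]
grand_total = 1092
expected = [
    85 * 218 / grand_total,
    85 * 874 / grand_total,
    1007 * 218 / grand_total,
    1007 * 874 / grand_total,
]

statistic = chi_square_statistic(observed, expected)
print(round(statistic, 2))

# Per-cell contributions show which cells drive the statistic.
for o, e in zip(observed, expected):
    print(o, round(e, 2), round((o - e) ** 2 / e, 2))
```

In this example the largest contribution comes from the youngest group who did not vote, the cell where the observed count (35) is furthest from its expected count (about 17) relative to the size of that expected count.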

The Minimum Expected Count Rule

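Chi-square tests have an important assumption about expected counts. At least 80% of your cells should have expected counts of 5 or more, and no cell should have an expected count below 1. A quick first check: your total sample size should be at least the number of cells multiplied by 5. Passing that check does not guarantee the rule is met, because an uneven row or column can still leave individual cells short, so always inspect the expected counts themselves.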

When too many cells fall below 5, the chi-square approximation becomes unreliable and can produce misleading results. You might get a significant result when the true relationship isn’t there (a false positive), or miss a real relationship entirely (a false negative). For 2×2 tables specifically, the test is inappropriate if the total sample is less than 20, or if the total is between 20 and 40 and the smallest expected count is under 5.
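The 80% and minimum-of-1 rule is easy to check programmatically. The sketch below is one possible way to do it; the function name check_expected_counts and the returned dictionary keys are assumptions for illustration. It does not cover the separate 2×2 sample-size guideline. The second table is a hypothetical small sample included only to show a failing case.

```python
def check_expected_counts(expected):
    """Check the rule: at least 80% of cells >= 5 and no cell below 1."""
    cells = [value for row in expected for value in row]
    share_at_least_5 = sum(value >= 5 for value in cells) / len(cells)
    smallest = min(cells)
    return {
        "share_at_least_5": share_at_least_5,
        "smallest": smallest,
        "meets_rule": share_at_least_5 >= 0.8 and smallest >= 1,
    }


# Expected counts from the voting example (rounded to two decimals).
voting_expected = [[16.97, 68.03], [201.03, 805.97]]
print(check_expected_counts(voting_expected))

# Hypothetical small table: only half the cells reach 5.
small_expected = [[1.75, 3.25], [5.25, 9.75]]
print(check_expected_counts(small_expected))
```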

If your expected counts are too small, you have a few options. For 2×2 tables, Fisher’s Exact Test is the standard alternative. For larger tables, a likelihood ratio chi-square test is sometimes recommended, although it is also a large-sample approximation and should be interpreted cautiously when counts are very small. You can also try combining categories that are conceptually similar to increase cell sizes.

Yates’ Correction for Small Samples

For 2×2 tables with modest sample sizes, some analysts apply Yates’ continuity correction, which slightly adjusts the chi-square calculation to be more conservative. A common practice is to use it when the total sample is under 100 or any cell has a count below 10. The correction subtracts 0.5 from the absolute difference between observed and expected counts before squaring, which reduces the chi-square value slightly and makes it harder to reach significance. Not everyone agrees it’s necessary, and many statisticians prefer switching to Fisher’s Exact Test instead.
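The effect of the correction is easiest to see side by side. The following Python sketch compares the uncorrected and corrected statistics for a hypothetical 2×2 table of 40 observations (the counts are invented for illustration). The function names are assumptions, and the sketch implements the correction exactly as described above, without the extra safeguards some statistical packages add for cells where the difference is already smaller than 0.5.

```python
def chi_square(observed, expected):
    """Uncorrected chi-square statistic."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def yates_chi_square(observed, expected):
    """Chi-square with Yates' correction: subtract 0.5 from |O - E| first."""
    return sum((abs(o - e) - 0.5) ** 2 / e for o, e in zip(observed, expected))


# Hypothetical 2x2 table, flattened row by row: [[12, 8], [6, 14]]
observed = [12, 8, 6, 14]
# Row totals 20 and 20, column totals 18 and 22, grand total 40.
expected = [20 * 18 / 40, 20 * 22 / 40, 20 * 18 / 40, 20 * 22 / 40]

uncorrected = chi_square(observed, expected)
corrected = yates_chi_square(observed, expected)
print(round(uncorrected, 3), round(corrected, 3))
```

The corrected value is smaller than the uncorrected one, which is the sense in which the correction is more conservative.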

Calculating Expected Counts in Excel

You can calculate expected counts in any spreadsheet without special software. Start by entering your observed counts in a table with row and column totals. Then create a second, identical table layout for expected values. In each cell of the expected table, enter a formula that multiplies the corresponding row total by the corresponding column total, then divides by the grand total. Use absolute cell references (the dollar sign notation) for the grand total so it doesn’t shift when you copy the formula across cells.

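For example, if your row total is in cell E2, your column total is in cell B5, and your grand total is in cell E5, the formula for that cell would be: =$E2*B$5/$E$5. The mixed references keep the row total locked to column E and the column total locked to row 5, so you can copy the formula across and down to fill every cell of the expected table without editing it. If you type the simpler form =E2*B5/$E$5 instead, you will need to adjust the row and column references by hand in each cell.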

Excel also has a built-in CHISQ.TEST function that takes your observed range and expected range and returns a p-value directly. But you still need to calculate the expected counts yourself first and enter them in a separate table before using it.

Degrees of Freedom

After you calculate your chi-square statistic, you need degrees of freedom to look up the p-value. For a test of independence, degrees of freedom equal (number of rows − 1) × (number of columns − 1). A 2×2 table has 1 degree of freedom. A 3×4 table has 6. For a goodness-of-fit test, degrees of freedom equal the number of categories minus 1. A six-sided die test has 5 degrees of freedom.
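The degrees-of-freedom rules translate directly into code. The sketch below uses hypothetical helper names. It also includes a p-value shortcut that is valid only for 1 degree of freedom, where the chi-square distribution is the square of a standard normal variable; for other degrees of freedom you would typically use a chi-square table or a statistics library instead.

```python
import math


def df_independence(rows, cols):
    """Degrees of freedom for a test of independence."""
    return (rows - 1) * (cols - 1)


def df_goodness_of_fit(categories):
    """Degrees of freedom for a goodness-of-fit test."""
    return categories - 1


def p_value_df1(chi_square):
    """Upper-tail p-value for a chi-square statistic with exactly 1 df."""
    return math.erfc(math.sqrt(chi_square / 2))


print(df_independence(2, 2))   # 2x2 table
print(df_independence(3, 4))   # 3x4 table
print(df_goodness_of_fit(6))   # six-sided die

# 3.841 is the familiar 5% critical value for 1 degree of freedom.
print(round(p_value_df1(3.841), 3))
```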