You reject the null hypothesis in a chi-square test when your calculated chi-square statistic is larger than the critical value for your degrees of freedom and chosen significance level. Equivalently, you reject it when the p-value from your test is smaller than your alpha level, which is conventionally set at 0.05. Both methods give you the same answer; they’re just two ways of expressing the same decision rule.
The Two Ways to Make the Decision
There are two routes to the same conclusion, and most software gives you both.
Comparing the test statistic to a critical value. You calculate a chi-square statistic from your data, then look up the critical value in a chi-square distribution table using your degrees of freedom and alpha level. If your statistic exceeds that threshold, you reject the null hypothesis. At alpha = 0.05, the critical values for the most common degrees of freedom are:
- 1 degree of freedom: 3.84
- 2 degrees of freedom: 5.99
- 3 degrees of freedom: 7.81
- 4 degrees of freedom: 9.49
If your chi-square statistic is, say, 6.2 with 1 degree of freedom, that’s above 3.84, so you reject the null hypothesis.
Comparing the p-value to alpha. The p-value tells you the probability of seeing results as extreme as yours (or more extreme) if the null hypothesis were actually true. When that probability drops below your alpha level (typically 0.05), you reject the null. A p-value of 0.03 means that if the null hypothesis were true, data this extreme or more extreme would turn up only 3% of the time, which falls below the 5% threshold.
What the Null Hypothesis Actually Claims
In a chi-square test of independence, the null hypothesis states that two categorical variables are not associated. For example, if you’re testing whether smoking status is related to lung disease, the null says the two variables are independent of each other. Rejecting it means you have enough evidence to conclude an association exists.
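A test of independence can be run end to end with scipy's `chi2_contingency`. The 2×2 counts below are invented for illustration (rows: smoker / non-smoker; columns: disease / no disease):

```python
from scipy.stats import chi2_contingency

# Hypothetical contingency table of observed counts
observed = [[40, 60],   # smokers: disease, no disease
            [20, 80]]   # non-smokers: disease, no disease

# Note: scipy applies Yates' continuity correction to 2x2 tables by default
stat, p_value, df, expected = chi2_contingency(observed)
print(df, round(stat, 2), round(p_value, 4))
```

The function also returns the expected counts under independence, which is useful for checking the assumptions discussed below.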
In a chi-square goodness-of-fit test, the null hypothesis states that your observed data follow a specific expected distribution. Rejecting it means the data deviate from that expected pattern more than random chance would explain.
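A goodness-of-fit test follows the same pattern with scipy's `chisquare`, which defaults to a uniform expected distribution. The die-roll counts here are hypothetical:

```python
from scipy.stats import chisquare

# Hypothetical counts for each face over 60 die rolls;
# a fair die expects 10 per face
observed = [8, 9, 12, 11, 8, 12]

stat, p_value = chisquare(observed)  # expected defaults to uniform
print(round(stat, 2), round(p_value, 3))
```

Here the deviations are small relative to chance, so the p-value stays above 0.05 and you fail to reject the hypothesis that the die is fair.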
How to Calculate Degrees of Freedom
Your degrees of freedom determine which critical value you use, so getting this right matters. For a contingency table (test of independence), degrees of freedom equal the number of rows minus one, multiplied by the number of columns minus one. A 2×2 table has (2-1) × (2-1) = 1 degree of freedom. A 3×4 table has (3-1) × (4-1) = 6.
For a goodness-of-fit test, degrees of freedom equal the number of categories minus one, minus any additional parameters you estimated from the data.
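Both rules are one-liners; a sketch with the examples from the text (the function names are ours, not a standard API):

```python
def df_independence(rows: int, cols: int) -> int:
    """Degrees of freedom for a contingency-table test of independence."""
    return (rows - 1) * (cols - 1)

def df_goodness_of_fit(categories: int, estimated_params: int = 0) -> int:
    """Degrees of freedom for a goodness-of-fit test."""
    return categories - 1 - estimated_params

print(df_independence(2, 2))   # 2x2 table -> 1
print(df_independence(3, 4))   # 3x4 table -> 6
print(df_goodness_of_fit(6))   # 6 categories, no estimated parameters -> 5
```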
Assumptions That Must Hold
A chi-square test result is only trustworthy if the underlying assumptions are met. The most important one involves expected cell counts: at least 80% of cells in your table should have an expected frequency of 5 or more, and no cell should have an expected frequency below 1. “Expected frequency” here doesn’t mean the numbers you actually observed. It means the counts you’d predict in each cell if the null hypothesis were true.
When your sample is too small to meet these thresholds, the chi-square approximation breaks down and can give misleading results. In that situation, Fisher’s exact test is the standard alternative for 2×2 tables, because it calculates exact probabilities without relying on the approximation.
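Fisher's exact test is a drop-in call in scipy; the small counts below are invented precisely to show a table where expected frequencies would be too low for chi-square:

```python
from scipy.stats import fisher_exact

# Hypothetical 2x2 table with counts too small for the
# chi-square approximation to be trustworthy
table = [[3, 1],
         [1, 4]]

odds_ratio, p_value = fisher_exact(table)  # two-sided by default
print(round(p_value, 3))
```

Because the probabilities are exact, no minimum-expected-count condition needs to hold.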
What “Reject” and “Fail to Reject” Actually Mean
Rejecting the null hypothesis doesn’t prove your alternative hypothesis is true. It means the data are incompatible enough with the null that you’re willing to act as though the null is false. There’s always a chance you’re wrong.
A Type I error happens when you reject a null hypothesis that’s actually true: you conclude there’s an association when there isn’t one. Your alpha level (0.05) is the maximum probability you’re accepting for this kind of mistake. Setting alpha at 0.05 means you’re allowing up to a 5% chance of a false positive.
A Type II error goes the other direction. You fail to reject the null hypothesis even though it’s actually false, missing a real association. The probability of this error is called beta, and it’s influenced by your sample size and the strength of the actual effect. Larger samples reduce the risk of Type II errors because they give the test more power to detect real differences.
Note the careful language: you “fail to reject” the null rather than “accept” it. Failing to reject simply means you didn’t find enough evidence. It doesn’t confirm the null is true.
Comparing More Than Two Groups
When your contingency table has more than two rows or columns, a significant chi-square result tells you that the variables are associated somewhere in the table, but it doesn’t tell you which specific groups differ. To find that out, you break the table into smaller 2×2 comparisons and test each one separately.
This creates a multiple comparisons problem: every additional test increases your chance of a false positive. The standard fix is a Bonferroni correction, where you divide your alpha level by the number of comparisons. If you’re making three pairwise comparisons, your corrected alpha becomes 0.05 / 3 = 0.017. You only reject the null for a specific pair if that comparison’s p-value falls below 0.017.
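The whole procedure can be sketched with scipy's `chi2_contingency`; the three groups and their success/failure counts below are hypothetical:

```python
from scipy.stats import chi2_contingency

# Hypothetical counts: three groups, columns are success / failure
table = {
    "A": [30, 70],
    "B": [45, 55],
    "C": [60, 40],
}

pairs = [("A", "B"), ("A", "C"), ("B", "C")]
alpha = 0.05
corrected_alpha = alpha / len(pairs)  # Bonferroni: 0.05 / 3 ~ 0.017

for g1, g2 in pairs:
    stat, p, df, expected = chi2_contingency([table[g1], table[g2]])
    decision = "reject" if p < corrected_alpha else "fail to reject"
    print(g1, "vs", g2, round(p, 4), decision)
```

With these counts, some pairs that would pass at 0.05 fail the corrected 0.017 threshold, which is exactly the false-positive protection the correction is buying.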
Statistical Significance vs. Practical Importance
A statistically significant chi-square result tells you an association exists, but it says nothing about how strong that association is. With a large enough sample, even a trivially small difference between groups can produce a significant p-value. That’s why reporting an effect size alongside your chi-square result matters.
The most common effect size measure for chi-square tests is Cramér’s V, which ranges from 0 to 1. The general interpretation:
- 0.2 or below: Weak association. The variables are related, but barely.
- Between 0.2 and 0.6: Moderate association.
- Above 0.6: Strong association.
A chi-square test might return a p-value of 0.001, which sounds impressive, but if Cramér’s V is 0.08, the relationship is too weak to matter in practice. Always check both numbers before drawing conclusions about how meaningful your finding is.
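Cramér's V is the square root of the chi-square statistic divided by the sample size times (the smaller table dimension minus one). A sketch, assuming scipy, on a hypothetical large-sample table built to be significant but weak:

```python
import math
from scipy.stats import chi2_contingency

def cramers_v(table):
    """Cramer's V: sqrt(chi2 / (n * (min_dim - 1))), using the
    uncorrected chi-square statistic as is conventional."""
    chi2_stat = chi2_contingency(table, correction=False)[0]
    n = sum(sum(row) for row in table)
    min_dim = min(len(table), len(table[0])) - 1
    return math.sqrt(chi2_stat / (n * min_dim))

# Hypothetical 2x2 table with n = 2000: a small difference between
# groups that a large sample makes statistically significant
table = [[520, 480],
         [465, 535]]

stat, p, df_, expected = chi2_contingency(table, correction=False)
print(round(p, 4), round(cramers_v(table), 3))
```

The p-value clears the 0.05 bar while V stays well under 0.1, which is the significant-but-trivial pattern the paragraph above warns about.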