How to Know When to Reject a Chi-Square Test

You reject the null hypothesis in a chi-square test when your calculated test statistic exceeds the critical value for your chosen significance level, or equivalently, when your p-value falls below your alpha threshold (typically 0.05). But “when to reject” has a second, equally important meaning: when the chi-square test itself is the wrong tool for your data. Both situations come up constantly in practice, and confusing them leads to unreliable results.

Rejecting the Null Hypothesis

The chi-square test compares what you observed in your data to what you’d expect if nothing interesting were happening (the null hypothesis). In a goodness-of-fit test, the null says your data follows a specific distribution. In a test of independence, the null says two categorical variables have no relationship. In both cases, the decision rule is the same: if the gap between observed and expected values is large enough, you reject the null.

“Large enough” is defined by two numbers working together: your significance level (alpha) and your degrees of freedom. Most researchers set alpha at 0.05, meaning they’ll accept a 5% chance of a false alarm. Degrees of freedom depend on the structure of your data. For a goodness-of-fit test, it’s the number of categories minus one. For a test of independence using a contingency table, it’s (number of rows minus one) multiplied by (number of columns minus one).

Once you have both numbers, you look up the critical value in a chi-square distribution table or let software calculate it. If your test statistic is greater than that critical value, you reject the null. If your software gives you a p-value directly, you reject whenever p is less than alpha. For example, the critical value for 1 degree of freedom at alpha = 0.05 is 3.84, so for a simple 2×2 table, any test statistic above 3.84 means you reject the null.
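The decision rule can be sketched in a few lines with scipy. The test statistic here is a made-up placeholder, not computed from real data:

```python
# Minimal sketch of the rejection decision: look up the critical value
# for a given alpha and degrees of freedom, then compare.
from scipy.stats import chi2

alpha = 0.05
df = (2 - 1) * (2 - 1)  # 2x2 table: (rows - 1) * (cols - 1) = 1

# Critical value: the point the statistic must exceed to reject the null.
critical_value = chi2.ppf(1 - alpha, df)  # about 3.841 for df=1, alpha=0.05

test_statistic = 5.2  # hypothetical value from your own data
reject = test_statistic > critical_value
print(round(float(critical_value), 3), bool(reject))
```

Equivalently, `1 - chi2.cdf(test_statistic, df)` gives the p-value, and you reject whenever it falls below alpha.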

When the Chi-Square Test Itself Is Wrong

A statistically significant result is meaningless if the test shouldn’t have been used in the first place. The chi-square test rests on several assumptions, and violating them doesn’t just weaken your results. It can make them flat-out wrong. Here are the situations where you should set the chi-square aside entirely.

Expected Cell Counts Are Too Low

The chi-square test uses an approximation that only works well with adequate sample sizes. The standard rule: no more than 20% of your cells should have expected frequencies below 5, and no cell should have an expected frequency below 1. When your data violates this, the approximation breaks down and your p-value becomes unreliable.
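Checking this rule before running the test is straightforward, because expected counts come from a standard formula: (row total × column total) / grand total. A minimal sketch with made-up counts:

```python
# Sketch: compute expected cell counts and check the
# "no more than 20% below 5, none below 1" rule (hypothetical data).
import numpy as np

observed = np.array([[12, 5],
                     [ 3, 2]])

row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
n = observed.sum()

# Expected count for each cell: (row total * column total) / grand total
expected = row_totals * col_totals / n

pct_below_5 = (expected < 5).mean()
any_below_1 = (expected < 1).any()
chi_square_ok = (pct_below_5 <= 0.20) and not any_below_1
```

For this table, half the expected counts fall below 5, so `chi_square_ok` is false and the chi-square approximation should not be trusted.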

The fix depends on your table size. For a 2×2 table with small expected counts, Fisher’s exact test is the standard alternative. It calculates the exact probability rather than relying on an approximation, so it works at any sample size. For larger tables with sparse cells, you can sometimes combine categories to boost expected counts, though this only makes sense if the combined categories are meaningfully related.
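For the 2×2 case, scipy provides Fisher's exact test directly. A sketch with hypothetical counts small enough to violate the expected-count rule:

```python
# Sketch: Fisher's exact test as the small-sample alternative
# for a 2x2 table (hypothetical counts).
from scipy.stats import fisher_exact

table = [[8, 2],
         [1, 5]]

# Returns the sample odds ratio (a*d)/(b*c) and an exact p-value.
odds_ratio, p_value = fisher_exact(table, alternative="two-sided")

reject = p_value < 0.05
```

Because the p-value is exact rather than approximate, the result is valid no matter how small the counts get.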

Observations Aren’t Independent

This is the most commonly violated assumption and the hardest to spot. Each observation in your dataset must be independent of every other observation, and each subject can contribute to only one cell in your table. If the same person appears in multiple cells, or if subjects are naturally linked (siblings, spouses, repeated measurements on the same individual), the chi-square test produces misleading results.

Lack of independence changes the underlying sampling distribution, which means the critical values you’re comparing against are simply wrong for your data. Behavioral research runs into this problem frequently when, for example, the same animal is observed multiple times and each observation is counted as a separate data point. The resulting test statistic can be dramatically inflated or deflated depending on how the dependence plays out.

Your Data Is Paired or Repeated

A specific and common form of non-independence is paired data. If you measured the same group of people before and after a treatment, or matched patients in pairs, the chi-square test for independence doesn’t apply. The two samples aren’t independent of each other because they’re literally the same people (or matched people) measured twice.

McNemar’s test is designed exactly for this situation. It handles paired samples with a dichotomous outcome, making it the right choice for before-and-after designs or any study where each observation in one group has a direct partner in the other. Using a standard chi-square test on paired data ignores the pairing structure and can lead you to miss real effects or find spurious ones.
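McNemar's test depends only on the discordant pairs: those that changed in one direction versus the other. The exact version is just a binomial test on those pairs, since under the null each discordant pair is equally likely to go either way. A sketch with hypothetical before/after counts:

```python
# Sketch: exact McNemar test for paired dichotomous data, built from
# the binomial distribution (hypothetical discordant-pair counts).
from scipy.stats import binomtest

b = 12  # pairs that changed in one direction (e.g., improved after treatment)
c = 3   # pairs that changed in the other direction

# Under the null, b ~ Binomial(b + c, 0.5): each discordant pair is a coin flip.
p_value = binomtest(b, n=b + c, p=0.5).pvalue

reject = p_value < 0.05
```

Note that the concordant pairs (those that didn't change) never enter the calculation; only disagreement between the two measurements carries information about the treatment effect.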

Your Data Isn’t Categorical

The chi-square test is built for categorical (nominal or ordinal) data, meaning counts of things falling into distinct groups. If your variables are continuous measurements like weight, blood pressure, or income, chi-square is not the right tool. You’d need a different approach entirely, such as a t-test, correlation, or regression, depending on what you’re trying to learn.

The 2×2 Table: Yates’s Correction

When you have a simple 2×2 contingency table, some textbooks recommend applying Yates’s continuity correction. This adjustment subtracts 0.5 from each absolute difference between observed and expected values before squaring, which slightly shrinks the test statistic. The logic is that you’re using a continuous distribution (the chi-square curve) to approximate a discrete reality (whole-number counts), and the correction accounts for this mismatch.

In practice, though, there’s broad consensus that Yates’s correction is too conservative. It over-corrects, making it harder to reach significance and increasing the chance you’ll miss a real effect. Many statisticians now recommend using the standard Pearson chi-square for 2×2 tables when expected counts are adequate, and switching to Fisher’s exact test when they aren’t, skipping Yates’s correction altogether.
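You can see the conservatism directly by running the same table both ways. scipy's `chi2_contingency` applies Yates's correction to 2×2 tables by default, controlled by the `correction` flag (illustrative counts):

```python
# Sketch: Pearson chi-square with and without Yates's correction
# on the same 2x2 table (hypothetical counts).
from scipy.stats import chi2_contingency

table = [[20, 30],
         [35, 15]]

stat_plain, p_plain, _, _ = chi2_contingency(table, correction=False)
stat_yates, p_yates, _, _ = chi2_contingency(table, correction=True)

# The corrected statistic is always smaller, so its p-value is always larger,
# making it harder to reach significance.
```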

After You Reject: Measuring Effect Size

Rejecting the null hypothesis tells you that a relationship exists, but it says nothing about how strong that relationship is. A massive sample can produce a statistically significant result for a trivially small association. This is why reporting an effect size alongside your chi-square result matters.

Cramér’s V is the most common effect size measure for chi-square tests. It ranges from 0 to 1, and one widely used interpretation scale (the exact cutoffs vary slightly between sources) looks like this:

  • Below 0.10: negligible association
  • 0.10 to 0.20: weak association
  • 0.20 to 0.40: moderate association
  • 0.40 to 0.60: relatively strong association
  • 0.60 to 0.80: strong association
  • 0.80 to 1.00: very strong association

If you reject the null but Cramér’s V is 0.08, the relationship you found is statistically real but practically meaningless. Conversely, a V of 0.45 with a rejected null means you’ve found something worth paying attention to. Always pair your rejection decision with an effect size to give your result context.
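Cramér's V is cheap to compute from the chi-square statistic: V = sqrt(chi2 / (n × (min(rows, cols) − 1))). A sketch with hypothetical counts:

```python
# Sketch: Cramér's V computed from a chi-square statistic
# (hypothetical 2x2 table).
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[30, 10],
                     [10, 30]])

chi2_stat, p_value, dof, expected = chi2_contingency(observed, correction=False)

n = observed.sum()
min_dim = min(observed.shape) - 1  # min(rows, cols) - 1

# V = sqrt(chi2 / (n * (min(rows, cols) - 1)))
cramers_v = np.sqrt(chi2_stat / (n * min_dim))
```

For this table the statistic is 20 on 80 observations, giving V = 0.5, which the scale above would call a relatively strong association.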

Quick Decision Checklist

Before running a chi-square test, walk through these checks:

  • Are your variables categorical? If not, use a test designed for continuous data.
  • Are observations independent? If the same subject appears more than once, or subjects are naturally linked, chi-square results will be unreliable.
  • Are your samples paired or matched? Use McNemar’s test instead.
  • Do more than 20% of cells have expected counts below 5? Switch to Fisher’s exact test.
  • Does any cell have an expected count below 1? Fisher’s exact test is necessary.

If your data passes all five checks, run the chi-square test and compare your statistic to the critical value. If it exceeds the critical value (or your p-value falls below alpha), reject the null hypothesis. Then calculate Cramér’s V to see whether the relationship you found is large enough to care about.
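The expected-count portion of the checklist is mechanical enough to automate. A sketch of a helper (the function name and return labels are hypothetical, and it assumes the analyst has already verified independence and checked for pairing, which no table can reveal on its own):

```python
# Sketch: suggest a test based on the expected-count checks above.
# Independence and pairing must still be verified by the analyst.
import numpy as np

def suggest_test(observed):
    observed = np.asarray(observed, dtype=float)
    expected = (observed.sum(axis=1, keepdims=True)
                * observed.sum(axis=0, keepdims=True)) / observed.sum()
    too_sparse = (expected < 1).any() or (expected < 5).mean() > 0.20
    if too_sparse:
        if observed.shape == (2, 2):
            return "fisher_exact"
        return "combine categories or use exact methods"
    return "chi_square"

print(suggest_test([[12, 5], [3, 2]]))     # sparse expected counts
print(suggest_test([[50, 40], [30, 60]]))  # adequate expected counts
```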