When to Reject the Null Hypothesis: Chi-Square P-Value

You reject the null hypothesis in a chi-square test when your p-value is less than your chosen significance level, typically 0.05. If the p-value falls below that threshold, your data would be unlikely to look the way it does if no real relationship existed, so the pattern you observed is probably not due to chance alone.

The Core Decision Rule

Before running a chi-square test, you set a significance level (called alpha). The most common choice is 0.05, meaning you accept a 5% chance of rejecting a null hypothesis that is actually true. Some fields use a stricter threshold of 0.01, which caps that risk at 1%.

Once the test produces a p-value, the comparison is straightforward:

  • P-value < alpha (e.g., < 0.05): Reject the null hypothesis. The result is statistically significant.
  • P-value ≥ alpha (e.g., ≥ 0.05): Fail to reject the null hypothesis. You don’t have enough evidence to claim a relationship exists.

Notice the phrasing “fail to reject” rather than “accept.” A non-significant result doesn’t prove the null hypothesis is true. It simply means your data wasn’t strong enough to rule it out.
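The decision rule above can be sketched as a small Python function (the p-values passed in are purely illustrative):

```python
def chi_square_decision(p_value, alpha=0.05):
    """Return the conclusion of a chi-square test given its p-value."""
    if p_value < alpha:
        return "reject the null hypothesis"       # statistically significant
    return "fail to reject the null hypothesis"   # not enough evidence

print(chi_square_decision(0.03))  # below 0.05 -> reject
print(chi_square_decision(0.23))  # at or above 0.05 -> fail to reject
```

Note that a p-value exactly equal to alpha also fails to reject, matching the ≥ rule in the list above.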

What the Null Hypothesis Actually Claims

In a chi-square test, the null hypothesis states that two categorical variables are independent, meaning there is no relationship between them. The alternative hypothesis states that a relationship does exist. For example, if you’re testing whether vaccination status is related to pneumonia rates, the null hypothesis says the two variables are unrelated and any differences in your data are just random noise.

This applies to both major types of chi-square tests. In a test of independence, you’re asking whether two variables in a contingency table are related. In a goodness-of-fit test, you’re asking whether observed frequencies match an expected distribution. Either way, the null hypothesis represents “no meaningful pattern,” and the p-value tells you how likely your data would be if that were true.
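As a sketch of the goodness-of-fit case, scipy's `chisquare` compares observed category counts against expected ones (the counts here are invented for illustration):

```python
from scipy.stats import chisquare

# Hypothetical data: did 100 customers choose among four products uniformly?
observed = [20, 30, 25, 25]
expected = [25, 25, 25, 25]  # equal counts under the null hypothesis

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(stat, p)  # chi-square = 2.0; p is well above 0.05, so fail to reject
```

With a p-value this large, the observed counts are consistent with a uniform distribution, so the null of "no meaningful pattern" stands.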

How the P-Value Is Calculated

The chi-square test works by comparing what you observed in your data to what you’d expect if the null hypothesis were true. The formula produces a single number, the chi-square statistic, which captures how far your observed counts deviate from the expected counts. Larger deviations produce a larger statistic.

That statistic is then evaluated against the chi-square distribution, which accounts for the degrees of freedom in your data. Degrees of freedom depend on the size of your table. A simple 2×2 table (like vaccinated vs. unvaccinated crossed with sick vs. healthy) has 1 degree of freedom, calculated as (rows minus 1) times (columns minus 1). A 3×3 table has 4 degrees of freedom, and so on.

This matters because the same chi-square statistic can yield very different p-values depending on degrees of freedom. A chi-square value of 12.35 with 1 degree of freedom gives a p-value below 0.001, which is highly significant. That same value of 12.35 with 12 degrees of freedom gives a p-value of about 0.42, nowhere near significance. The degrees of freedom shift the threshold for what counts as an unusual result.
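The numbers above can be checked directly with scipy's chi-square survival function:

```python
from scipy.stats import chi2

stat = 12.35
p_1df = chi2.sf(stat, df=1)    # ~0.0004: highly significant
p_12df = chi2.sf(stat, df=12)  # ~0.42: not significant

# Degrees of freedom for a contingency table: (rows - 1) * (columns - 1)
def table_dof(rows, cols):
    return (rows - 1) * (cols - 1)

print(p_1df, p_12df, table_dof(2, 2), table_dof(3, 3))
```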

Using Critical Values Instead of P-Values

There’s an equivalent way to make the same decision without computing a p-value directly. You can compare your calculated chi-square statistic against a critical value from a chi-square distribution table. If your test statistic exceeds the critical value for your chosen significance level and degrees of freedom, you reject the null hypothesis.

For example, at alpha = 0.05 with 1 degree of freedom, the critical value is 3.841. Any chi-square statistic above 3.841 leads to rejection. This approach and the p-value approach always give the same answer. Software typically reports p-values, which is why most people use that method, but critical value tables are useful when you’re working problems by hand or want to quickly check a result.
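A quick sketch showing that the two approaches agree (the test statistic of 5.2 is an arbitrary illustrative value):

```python
from scipy.stats import chi2

alpha, dof = 0.05, 1
critical = chi2.ppf(1 - alpha, dof)  # inverse CDF: ~3.841

stat = 5.2  # an illustrative chi-square statistic
reject_by_critical = stat > critical
reject_by_p = chi2.sf(stat, dof) < alpha

print(critical, reject_by_critical, reject_by_p)  # both decisions match
```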

A Worked Example

Suppose researchers want to know if a pneumonia vaccine actually reduces infection rates. They collect data on two groups (vaccinated and unvaccinated) and record who developed pneumococcal pneumonia. This creates a 2×2 table with 1 degree of freedom.

After running the chi-square test, they get a p-value below 0.05. Following the decision rule, they reject the null hypothesis and conclude that there is a statistically significant difference in pneumonia rates between the vaccinated and unvaccinated groups. The vaccine appears to have an effect.

If the p-value had come back at, say, 0.23, they would fail to reject the null. That wouldn’t prove the vaccine is useless. It would mean their study didn’t produce strong enough evidence to confirm a difference, possibly because the sample was too small or the effect was too subtle to detect.
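This workflow can be sketched with scipy's `chi2_contingency` (the counts below are invented for illustration, not real vaccine data):

```python
from scipy.stats import chi2_contingency

# Rows: vaccinated, unvaccinated; columns: developed pneumonia, stayed healthy
table = [[10, 90],
         [30, 70]]

stat, p, dof, expected = chi2_contingency(table)
print(f"chi-square = {stat:.2f}, dof = {dof}, p = {p:.4f}")

if p < 0.05:
    print("Reject the null: infection rates differ between groups.")
else:
    print("Fail to reject the null: no significant difference detected.")
```

For a 2×2 table, scipy applies Yates' continuity correction by default, which slightly lowers the statistic; with these made-up counts the result is still well below 0.05.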

When Your Results Might Not Be Valid

A p-value from a chi-square test is only trustworthy when certain conditions are met. The most important rule involves expected cell counts. Each cell in your table has an expected frequency (what you’d predict under the null hypothesis), and the general guideline is that every expected count should be at least 5. When expected counts drop below this threshold, the chi-square approximation becomes unreliable and your p-value may be inaccurate. For small samples, Fisher’s exact test is the standard alternative.
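One way to sketch this check in code, falling back to Fisher's exact test when the expected counts are too small (the table is a hypothetical small sample):

```python
from scipy.stats import chi2_contingency, fisher_exact

# A hypothetical small-sample 2x2 table
table = [[3, 7],
         [2, 8]]

# chi2_contingency also returns the expected counts under the null
_, _, _, expected = chi2_contingency(table)

if expected.min() < 5:
    # Chi-square approximation is unreliable here; use Fisher's exact test
    odds_ratio, p = fisher_exact(table)
    print(f"Fisher's exact p = {p:.3f}")
```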

The chi-square test also assumes that observations are independent, meaning each person or item in your dataset contributes to only one cell in the table. If the same individual appears in multiple categories, the test’s assumptions break down.

Statistical Significance vs. Practical Importance

Rejecting the null hypothesis tells you a relationship likely exists, but it doesn't tell you how strong that relationship is. With a large enough sample, even a trivially small difference can produce a p-value below 0.05. This is why effect size measures matter.

Cramér’s V is the most common effect size for chi-square tests. It ranges from 0 to 1, and the interpretation breaks down like this:

  • 0.2 or below: Weak association. The variables are technically related but the connection is minimal.
  • Between 0.2 and 0.6: Moderate association. There’s a meaningful relationship worth paying attention to.
  • Above 0.6: Strong association. The two variables are closely linked.

A study might reject the null hypothesis with p = 0.03 but show a Cramér’s V of 0.08. That result is statistically significant but practically meaningless. Always look at effect size alongside your p-value to understand whether a rejected null hypothesis reflects a real-world difference that actually matters.
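A sketch of computing Cramér's V alongside the p-value, using an invented large sample with only a tiny imbalance to show the significant-but-weak pattern:

```python
import math
from scipy.stats import chi2_contingency

# 20,000 hypothetical observations with a small imbalance between groups
table = [[5200, 4800],
         [4800, 5200]]

stat, p, dof, _ = chi2_contingency(table, correction=False)

n = sum(sum(row) for row in table)
min_dim = min(len(table), len(table[0])) - 1  # min(rows, cols) - 1
cramers_v = math.sqrt(stat / (n * min_dim))

print(f"p = {p:.5f}, Cramér's V = {cramers_v:.3f}")
# Tiny p-value, but V = 0.04: a weak, practically negligible association
```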

The Risk of Being Wrong

Every time you reject a null hypothesis, there's a chance you're making a mistake. This is called a Type I error: concluding that a relationship exists when it actually doesn't. Your significance level directly controls this risk. Setting alpha at 0.05 means you'll make this error in about 5% of tests where the null hypothesis is actually true. Setting it at 0.01 cuts that risk to 1%, but makes it harder to detect real effects.

If you’re running multiple chi-square tests on the same dataset, the cumulative risk of a Type I error increases with each test. Ten tests at alpha = 0.05 give you roughly a 40% chance of at least one false positive. Corrections like the Bonferroni method address this by dividing your alpha by the number of tests, tightening the threshold for each individual comparison.
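Both figures are easy to verify; this sketch assumes the tests are independent:

```python
# Familywise error rate for m independent tests at significance level alpha
def familywise_error(alpha, m):
    return 1 - (1 - alpha) ** m

# Bonferroni correction: tighten the per-test threshold
def bonferroni_alpha(alpha, m):
    return alpha / m

print(familywise_error(0.05, 10))  # ~0.40: roughly a 40% chance of a false positive
print(bonferroni_alpha(0.05, 10))  # 0.005 per-test threshold
```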