When to Use a Chi-Square Test (and When Not To)

You use a chi-square test when you want to analyze categorical data, meaning variables that fall into distinct groups or categories rather than existing on a numerical scale. If you’re comparing things like yes/no responses, treatment groups, survey ratings, or demographic categories, and you want to know whether the patterns you see are statistically meaningful or just due to chance, the chi-square test is likely the right tool.

The Core Requirement: Categorical Variables

The chi-square test works exclusively with categorical variables. These are variables where data falls into named groups: male or female, smoker or nonsmoker, satisfied or dissatisfied, blood type A, B, AB, or O. Each variable needs at least two categories. The test cannot handle continuous variables like weight, blood pressure, or income measured in dollars. If your data involves measurements on a scale, you need a different test entirely (like a t-test or ANOVA).

This distinction trips people up more than anything else. If you’re comparing average test scores between two groups, that’s not a chi-square situation. But if you’re comparing the proportion of students who passed versus failed across two teaching methods, now you’re in chi-square territory. The key question: are you counting how many observations fall into each category, or are you measuring something?

Two Types of Chi-Square Tests

There are two main versions, and knowing which one you need depends on your research question.

The goodness-of-fit test involves a single categorical variable. You use it when you want to know whether the distribution of your observed data matches some expected distribution. For example, you might test whether the colors of cars in a parking lot follow the national distribution of car colors, or whether a die produces each number equally often. You’re comparing what you observed against what you expected to see.
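As a sketch of how this looks in practice, here is the die example run with scipy’s `chisquare` function. The counts are hypothetical, invented for illustration:

```python
# Hypothetical example: do 120 die rolls deviate from a fair die?
from scipy.stats import chisquare

observed = [25, 17, 15, 23, 24, 16]      # counts for faces 1 through 6
expected = [sum(observed) / 6] * 6       # 20 per face if the die is fair

stat, p = chisquare(f_obs=observed, f_exp=expected)
print(f"chi2 = {stat:.2f}, p = {p:.3f}")
# A large p here means the observed counts are consistent with a fair die.
```

With these numbers the statistic works out to 5.0 on 5 degrees of freedom, well within what chance alone would produce.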

The test of independence involves two categorical variables and asks whether they’re related. This is the more common version in research. You arrange your data in a contingency table (a grid where rows represent one variable and columns represent the other) and test whether the two variables are associated. For instance: is there a relationship between smoking status and lung disease? Is customer satisfaction related to which store location people visited? The test tells you whether the observed pattern differs from what you’d expect if the two variables were completely unrelated.
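A minimal sketch of the test of independence, using a hypothetical 2×2 table for the smoking example (the counts are made up for illustration):

```python
# Hypothetical 2x2 contingency table: smoking status vs. lung disease
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[60, 40],    # smokers:    disease, no disease
                  [30, 70]])   # nonsmokers: disease, no disease

# chi2_contingency returns the statistic, p-value, degrees of freedom,
# and the expected counts under independence
stat, p, dof, expected = chi2_contingency(table)
print(f"chi2 = {stat:.2f}, df = {dof}, p = {p:.4f}")
```

Note that scipy applies Yates’ continuity correction to 2×2 tables by default; pass `correction=False` if you want the uncorrected statistic.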

Three Assumptions That Must Be Met

Before running a chi-square test, your data needs to satisfy three conditions. Violating any of them can produce misleading results.

Independence of observations. Each data point must come from a separate, unrelated individual or case. A single subject can only contribute to one cell in your table. If you’re testing the same people at multiple time points, or your data consists of paired samples like parents matched with their children, the chi-square test is not appropriate. Those designs require tests built for related samples, like McNemar’s test.

Adequate sample size. The chi-square test relies on an approximation that breaks down with small numbers. The standard rule: no more than 20% of your cells should have expected frequencies below 5, and no cell should have an expected frequency below 1. For a simple 2×2 table, the test is inappropriate if your total sample is less than 20, or if the total falls between 20 and 40 and the smallest expected value is less than 5. Notice this refers to expected values (what you’d predict if there were no relationship), not the numbers you actually observed.
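Because the rule is stated in terms of expected counts, you can check it directly from the expected-frequency table that scipy computes for you. A sketch, with a deliberately small hypothetical table:

```python
# Sketch: check the expected-frequency rule before trusting a chi-square result
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[8, 2],
                  [1, 5]])

# We only need the expected counts under independence here
_, _, _, expected = chi2_contingency(table)

frac_small = (expected < 5).mean()                  # fraction of cells with expected < 5
ok = frac_small <= 0.20 and expected.min() >= 1     # the 20% rule and the minimum-of-1 rule
print(expected.round(2))
print("chi-square appropriate:", ok)
```

For this table three of the four expected counts fall below 5, so the check fails and Fisher’s exact test (discussed below) is the better choice.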

Mutually exclusive categories. Each observation fits into one and only one combination of categories. A person can’t be counted in both the “satisfied” and the “dissatisfied” columns, or appear in multiple rows simultaneously.

When to Use Fisher’s Exact Test Instead

When your sample is too small to meet the expected frequency requirements, Fisher’s exact test is the standard alternative for 2×2 tables. It calculates exact probabilities rather than relying on the approximation that the chi-square test uses, so it remains valid even with very small cell counts. If more than 20% of your cells have expected frequencies below 5, switch to Fisher’s exact test. Many statistical software packages will flag this for you automatically or offer both results side by side.
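Continuing the small hypothetical table from above, the switch to Fisher’s exact test is a one-line change with scipy:

```python
# When expected counts are too small for chi-square, use Fisher's exact test
from scipy.stats import fisher_exact

table = [[8, 2],
         [1, 5]]

# Returns the sample odds ratio and an exact two-sided p-value
odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.2f}, p = {p:.4f}")
```

No expected-frequency requirement applies here: the p-value is computed exactly from the hypergeometric distribution, so it remains valid however small the cells are.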

Real-World Examples

Chi-square tests show up constantly in medical, social science, and business research. A clinical trial might compare the proportion of patients who recovered versus didn’t recover across a drug group and a placebo group. A marketing team might test whether product preference differs by age bracket. A public health researcher might examine whether vaccination status is associated with infection rates across several regions.

In each case, the underlying logic is the same: you have counts of people or events sorted into categories, and you want to know if the pattern is real or could have happened by chance. The test produces a p-value that tells you the probability of seeing your results (or something more extreme) if there were truly no relationship between the variables. A p-value below 0.05 is the conventional threshold for calling the result statistically significant.

What Chi-Square Doesn’t Tell You

A significant chi-square result gives you evidence that a relationship exists between your variables, but it doesn’t tell you how strong that relationship is. For that, you need an effect size measure. Cramér’s V is the most common one used alongside chi-square tests. It ranges from 0 to 1, where values between 0.05 and 0.15 indicate a small effect, 0.15 to 0.25 a medium effect, and above 0.25 a large effect (these benchmarks shift slightly depending on your table size).
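Cramér’s V is simple to compute by hand from the chi-square statistic: V = √(χ² / (n·k)), where n is the total sample size and k is the smaller of (rows − 1) and (columns − 1). A sketch using the hypothetical smoking table from earlier:

```python
# Sketch: Cramér's V as an effect size alongside the chi-square statistic
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[60, 40],
                  [30, 70]])
stat, p, dof, _ = chi2_contingency(table, correction=False)

n = table.sum()                     # total sample size
k = min(table.shape) - 1            # min(rows, cols) - 1
cramers_v = np.sqrt(stat / (n * k))
print(f"Cramér's V = {cramers_v:.3f}")
```

For this table V comes out around 0.30, a large effect by the benchmarks above even though the raw chi-square value alone wouldn’t tell you that.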

The test also doesn’t tell you the direction of the relationship or which specific cells are driving the result. In a large table with many categories, a significant chi-square means something is going on somewhere in the table, but you’ll need to examine the individual cells, often through standardized residuals, to figure out where the interesting differences actually are.
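One common form of standardized residual is the Pearson residual, (observed − expected) / √expected; cells with an absolute value above roughly 2 are the ones worth a closer look. A sketch on the same hypothetical table:

```python
# Sketch: Pearson residuals to see which cells drive a significant result
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[60, 40],
                  [30, 70]])
_, _, _, expected = chi2_contingency(table, correction=False)

residuals = (table - expected) / np.sqrt(expected)
print(residuals.round(2))   # |residual| > ~2 flags cells worth examining
```

Positive residuals mark cells with more observations than independence predicts, negative residuals mark cells with fewer, so the signs also give you the direction the raw chi-square value hides.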

Reporting Your Results

When writing up chi-square results in a paper, you’ll typically report the chi-square value, degrees of freedom, sample size, and p-value. Degrees of freedom for a test of independence equal the number of rows minus one, multiplied by the number of columns minus one. For a goodness-of-fit test, it’s the number of categories minus one. Report p-values to two or three decimal places; when p is smaller than .001, simply write p < .001 rather than listing a string of zeros. Including an effect size like Cramér’s V alongside your chi-square result gives readers a much more complete picture than the p-value alone.
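The pieces of that write-up all come straight out of the test output, so the formatting can be automated. A sketch, again using the hypothetical table from the earlier examples:

```python
# Sketch: assembling a report line from chi-square output (hypothetical data)
import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[60, 40],
                  [30, 70]])
stat, p, dof, _ = chi2_contingency(table, correction=False)
n = table.sum()

# Write "p < .001" for very small p-values instead of a string of zeros
p_text = "p < .001" if p < 0.001 else f"p = {p:.3f}"
print(f"chi2({dof}, N = {n}) = {stat:.2f}, {p_text}")
```

For a 2×2 table the degrees of freedom are (2 − 1) × (2 − 1) = 1, which is what `chi2_contingency` reports here.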

Quick Decision Guide

Use a chi-square test when all of these are true:

  • Your variables are categorical, not continuous measurements
  • Your observations are independent, with no repeated measures or paired data
  • Your sample is large enough that expected cell frequencies stay above 5 in at least 80% of cells, with none below 1
  • You want to know whether observed counts differ from expected counts (goodness of fit) or whether two categorical variables are related (independence)

If your data involves continuous outcomes, use a t-test or ANOVA. If your sample is too small, use Fisher’s exact test. If your observations are paired or repeated, use McNemar’s test. The chi-square test fills a specific and very common niche: testing relationships between categorical variables with adequate sample sizes and independent observations.