The chi-square test tells you whether there’s a meaningful relationship between two categorical variables, or whether your data matches an expected pattern. It works by comparing what you actually observed in your data to what you’d expect to see if nothing interesting were going on. If the gap between observed and expected is large enough, the pattern is unlikely to be explained by random chance alone.
There are a few versions of the test, but they all use the same core formula and logic. Here’s how to choose the right one, set it up, calculate it, and interpret the result.
Choosing the Right Chi-Square Test
Before you calculate anything, you need to pick which version fits your question. There are three main types, and each answers a different kind of question.
- Goodness-of-fit test: You have one categorical variable and want to know if the distribution of your data matches some expected pattern. For example, you roll a die 600 times and want to know if each number comes up roughly 100 times, as you’d expect from a fair die. You’re comparing one set of observed counts against a theoretical distribution.
- Test of independence: You have two categorical variables measured in the same group and want to know if they’re related. For example, you survey 500 people and record both their smoking status (yes/no) and whether they developed a certain condition (yes/no). You build a contingency table and test whether the two variables are linked or independent of each other.
- Test of homogeneity: You have one categorical variable measured across two or more separate populations and want to know if the distribution is the same across groups. For example, you ask customers at three different stores to rate their satisfaction as “satisfied” or “unsatisfied” and test whether the satisfaction distribution differs between stores.
The independence and homogeneity tests use the same math and the same contingency table setup. The difference is in the study design: independence draws one sample and measures two variables, while homogeneity draws separate samples and measures one variable in each.
Requirements Your Data Must Meet
Chi-square tests have several assumptions that need to hold for the result to be valid.
Your data must be raw counts, not percentages or proportions. Each observation should fall into one and only one category, so the categories need to be mutually exclusive. The observations must also be independent of each other: one person’s response can’t influence another’s, and the same person can’t appear in more than one cell of your table. If you’re testing the same subjects at multiple time points, chi-square is not the right tool.
Both variables should be categorical, typically measured at the nominal level (like “male/female” or “yes/no”). Ordinal categories work too, though the test ignores their ordering, and you can bin continuous data into categories if needed.
There’s also a minimum sample size guideline: each cell in your table should have an expected frequency of at least 5. If more than 20% of your cells have expected values below 5, the chi-square approximation becomes unreliable. In that situation, you can combine adjacent categories to boost the numbers, or switch to Fisher’s exact test, which handles small samples well.
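This rule of thumb is easy to check mechanically once you have the expected counts. Here’s a minimal sketch in Python (the function name and default thresholds are just illustrations of the guideline above, not from any standard library):

```python
# Flag tables where the chi-square approximation may be unreliable:
# more than 20% of cells with an expected count below 5.
def small_cell_warning(expected, threshold=5, max_fraction=0.20):
    """Return True if too many cells fall below the expected-count threshold."""
    cells = [e for row in expected for e in row]
    small = sum(1 for e in cells if e < threshold)
    return small / len(cells) > max_fraction

# A 2x2 table where one of four cells (25%) has an expected count below 5:
print(small_cell_warning([[3.2, 9.8], [10.1, 12.9]]))  # → True
```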
Setting Up Your Contingency Table
For a test of independence or homogeneity, organize your data into a contingency table where rows represent one variable and columns represent the other. Each cell contains the count of observations that fall into that combination.
Here’s a concrete example from a clinical trial published in the BMJ. Researchers compared a new depression drug against a standard drug. Of 73 patients on the new drug, 56% showed improvement. Of 70 patients on the standard drug, 41% showed improvement. You’d set up a 2×2 table with rows for “new drug” and “standard drug,” columns for “improved” and “did not improve,” and fill in the counts: roughly 41 and 32 in the first row, 29 and 41 in the second.
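As a sketch, here’s how those counts fall out of the percentages in plain Python, rounding to whole patients as the text does:

```python
# Rebuilding the BMJ example's 2x2 table from the group sizes and
# improvement rates quoted above (rounded to whole patients).
new_n, std_n = 73, 70
new_improved = round(new_n * 0.56)   # 56% of 73 -> 41
std_improved = round(std_n * 0.41)   # 41% of 70 -> 29

table = [
    [new_improved, new_n - new_improved],   # new drug: improved, not improved
    [std_improved, std_n - std_improved],   # standard drug
]
print(table)  # → [[41, 32], [29, 41]]
```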
For a goodness-of-fit test, you just need a single row of observed counts and a matching row of expected counts based on your theoretical distribution.
Calculating Expected Frequencies
The expected frequency is what you’d predict for each cell if there were no relationship between the variables.
For a goodness-of-fit test, expected frequencies are straightforward: multiply the total sample size by the expected probability for each category. If you expect a fair die to land on each number equally across 600 rolls, the expected frequency for each number is 600 × (1/6) = 100.
For a contingency table, the expected frequency for each cell is calculated as: (row total × column total) / grand total. This gives you the count you’d expect in that cell if the two variables were completely unrelated. Do this for every cell in the table.
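Here’s that calculation applied to the drug-trial table from earlier, as a plain-Python sketch (statistical libraries will do this for you):

```python
# Expected count for each cell: (row total * column total) / grand total.
# Applied to the 2x2 drug-trial table from the text.
observed = [[41, 32], [29, 41]]

row_totals = [sum(row) for row in observed]         # [73, 70]
col_totals = [sum(col) for col in zip(*observed)]   # [70, 73]
grand_total = sum(row_totals)                       # 143

expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
print([[round(e, 2) for e in row] for row in expected])
# → [[35.73, 37.27], [34.27, 35.73]]
```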
The Chi-Square Formula
Once you have observed (O) and expected (E) values for every cell, the test statistic is:
χ² = Σ (O − E)² / E
In plain terms: for each cell, subtract the expected count from the observed count, square that difference, and divide by the expected count. Then add up all those values across every cell. The result is your chi-square statistic. A larger number means a bigger gap between what you observed and what you’d expect under no relationship.
Walk through it cell by cell. For the first cell of the drug-trial table, you observed 41 patients improving on the new drug and expected about 35.73, so you’d calculate (41 − 35.73)² / 35.73 ≈ 0.78 for that cell. Repeat for the remaining cells and sum them up.
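The full sum for the drug-trial table looks like this in Python, using the unrounded expected counts; with these observed counts the statistic comes out near 3.1:

```python
# Summing (O - E)^2 / E over all four cells of the drug-trial table,
# using the unrounded expected counts (row total * column total / 143).
observed = [41, 32, 29, 41]
expected = [73 * 70 / 143, 73 * 73 / 143, 70 * 70 / 143, 70 * 73 / 143]

chi_sq = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_sq, 2))  # → 3.11
```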
Calculating Degrees of Freedom
Degrees of freedom determine which chi-square distribution you compare your result against. The formula depends on which test you’re running.
For a contingency table (independence or homogeneity): degrees of freedom = (number of rows − 1) × (number of columns − 1). A 2×2 table has (2−1)(2−1) = 1 degree of freedom. A 3×4 table has (3−1)(4−1) = 6.
For a goodness-of-fit test: degrees of freedom = number of categories − 1. A die with 6 outcomes gives you 5 degrees of freedom.
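Both formulas are one-liners (the function names here are just illustrative):

```python
# Degrees of freedom for each chi-square test type.
def dof_contingency(n_rows, n_cols):
    return (n_rows - 1) * (n_cols - 1)

def dof_goodness_of_fit(n_categories):
    return n_categories - 1

print(dof_contingency(3, 4))     # → 6
print(dof_goodness_of_fit(6))    # → 5
```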
Interpreting the Result
Compare your chi-square statistic to a critical value from a chi-square distribution table, using your degrees of freedom and your chosen significance level (usually 0.05). If your statistic is larger than the critical value, you reject the null hypothesis, meaning the relationship or difference you observed is statistically significant.
Most software will give you a p-value directly instead of requiring a table lookup. If the p-value is less than 0.05, the result is significant. A p-value of 0.03, for instance, means there’s only a 3% chance you’d see a gap at least this large between observed and expected counts if there were truly no relationship.
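For the special case of 1 degree of freedom, the p-value even has a closed form using only the Python standard library; for other degrees of freedom you’d reach for software (for example, scipy.stats.chi2.sf):

```python
import math

# Chi-square p-value for 1 degree of freedom. This special case has a
# closed form via the complementary error function, so no table is needed:
#   P(chi-square > x) = erfc(sqrt(x / 2))
def chi2_pvalue_df1(x):
    return math.erfc(math.sqrt(x / 2))

print(round(chi2_pvalue_df1(3.84), 3))  # ~0.05: the usual critical value
print(round(chi2_pvalue_df1(3.11), 3))  # the drug-trial statistic, ~0.08
```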
Keep in mind that statistical significance doesn’t tell you the size of the effect. A very large sample can produce a significant chi-square result even when the actual relationship between variables is weak.
Measuring Effect Size
To understand how strong the association is, calculate an effect size alongside your chi-square result.
For 2×2 tables, use the Phi coefficient. It ranges from 0 (no association) to 1 (perfect association) and is derived directly from the chi-square statistic and your sample size. For larger tables, use Cramér’s V, which also ranges from 0 to 1 and adjusts for the dimensions of the table. Values above 0.5 generally indicate a meaningful, strong association. Values below 0.1 suggest the relationship, even if statistically significant, is trivially small in practical terms.
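Both measures are simple functions of the chi-square statistic: φ = √(χ² / N), and Cramér’s V = √(χ² / (N × (k − 1))), where k is the smaller of the row and column counts. A quick sketch using the drug-trial numbers:

```python
import math

# Phi for a 2x2 table and Cramer's V for larger ones, both derived
# from the chi-square statistic and the sample size.
def phi(chi_sq, n):
    return math.sqrt(chi_sq / n)

def cramers_v(chi_sq, n, n_rows, n_cols):
    return math.sqrt(chi_sq / (n * (min(n_rows, n_cols) - 1)))

# Drug-trial numbers: the association is weak.
print(round(phi(3.11, 143), 2))  # → 0.15
```

Note that for a 2×2 table, k − 1 = 1, so Cramér’s V and Phi give the same value.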
Reporting Your Results
When writing up chi-square results, the standard format includes the test statistic, degrees of freedom, sample size, and p-value. A typical write-up looks like this:
χ²(1, N = 143) = 3.21, p = .07
The number in parentheses is the degrees of freedom, N is your total sample size, and the p-value is reported to exact decimal places rather than just “less than .05.” Include the frequencies for each category or cell so readers can see the raw data behind the number. If you calculated an effect size, report that too.
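If you’re generating write-ups programmatically, the format is easy to template. This sketch uses the worked example’s values; the leading-zero-free p is an APA-style convention, not a requirement:

```python
# Format a chi-square result in the standard report style.
def report(chi_sq, df, n, p):
    p_text = f"{p:.2f}".lstrip("0")   # ".08" rather than "0.08"
    return f"χ²({df}, N = {n}) = {chi_sq:.2f}, p = {p_text}"

print(report(3.11, 1, 143, 0.078))  # → χ²(1, N = 143) = 3.11, p = .08
```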
When to Use an Alternative Test
Chi-square doesn’t work well in every situation. For 2×2 tables with a total sample size under about 40, or any table where expected cell counts fall below 5, Fisher’s exact test is the better choice. It calculates the exact probability rather than relying on the approximation that chi-square uses, and modern software can run it on any sample size.
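For the curious, Fisher’s exact test for a 2×2 table can be written from scratch with nothing but binomial coefficients: sum the probability of every table with the same margins that is no more likely than the one observed. A stdlib sketch of the common two-sided version:

```python
import math

def hypergeom_p(a, b, c, d):
    """Probability of one specific 2x2 table [[a, b], [c, d]] with fixed margins."""
    n = a + b + c + d
    return math.comb(a + b, a) * math.comb(c + d, c) / math.comb(n, a + c)

def fisher_exact_two_sided(a, b, c, d):
    n, row1, col1 = a + b + c + d, a + b, a + c
    p_obs = hypergeom_p(a, b, c, d)
    total = 0.0
    # Enumerate every table consistent with the margins.
    for x in range(max(0, col1 - (n - row1)), min(row1, col1) + 1):
        p = hypergeom_p(x, row1 - x, col1 - x, n - row1 - col1 + x)
        if p <= p_obs * (1 + 1e-9):   # tolerance for floating-point ties
            total += p
    return min(1.0, total)

# Fisher's classic tea-tasting table:
print(round(fisher_exact_two_sided(3, 1, 1, 3), 3))  # → 0.486
```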
You may also see Yates’ continuity correction mentioned for 2×2 tables. This adjusts the formula by subtracting 0.5 from each absolute difference before squaring, making the test slightly more conservative. It was historically used for small samples, but Fisher’s exact test has largely replaced it as the preferred solution. For larger samples, the correction makes almost no difference and can be skipped.
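The correction is a one-line change to the sum, shown here on the drug-trial table’s observed and expected counts:

```python
# Yates' continuity correction: subtract 0.5 from each absolute difference
# before squaring, which shrinks the statistic slightly.
observed = [41, 32, 29, 41]
expected = [73 * 70 / 143, 73 * 73 / 143, 70 * 70 / 143, 70 * 73 / 143]

chi_sq_yates = sum((abs(o - e) - 0.5) ** 2 / e
                   for o, e in zip(observed, expected))
print(round(chi_sq_yates, 2))  # → 2.55, versus about 3.11 uncorrected
```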
If your data involves paired or matched observations (like testing the same people before and after a treatment), chi-square’s independence assumption is violated. McNemar’s test is designed for that situation instead.
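The exact version of McNemar’s test needs only the two discordant counts (the pairs that changed in one direction or the other) and a binomial tail. A pure-stdlib sketch, with an illustrative function name:

```python
import math

# Exact McNemar's test from the two discordant counts:
#   b = pairs that went yes -> no, c = pairs that went no -> yes.
# Under the null, discordant pairs split 50/50, so this is a binomial test.
def mcnemar_exact(b, c):
    n, k = b + c, min(b, c)
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)   # two-sided

# 12 discordant pairs: 10 changed one way, 2 the other.
print(round(mcnemar_exact(10, 2), 3))  # → 0.039
```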

