What Is a Chi-Square Statistic? Types and Formula

The chi-square statistic measures how far a set of observed counts falls from what you’d expect if nothing interesting were going on. It’s one of the most common tools in statistics for analyzing categorical data: things you can sort into groups (like yes/no, male/female, or red/blue/green) rather than measure on a scale. The core idea is simple: compare what actually happened to what should have happened under some assumption, then quantify the gap.

How the Formula Works

The chi-square statistic uses this logic for every category in your data: take the difference between what you observed and what you expected, square it, then divide by the expected value. Add those up across all categories, and you get a single number.

Written out: χ² = Σ (observed – expected)² / expected

Squaring the difference does two things. It makes all values positive (so negative and positive gaps don’t cancel each other out), and it gives extra weight to large deviations. Dividing by the expected frequency keeps things proportional. Being off by 10 counts matters a lot more when you only expected 20 than when you expected 2,000.

A chi-square value of zero would mean your data matched expectations perfectly. The larger the number, the bigger the mismatch between what you observed and what the hypothesis predicted. Whether that mismatch is large enough to be meaningful is where interpretation comes in.
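The formula is simple enough to compute by hand. Here is a minimal Python sketch of the sum; the counts are invented purely for illustration:

```python
# Chi-square by hand: sum of (observed - expected)^2 / expected per category.
# These counts are made up for illustration.
observed = [48, 35, 15, 22]
expected = [40, 40, 20, 20]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(round(chi_square, 3))  # 3.675
```

Notice that the third category (observed 15, expected 20) contributes far more than the fourth (observed 22, expected 20), even though both differ from a small expectation: the squared gap of 25 dwarfs the squared gap of 4.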

Two Main Types of Chi-Square Tests

The same basic formula powers two different tests, each designed for a different question.

Goodness-of-Fit Test

This version asks whether a single variable follows a particular distribution. You have one categorical variable and a theory about how the counts should be spread across its categories. For example, if a die is fair, each face should come up roughly one-sixth of the time. Roll it 600 times and you’d expect about 100 of each outcome. The goodness-of-fit test compares your actual rolls to that expectation. The null hypothesis is straightforward: the data fits the proposed distribution. If the chi-square value is large enough, you reject that claim.

Degrees of freedom for a goodness-of-fit test equal the number of categories minus one. A six-sided die gives you 5 degrees of freedom.
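In SciPy, the goodness-of-fit test is one call to scipy.stats.chisquare. The roll counts below are invented for illustration; with a fair die and 600 rolls, each expected count is 100:

```python
from scipy.stats import chisquare

# Invented counts for 600 rolls of a die (faces 1 through 6).
observed = [95, 108, 102, 87, 110, 98]
expected = [100] * 6  # fair die: 600 rolls / 6 faces

stat, p = chisquare(observed, f_exp=expected)
print(f"chi-square = {stat:.2f}, p = {p:.3f}")
```

With these made-up counts the statistic is 3.66 on 5 degrees of freedom, and the p-value is well above 0.05, so you would not reject the claim that the die is fair.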

Test of Independence

This is the more common version in research. It asks whether two categorical variables are related. You arrange your data in a contingency table, a grid where rows represent one variable and columns represent the other, with counts in each cell. The null hypothesis states that the two variables are independent, meaning knowing someone’s value on one variable tells you nothing about the other.

Say you’re looking at whether smoking status (smoker vs. non-smoker) is related to developing a certain condition (yes vs. no). You’d build a 2×2 table and calculate expected counts for each cell based on the row and column totals. If smokers develop the condition at a notably different rate than non-smokers, the chi-square value will be large.

Degrees of freedom for a test of independence equal (number of rows – 1) × (number of columns – 1). A 2×2 table has 1 degree of freedom. A 3×4 table has 6.
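SciPy's chi2_contingency handles the whole procedure: it builds the expected counts from the row and column totals and reports the degrees of freedom. The smoking counts below are hypothetical:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table: rows = smoker / non-smoker,
# columns = developed the condition (yes / no).
table = np.array([[30, 70],
                  [15, 85]])

stat, p, dof, expected = chi2_contingency(table)
print("dof:", dof)  # (2 - 1) * (2 - 1) = 1
print(expected)     # expected counts built from row and column totals
```

For this table the expected count in the smoker/yes cell is (100 × 45) / 200 = 22.5, exactly the row-total-times-column-total-over-grand-total rule described above.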

Interpreting the Result

Once you calculate a chi-square value, you compare it to a critical value from a chi-square distribution table. That critical value depends on two things: your degrees of freedom and the significance level you’ve chosen (usually 0.05, meaning you’re willing to accept a 5% chance of a false alarm).

If your calculated chi-square is larger than the critical value, you reject the null hypothesis. The data doesn’t fit the expected pattern, and the gap is too large to chalk up to random chance. If your chi-square is smaller than the critical value, you don’t have enough evidence to reject the null hypothesis. The observed differences could plausibly be due to chance alone.

Most software skips the table lookup and gives you a p-value directly. A p-value below 0.05 means the same thing as exceeding the critical value: the result is statistically significant at the 5% level.
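Both routes, critical value and p-value, come straight from the chi-square distribution in SciPy. The statistic of 5.2 below is an arbitrary example value:

```python
from scipy.stats import chi2

df = 1
critical = chi2.ppf(0.95, df)  # critical value at the 0.05 significance level
print(round(critical, 2))      # 3.84

stat = 5.2                     # arbitrary example statistic
p = chi2.sf(stat, df)          # p-value: chance of a value this large or larger
print(p < 0.05)                # True, consistent with 5.2 > 3.84
```

The two comparisons always agree: a statistic beyond the critical value and a p-value below the significance level are the same statement made in different units.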

What Chi-Square Doesn’t Tell You

A significant chi-square result tells you that a relationship is likely present, but it doesn’t tell you how strong it is. A massive sample can produce a statistically significant result even when the actual relationship is tiny. This is where effect size measures come in.

For a 2×2 table, the phi coefficient works like a correlation, ranging from 0 (no association) to 1 (perfect association). Squaring phi gives you the proportion of shared variance between the two variables, similar to how r-squared works in regression. For larger tables (anything beyond 2×2), Cramér’s V serves the same purpose. It adjusts for table size and also ranges from 0 to 1, with higher values indicating a stronger relationship between the two variables.

Reporting an effect size alongside your chi-square result is important because it separates “statistically detectable” from “practically meaningful.”

Assumptions You Need to Meet

The chi-square test has a few requirements that are easy to overlook.

  • Independent observations. Each data point must come from a separate, unrelated case. If your data involves paired or matched subjects (like parents and their children, or the same person measured twice), the standard chi-square test isn’t appropriate.
  • Sufficient expected counts. At least 80% of cells in your table should have an expected count of 5 or more, and no cell should have an expected count below 1. A practical rule of thumb: your total sample size should be at least 5 times the number of cells in the table.
  • Categorical data. Both variables need to be categorical. You can’t plug continuous measurements directly into a chi-square test without first grouping them into categories.

The expected count rule is the one that trips people up most often. Note that it’s the expected count that matters, not the observed count. You might observe a zero in one cell and still be fine, as long as the expected count for that cell is reasonable.
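Because the rule concerns expected counts, it’s worth checking them before trusting the test. chi2_contingency returns the expected table, so the check is a few lines; the counts below are invented, and deliberately fail the rule:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented table that deliberately violates the expected-count rule.
table = np.array([[12, 5, 3],
                  [8, 10, 2]])

_, _, _, expected = chi2_contingency(table)
rule_ok = (expected >= 5).mean() >= 0.80 and expected.min() >= 1
print(expected.round(2))
print("expected-count rule satisfied:", rule_ok)  # False: two cells fall below 5
```

Here two of six expected counts (2.5 each) sit below 5, which is more than 20% of the cells, so the chi-square approximation shouldn’t be trusted for this table.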

When to Use Fisher’s Exact Test Instead

The chi-square statistic relies on an approximation that works well with large samples but breaks down with small ones. When more than 20% of your cells have expected counts below 5, the approximation becomes unreliable. In those cases, Fisher’s exact test is the standard alternative. It calculates an exact probability rather than relying on the chi-square approximation, making it appropriate for small samples where the chi-square result would be misleading.

For 2×2 tables specifically, you may also encounter the Yates correction for continuity, which adjusts each observed value by 0.5 toward the expected value. It was designed to make the chi-square approximation more accurate with small samples, but research has shown it tends to be overly conservative, making it harder to detect real effects. Many statisticians now recommend skipping the Yates correction and using Fisher’s exact test when sample sizes are small.
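SciPy exposes the test as scipy.stats.fisher_exact, which takes a 2×2 table and returns the odds ratio along with an exact p-value. The counts below are invented, and small enough that a chi-square test would be unreliable:

```python
from scipy.stats import fisher_exact

# Invented small 2x2 table; half the expected counts would fall below 5.
table = [[8, 2],
         [1, 9]]

odds_ratio, p = fisher_exact(table)
print(f"odds ratio = {odds_ratio:.1f}, p = {p:.4f}")
```

With only 20 observations in total, the exact p-value (about 0.005) can be trusted in a way the chi-square approximation couldn’t be.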

A Simple Walkthrough

Imagine you surveyed 200 people about whether they prefer coffee or tea, and you also recorded whether they work morning or evening shifts. You want to know if shift preference is related to drink preference.

Your observed data might look like this: 70 morning-shift coffee drinkers, 30 morning-shift tea drinkers, 40 evening-shift coffee drinkers, and 60 evening-shift tea drinkers. You’d calculate expected counts for each cell using the row and column totals. For instance, if 110 total people drink coffee and 100 work morning shifts, the expected count for morning-coffee is (110 × 100) / 200 = 55.

You’d then plug each cell’s observed and expected values into the formula, sum them up, and get a chi-square value. With 1 degree of freedom (a 2×2 table), the critical value at the 0.05 level is 3.84. If your chi-square exceeds that, you’d conclude that shift timing and drink preference are related in your sample, not just a coincidence of random variation.
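The whole walkthrough collapses to a single SciPy call. Note correction=False: SciPy applies the Yates correction to 2×2 tables by default, while the hand calculation above uses the plain formula:

```python
import numpy as np
from scipy.stats import chi2_contingency

# The walkthrough table: rows = morning / evening shift, columns = coffee / tea.
table = np.array([[70, 30],
                  [40, 60]])

stat, p, dof, expected = chi2_contingency(table, correction=False)
print(f"chi-square = {stat:.2f}, dof = {dof}")  # 18.18, dof = 1
print(expected[0, 0])                           # 55.0 expected morning-coffee drinkers
print(stat > 3.84)                              # True: reject the null at the 0.05 level
```

A statistic of about 18.2 is far beyond the critical value of 3.84, so for these numbers you would conclude that shift timing and drink preference are related.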

Running Chi-Square in Software

You’ll rarely calculate chi-square by hand outside of a classroom. Most statistical software and programming languages have built-in functions that return the chi-square statistic, the p-value, degrees of freedom, and the expected frequency table all at once. In Python’s SciPy library, for example, the chi2_contingency function takes your observed table and returns all four values. In R, the chisq.test function does the same. Excel, SPSS, and Google Sheets all offer similar tools.

These functions typically apply the Yates correction by default for 2×2 tables, so check the settings if you want the uncorrected version. The output will always include the p-value, which is the number most people look at first to decide whether the result is significant.
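In SciPy that setting is the correction argument of chi2_contingency. The table below is invented, purely to show that the two statistics differ and that the corrected one is always the smaller:

```python
import numpy as np
from scipy.stats import chi2_contingency

# Invented 2x2 table to compare corrected and uncorrected statistics.
table = np.array([[10, 20],
                  [20, 10]])

with_yates, _, _, _ = chi2_contingency(table)                  # default: correction=True
no_yates, _, _, _ = chi2_contingency(table, correction=False)
print(round(with_yates, 2), round(no_yates, 2))  # 5.4 versus 6.67
```

The gap between 5.4 and 6.67 is exactly the conservatism described above: the correction shrinks the statistic, which raises the p-value and makes significance harder to reach.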