The choice between a chi-square test and a t-test comes down to one thing: what type of data you’re analyzing. Use a t-test when your outcome variable is numerical (like blood pressure, test scores, or height) and you want to compare the average between two groups. Use a chi-square test when both of your variables are categorical (like gender, treatment group, yes/no outcomes) and you want to know whether they’re related.
That’s the core rule, but applying it correctly means understanding what each test actually does, what assumptions they require, and what to do when your data doesn’t fit neatly into either box.
What Each Test Actually Measures
A t-test calculates the difference in group means divided by an estimate of the variability of that difference (the standard error). In plain terms, it asks: “Is the average in group A meaningfully different from the average in group B, or could this gap just be random noise?” You need a numerical measurement as your outcome. Think weight in kilograms, reaction time in milliseconds, or exam scores on a 100-point scale.
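To make the arithmetic concrete, here is a minimal sketch (with made-up reaction-time data) that computes the t-statistic by hand, using the pooled-variance formula, and checks it against `scipy.stats.ttest_ind`:

```python
import math
from scipy import stats

# Hypothetical reaction times (ms) for two independent groups
a = [310, 295, 342, 300, 315, 328]
b = [288, 275, 301, 290, 282, 296]

mean_a = sum(a) / len(a)
mean_b = sum(b) / len(b)

# Sample variances, then a pooled variance (assumes similar spread)
var_a = sum((x - mean_a) ** 2 for x in a) / (len(a) - 1)
var_b = sum((x - mean_b) ** 2 for x in b) / (len(b) - 1)
pooled = ((len(a) - 1) * var_a + (len(b) - 1) * var_b) / (len(a) + len(b) - 2)

# Standard error of the difference between the two means
se = math.sqrt(pooled * (1 / len(a) + 1 / len(b)))
t_manual = (mean_a - mean_b) / se

t_scipy, p = stats.ttest_ind(a, b)  # equal_var=True by default
print(f"manual t={t_manual:.4f}, scipy t={t_scipy:.4f}, p={p:.4f}")
```

The two t values agree exactly; the library simply automates the same division of a mean difference by its standard error.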
A chi-square test works with counts and categories. Instead of comparing averages, it compares the frequencies you observed against the frequencies you’d expect if there were no relationship between the variables. For example, if you surveyed 200 people about whether they prefer coffee or tea and also recorded whether they work morning or evening shifts, a chi-square test would tell you whether shift preference and drink preference are linked, or whether the pattern in your data could easily happen by chance.
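The coffee/tea survey can be sketched in a few lines with `scipy.stats.chi2_contingency`; the counts below are hypothetical, chosen only to sum to 200:

```python
from scipy.stats import chi2_contingency

# Hypothetical 2x2 table from the 200-person survey:
# rows = shift (morning, evening), columns = preference (coffee, tea)
observed = [[60, 40],   # morning shift
            [45, 55]]   # evening shift

chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2={chi2:.3f}, p={p:.4f}, dof={dof}")
print(expected)  # counts we'd expect if shift and drink were unrelated
```

The `expected` array is exactly the “no relationship” baseline described above; the test statistic measures how far the observed counts stray from it.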
The Decision Starts With Your Variables
Before choosing a test, identify two things: your independent variable (the grouping factor) and your dependent variable (the outcome you measured). The nature of the dependent variable drives the decision.
- Numerical outcome, two groups: Use a t-test. Example: comparing average cholesterol levels between a drug group and a placebo group.
- Categorical outcome, categorical grouping: Use a chi-square test. Example: comparing the proportion of patients who recovered (yes/no) across two treatment types.
- Categorical outcome, no grouping variable: Use a chi-square goodness-of-fit test. Example: checking whether the distribution of blood types in your sample matches the known population distribution.
Notice that the independent variable is almost always categorical (it defines groups). The dependent variable is what matters. If you measured it on a scale, you likely need a t-test. If you counted people falling into categories, you likely need a chi-square.
Three Types of t-Tests
Not all t-tests are identical. The version you use depends on how your data were collected.
An independent samples t-test (also called unpaired) compares means from two separate groups. The people in group A are different from the people in group B. “Is the average height of basketball players different from the average height of soccer players?” is an independent samples question.
A paired t-test compares two measurements taken on the same individuals. If you measured patients’ pain scores before and after a treatment, those two measurements are linked to the same person, so you’d use a paired test. The pairing accounts for individual variation, which generally gives you more statistical power to detect a real effect.
A one-sample t-test compares a single group’s mean against a known or hypothetical value. If you wanted to know whether the average height of sixth graders in your school district differs from 4 feet, you’d measure a random sample and test that one mean against 4 feet.
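All three variants map directly onto scipy functions. The following sketch uses invented numbers matching the examples above (heights in cm, pain scores, and sixth-grader heights in inches, where 4 feet = 48 inches):

```python
from scipy import stats

# Independent samples: two separate groups of people (hypothetical heights, cm)
basketball = [198, 203, 190, 207, 195]
soccer = [178, 182, 175, 180, 184]
t_ind, p_ind = stats.ttest_ind(basketball, soccer)

# Paired: the same patients measured before and after treatment (pain scores)
before = [7, 6, 8, 5, 7, 6]
after = [5, 4, 6, 5, 5, 4]
t_rel, p_rel = stats.ttest_rel(before, after)

# One sample: one group's mean against a fixed value (48 inches = 4 feet)
heights = [49.1, 47.8, 50.2, 48.5, 47.9, 49.4]
t_one, p_one = stats.ttest_1samp(heights, 48.0)

print(f"independent p={p_ind:.4f}, paired p={p_rel:.4f}, one-sample p={p_one:.4f}")
```

Note that `ttest_rel` would raise an error if the two lists had different lengths, which is a useful sanity check: paired data must come in matched pairs.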
Two Types of Chi-Square Tests
The chi-square test of independence is the more common version. It tests whether two categorical variables are associated. You arrange your data in a cross-tabulation (rows for one variable, columns for the other) and compare the observed cell counts to what you’d expect if the variables were completely unrelated.
The chi-square goodness-of-fit test handles a different question: does a single categorical variable follow an expected distribution? A classic example comes from genetics. If a theory predicts offspring should appear in a 9:3:3:1 ratio across four phenotypes, you can collect data and use a goodness-of-fit test to see whether your observed counts match that prediction. When the gap between observed and expected is large enough, you reject the theoretical distribution.
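The 9:3:3:1 example translates directly to `scipy.stats.chisquare`. The observed counts below are hypothetical; the key step is converting the theoretical ratio into expected counts that sum to the same total:

```python
from scipy.stats import chisquare

# Hypothetical counts of 160 offspring across four phenotypes
observed = [92, 28, 31, 9]
total = sum(observed)  # 160

# Expected counts under the 9:3:3:1 Mendelian ratio (9+3+3+1 = 16 parts)
ratio = [9, 3, 3, 1]
expected = [total * r / 16 for r in ratio]  # [90.0, 30.0, 30.0, 10.0]

stat, p = chisquare(observed, f_exp=expected)
print(f"chi2={stat:.3f}, p={p:.4f}")
```

Here the observed counts sit close to the prediction, so the p-value is large and the 9:3:3:1 theory survives; a large statistic and small p-value would instead lead you to reject it.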
Assumptions for t-Tests
T-tests require several conditions to produce reliable results. Your data should be measured on an interval or ratio scale: interval scales have consistent spacing between values (like temperature in Celsius), and ratio scales additionally have a true zero point (like weight in pounds). The observations should be randomly and independently sampled. The data in each group should follow a roughly normal distribution, and the two groups should have similar levels of spread (similar variances).
The normality requirement loosens as your sample grows. Once each group has about 30 observations, the sampling distribution of the mean approximates a normal curve regardless of the shape of the underlying data. This is the central limit theorem in action, and it’s why researchers with larger samples worry less about perfectly bell-shaped data.
With smaller samples, normality matters more. If your data are clearly skewed or contain outliers, a non-parametric alternative like the Mann-Whitney U test is often a better choice. The Mann-Whitney compares the rank order of values rather than the means, making it resistant to extreme data points and non-normal distributions. As a general principle, when you specifically care about comparing means, the t-test is preferred. When your data quality is questionable or the distribution is clearly non-normal, the Mann-Whitney tends to perform better.
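A short sketch of the contrast, using a fabricated small sample with one extreme value, shows why the rank-based test is more robust here:

```python
from scipy.stats import mannwhitneyu, ttest_ind

# Hypothetical small samples; group_a contains one extreme outlier
group_a = [21, 24, 25, 27, 30, 250]
group_b = [18, 19, 22, 23, 24, 26]

u, p_mw = mannwhitneyu(group_a, group_b, alternative="two-sided")
t, p_t = ttest_ind(group_a, group_b)

# The outlier inflates group_a's mean and variance, which hurts the t-test;
# Mann-Whitney only sees that 250 is the largest rank, not how large it is.
print(f"Mann-Whitney p={p_mw:.4f}, t-test p={p_t:.4f}")
```

Because the Mann-Whitney works on ranks, replacing 250 with 2,500 would not change its result at all, while the t-test result would shift further.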
Assumptions for Chi-Square Tests
Chi-square tests have their own requirements, and the most important one involves expected cell counts. At least 80% of the cells in your table should have an expected frequency of 5 or more, and no cell should have an expected frequency below 1. A practical shortcut: your total sample size should be at least the number of cells in your table multiplied by 5. For a 2×3 table (6 cells), that means you need at least 30 observations.
When your sample is too small and more than 20% of cells have expected counts below 5, the chi-square approximation breaks down. In that case, Fisher’s exact test is the standard alternative. It calculates exact probabilities rather than relying on an approximation, so it works reliably even with very small samples. Most statistical software will flag this for you or report Fisher’s exact test alongside the chi-square result automatically.
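You can perform both checks yourself in a few lines: inspect the expected counts that `chi2_contingency` computes, then fall back to `fisher_exact` if too many are small. The 2×2 table below is hypothetical:

```python
from scipy.stats import chi2_contingency, fisher_exact

# Hypothetical small 2x2 table: treatment (rows) vs. recovered yes/no (columns)
table = [[8, 2],
         [3, 7]]

# Check the expected counts before trusting a chi-square result
_, _, _, expected = chi2_contingency(table)
n_small = (expected < 5).sum()
print("cells with expected count < 5:", n_small)

# With small expected counts, use Fisher's exact test instead
odds_ratio, p = fisher_exact(table)
print(f"odds ratio={odds_ratio:.2f}, p={p:.4f}")
```

In this table, half the cells have expected counts below 5 (more than the 20% threshold), so Fisher's exact test is the appropriate choice.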
Common Mistakes in Choosing Between Them
The most frequent error is treating a numerical variable as categorical (or vice versa) and then picking the wrong test. If you measured blood pressure as an actual number, use a t-test. If you categorized patients as “hypertensive” or “not hypertensive,” that’s now categorical, and a chi-square test applies. The same underlying data can lead to different tests depending on how you recorded or coded it. In general, keeping numerical data as numbers preserves more information and gives your analysis more power.
Another common mistake is using a t-test when you have more than two groups. T-tests are designed for two-group comparisons. If you have three or more groups (say, comparing test scores across freshmen, sophomores, and juniors), you need an ANOVA instead. Similarly, chi-square tests handle two or more groups without issue, so they remain appropriate even when you have several categories.
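For the three-or-more-groups case, a one-way ANOVA is a one-liner with `scipy.stats.f_oneway`; the exam scores below are invented to match the freshmen/sophomores/juniors example:

```python
from scipy.stats import f_oneway

# Hypothetical exam scores for three class years
freshmen = [72, 68, 75, 70, 74]
sophomores = [78, 80, 76, 82, 79]
juniors = [85, 83, 88, 84, 86]

# One-way ANOVA replaces the t-test when comparing 3+ group means
f_stat, p = f_oneway(freshmen, sophomores, juniors)
print(f"F={f_stat:.2f}, p={p:.4f}")
```

A small p-value here says only that at least one group mean differs; follow-up pairwise comparisons are needed to pin down which ones.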
A subtler error is ignoring the assumptions. Running a t-test on a tiny sample of highly skewed data, or running a chi-square test when half your cells have expected counts of 2, produces results you can’t trust. Always check assumptions first, then choose the test that fits.
Quick Reference by Scenario
- Comparing average test scores between two classrooms: Independent samples t-test.
- Comparing a patient’s weight before and after a program: Paired t-test.
- Checking if your sample’s mean differs from a national average: One-sample t-test.
- Testing whether smoking status is related to lung disease status: Chi-square test of independence.
- Testing whether a die is fair: Chi-square goodness-of-fit test.
- Comparing two groups with a small sample and skewed data: Mann-Whitney U test (non-parametric alternative to the t-test).
- Comparing two categorical variables with very small cell counts: Fisher’s exact test (alternative to chi-square).
The underlying logic stays consistent across all of these scenarios. Identify whether your outcome is a number or a category, check whether your data meet the assumptions, and pick the test that matches. When in doubt, look at what you actually measured and what question you’re trying to answer. The data type will point you to the right test nearly every time.

