How to Read a Contingency Table and Interpret It

A contingency table displays how two categorical variables relate to each other by organizing counts into rows and columns. If you’ve encountered one in a textbook, a research paper, or a statistics class and weren’t sure what you were looking at, this guide walks you through every part of the table and shows you how to extract meaning from it.

What a Contingency Table Actually Shows

A contingency table (also called a crosstab or two-way frequency table) summarizes the relationship between two categorical variables. “Categorical” just means the data falls into groups rather than being measured on a scale. Think: male/female, yes/no, treated/untreated, smoker/nonsmoker.

Each cell in the table holds a count of how many observations fall into that particular combination of categories. For example, if the rows represent “had an accident” vs. “no accident” and the columns represent “DUI” vs. “not DUI,” one cell would tell you how many drivers were DUI and had an accident. The entire table lets you see, at a glance, whether the two variables seem connected.

The Anatomy of the Table

Every contingency table has the same basic parts. Understanding them is the first step to reading one correctly.

Row variable: One categorical variable runs along the left side. Each row represents one category of that variable.

Column variable: The second categorical variable runs across the top. Each column represents one of its categories.

Interior cells: The numbers inside the table are the observed frequencies. Each cell tells you how many observations matched that specific row-and-column combination.

Row totals (marginals): The rightmost column sums up all the cells in each row. These tell you the total count for each row category, regardless of the column variable.

Column totals (marginals): The bottom row sums up all the cells in each column. Same idea, but for the column variable.

Grand total: The number in the bottom-right corner is the total number of observations in the entire table. It equals both the sum of the row totals and the sum of the column totals, so adding up either margin is a quick check that the table is consistent.

The table’s size is described as R × C, where R is the number of rows and C is the number of columns (not counting the totals). A simple 2×2 table has two row categories and two column categories, creating four interior cells. Larger tables like 3×4 or 5×2 work the same way, just with more cells.
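As a sketch of how the margins are built, the Python below lays out the drivers example from this guide as a 2×2 table (rows: accident / no accident; columns: DUI / not DUI). Only the figures 70, 100, 200, and 1,000 come from the text; the other three cells are derived from those totals. The helper name `add_margins` is an illustrative choice, not a library function.

```python
# Drivers example: rows are accident / no accident, columns are DUI / not DUI.
# 70 comes from the text; 30, 130 and 770 are derived from the quoted totals.
row_labels = ["accident", "no accident"]
col_labels = ["DUI", "not DUI"]
counts = [
    [70, 30],
    [130, 770],
]

def add_margins(table):
    """Return (row_totals, col_totals, grand_total) for a list-of-lists table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Summing either margin must give the same grand total.
    assert grand_total == sum(col_totals)
    return row_totals, col_totals, grand_total

row_totals, col_totals, grand_total = add_margins(counts)
n_rows, n_cols = len(counts), len(counts[0])  # R x C, not counting totals

print(f"{n_rows}x{n_cols} table")
print(f"{'':<12}", *(f"{c:>8}" for c in col_labels), f"{'total':>8}")
for label, row, total in zip(row_labels, counts, row_totals):
    print(f"{label:<12}", *(f"{c:>8}" for c in row), f"{total:>8}")
print(f"{'total':<12}", *(f"{c:>8}" for c in col_totals), f"{grand_total:>8}")
```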

A Step-by-Step Approach to Reading One

When you sit down with a contingency table for the first time, work through it in this order.

Step 1: Identify the two variables. Read the row and column labels. Make sure you understand what each category means before looking at any numbers. In a table about drug testing, for instance, the rows might be “tested positive” and “tested negative,” while the columns are “actually uses drugs” and “does not use drugs.”

Step 2: Read the marginal totals. Look at the row totals on the right and the column totals on the bottom. These give you the overall distribution of each variable on its own. If 200 out of 1,000 drivers were DUI, you already know something useful before you look at any interior cell.

Step 3: Examine individual cells. Now look at the interior counts. Each cell answers a specific question: “How many observations had this row trait AND this column trait?” Compare cells to see where observations cluster. If most of the counts pile up in certain cells, that hints at a relationship between the two variables.

Step 4: Calculate percentages. Raw counts can be misleading when group sizes differ. Convert cells to percentages by dividing each cell by the appropriate total (more on this below). This is where the real insight comes from.

Step 5: Assess the association. Once you have percentages, ask: does the distribution of one variable change depending on the category of the other? If it does, the variables are associated. If the percentages are roughly the same across the categories of the other variable, the variables may be independent, though only a formal test can tell you whether a difference is larger than chance alone would produce.
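A minimal Python sketch of Steps 4 and 5, using the drivers counts (70 from the text, the other cells derived from its totals of 100 accidents, 200 DUI drivers, and 1,000 drivers). Because the question here is whether the accident rate differs between DUI and non-DUI drivers, each cell is divided by its column total. The function names are illustrative.

```python
# Drivers example: rows are accident / no accident, columns are DUI / not DUI.
counts = [[70, 30], [130, 770]]

def column_percentages(table):
    """Divide each cell by its column total (percent within each column category)."""
    col_totals = [sum(col) for col in zip(*table)]
    return [[100 * cell / total for cell, total in zip(row, col_totals)]
            for row in table]

def row_percentages(table):
    """Divide each cell by its row total (percent within each row category)."""
    return [[100 * cell / sum(row) for cell in row] for row in table]

col_pct = column_percentages(counts)
print(f"accident rate among DUI drivers:     {col_pct[0][0]:.2f}%")  # 35.00%
print(f"accident rate among non-DUI drivers: {col_pct[0][1]:.2f}%")  # 3.75%

# Percentaging the other way answers a different question:
# what share of each accident group was DUI?
row_pct = row_percentages(counts)
print(f"DUI share, accident drivers:    {row_pct[0][0]:.2f}%")  # 70.00%
print(f"DUI share, no-accident drivers: {row_pct[1][0]:.2f}%")  # 14.44%
```

The accident rate is 35% among DUI drivers versus 3.75% among the others, a gap large enough to suggest the two variables are associated; the chi-square test described in the testing section is what checks whether a gap like this could be chance.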

Three Types of Probability You Can Calculate

A contingency table lets you calculate three distinct kinds of probability, and confusing them is one of the most common mistakes people make.

Marginal Probability

This is the probability of one variable’s category, ignoring the other variable entirely. You calculate it by dividing a row total or column total by the grand total. In a table of 1,000 drivers where 100 had accidents, the marginal probability of having an accident is 100/1,000 = 0.10, or 10%. You’re using only the “margins” of the table.

Joint Probability

This is the probability that both conditions are true at the same time. You calculate it by dividing an interior cell by the grand total. If 70 drivers were both DUI and had an accident out of 1,000 total, the joint probability is 70/1,000 = 0.07, or 7%.
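Both calculations translate directly into code. This sketch uses the drivers counts (70 from the text; the remaining cells derived from the totals of 100 accidents, 200 DUI drivers, and 1,000 drivers):

```python
# Rows: accident / no accident; columns: DUI / not DUI.
counts = [[70, 30], [130, 770]]
grand_total = sum(map(sum, counts))

# Marginal probability: a row (or column) total divided by the grand total.
p_accident = sum(counts[0]) / grand_total        # 100 / 1000
p_dui = sum(row[0] for row in counts) / grand_total  # 200 / 1000

# Joint probability: a single interior cell divided by the grand total.
p_dui_and_accident = counts[0][0] / grand_total  # 70 / 1000

print(p_accident, p_dui, p_dui_and_accident)  # 0.1 0.2 0.07
```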

Conditional Probability

This is the probability of one category given that you already know the other category is true. You calculate it by dividing an interior cell by the relevant row or column total, not the grand total. This is the critical distinction.

Using the same data: the probability a driver was DUI given they had an accident is 70/100 = 0.70, or 70%. You divide by the row total for “had an accident” (100), because you’re restricting your view to only the drivers who had accidents. But the probability a DUI driver had an accident is 70/200 = 0.35, or 35%, because now you’re dividing by the column total for “DUI” (200). These two conditional probabilities answer very different questions, even though they use the same cell.
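The same cell divided by two different totals, sketched in Python (drivers counts as above, with the cells other than 70 derived from the text's totals). Equivalently, each conditional probability is the joint probability divided by the marginal probability of the condition, for example 0.07 / 0.10 = 0.70.

```python
# Rows: accident / no accident; columns: DUI / not DUI.
counts = [[70, 30], [130, 770]]

both = counts[0][0]                        # DUI and had an accident
accident_total = sum(counts[0])            # row total for "had an accident"
dui_total = sum(row[0] for row in counts)  # column total for "DUI"

# Same numerator, different denominators, different questions.
p_dui_given_accident = both / accident_total  # 70 / 100
p_accident_given_dui = both / dui_total       # 70 / 200
print(p_dui_given_accident, p_accident_given_dui)  # 0.7 0.35
```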

Comparing Groups With Risk Ratios and Odds Ratios

In a 2×2 table where one variable is an exposure (like smoking) and the other is an outcome (like lung disease), you can calculate measures that quantify how strongly the two are linked.

The risk ratio (also called relative risk) compares the rate of the outcome in the exposed group to the rate in the unexposed group. If you label your four cells as a, b, c, and d (top-left, top-right, bottom-left, bottom-right), and the rows are exposed vs. unexposed while the columns are ill vs. well, the formula is: (a / (a+b)) divided by (c / (c+d)). A risk ratio of 1.0 means both groups have identical risk. A ratio above 1.0 means the exposed group has higher risk. Below 1.0 means lower risk.
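The formula can be sketched as a small Python function. As an illustrative assumption, it is applied to the drivers example with DUI treated as the exposure and an accident as the outcome, which means transposing that table so exposure is on the rows: a = 70, b = 130, c = 30, d = 770. The function name `risk_ratio` is hypothetical, and the sketch assumes the unexposed group has at least one case (c > 0).

```python
def risk_ratio(a, b, c, d):
    """Risk ratio for a 2x2 table laid out as

                  ill   well
        exposed    a     b
        unexposed  c     d
    """
    risk_exposed = a / (a + b)      # share of the exposed group that is ill
    risk_unexposed = c / (c + d)    # share of the unexposed group that is ill
    return risk_exposed / risk_unexposed

# Drivers example, transposed: rows DUI / not DUI, columns accident / no accident.
rr = risk_ratio(a=70, b=130, c=30, d=770)
print(f"risk ratio: {rr:.2f}")  # 9.33
```

Read this as: under these counts, DUI drivers had roughly 9.3 times the accident risk of non-DUI drivers.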

The odds ratio uses a simpler formula: (a × d) / (b × c). It compares the odds of disease in the exposed group (a / b) to the odds in the unexposed group (c / d); dividing the first by the second gives the cross-product (a × d) / (b × c). Like the risk ratio, a value of 1.0 means no difference between groups. The odds ratio is especially common in case-control studies where you can’t directly calculate risk.
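A Python sketch of the same idea, again assuming the drivers table transposed so that DUI (exposure) is on the rows and accident (outcome) on the columns: a = 70, b = 130, c = 30, d = 770. The function name is hypothetical; it computes the ratio of the two odds and checks it against the cross-product form.

```python
def odds_ratio(a, b, c, d):
    """Odds ratio for the layout

                  ill   well
        exposed    a     b
        unexposed  c     d
    """
    odds_exposed = a / b    # odds of illness among the exposed
    odds_unexposed = c / d  # odds of illness among the unexposed
    return odds_exposed / odds_unexposed

or_value = odds_ratio(a=70, b=130, c=30, d=770)
cross_product = (70 * 770) / (130 * 30)  # (a*d) / (b*c)
print(f"odds ratio: {or_value:.2f}")             # 13.82
print(f"cross-product: {cross_product:.2f}")     # 13.82
```

For the same table, the risk ratio is (70/200) / (30/800) ≈ 9.3, noticeably smaller than the odds ratio of about 13.8. That gap is expected when the outcome is common in a group, as it is here (35% of DUI drivers had an accident); the two measures are close only when the outcome is rare.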

Testing Whether the Pattern Is Real

Just because two variables look related in a table doesn’t mean the pattern is meaningful. It could be random chance. The chi-square test is the standard way to check.

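The test compares your observed cell counts to the counts you’d expect if the two variables were completely independent. Each expected count is the cell’s row total times its column total, divided by the grand total. The test statistic adds up (observed − expected)² / expected across every cell. If the gap between observed and expected counts is large enough, the test returns a small p-value, suggesting the association is statistically significant rather than a fluke.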
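A standard-library-only Python sketch of the Pearson chi-square test for a 2×2 table, applied to the drivers counts (70 from the text, the other cells derived from its totals). The p-value shortcut `erfc(sqrt(stat / 2))` is valid only for 1 degree of freedom, which is why this sketch refuses anything other than a 2×2 table; the function name is hypothetical.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table.

    Returns (statistic, p_value, expected). No continuity correction is applied.
    """
    if len(table) != 2 or any(len(row) != 2 for row in table):
        raise ValueError("this sketch only handles 2x2 tables")
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom the statistic is a squared standard normal,
    # so P(chi-square > stat) = P(|Z| > sqrt(stat)) = erfc(sqrt(stat / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected

stat, p_value, expected = chi_square_2x2([[70, 30], [130, 770]])
print(expected)  # [[20.0, 80.0], [180.0, 720.0]]
print(f"chi-square = {stat:.2f}, p = {p_value:.1e}")
```

If you use SciPy instead, note that `scipy.stats.chi2_contingency` applies Yates’ continuity correction to tables with one degree of freedom by default (`correction=True`), so its statistic will be somewhat smaller than this uncorrected version.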

One important requirement: the chi-square test becomes unreliable when expected cell counts are too low. A common rule of thumb is that expected counts should be at least 5. If your table has cells where the expected frequency (not the observed count) falls below that, an alternative called Fisher’s exact test is more appropriate. Statistical software will often flag this for you.
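A sketch of that check plus a two-sided Fisher’s exact test for a 2×2 table, using only the Python standard library. The threshold of 5 is the rule of thumb rather than a hard law, the small table is hypothetical, and the two-sided p-value follows one common definition: sum the probabilities of every table with the same row and column totals that is no more likely than the observed one. If SciPy is available, `scipy.stats.fisher_exact` is the library route.

```python
from math import comb

def expected_counts(table):
    """Expected cell counts under independence: row total * column total / n."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def fisher_exact_2x2(table):
    """Two-sided Fisher's exact test p-value for a 2x2 table."""
    (a, b), (c, d) = table
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2

    def prob(x):
        # Hypergeometric probability that the top-left cell equals x
        # when all row and column totals are held fixed.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # Add up every possible table that is no more probable than the observed one;
    # the tiny relative tolerance keeps floating-point ties from being dropped.
    return sum(
        p for p in (prob(x) for x in range(lowest, highest + 1))
        if p <= p_observed * (1 + 1e-7)
    )

table = [[3, 1], [1, 3]]  # hypothetical small sample of 8 observations
low_cells = [e for row in expected_counts(table) for e in row if e < 5]
if low_cells:
    print(f"{len(low_cells)} expected counts below 5; using Fisher's exact test")
    print(f"p = {fisher_exact_2x2(table):.4f}")  # p = 0.4857
```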

The degrees of freedom for the test follow a simple formula: (number of rows minus 1) × (number of columns minus 1). A 2×2 table has (2-1) × (2-1) = 1 degree of freedom. A 3×4 table has 2 × 3 = 6. You need this number to look up or interpret the chi-square result.

A Trap to Watch For: Simpson’s Paradox

Sometimes a pattern that’s clear in an overall contingency table completely reverses when you break the data into subgroups. This is called Simpson’s Paradox, and it’s not rare.

Here’s an example. Researchers compared average test scores at two schools. School Alpha had an overall average of 83.2, while School Beta averaged 81.8. Alpha appears to perform better. But when scores were broken down by gender, both male and female students at Beta scored higher than their counterparts at Alpha. Males at Beta averaged 85 vs. 84 at Alpha. Females at Beta averaged 81 vs. 80 at Alpha.

The reversal happened because the groups were different sizes at each school. Alpha had 80 males (higher scorers) and only 20 females, while Beta had 20 males and 80 females. The uneven mix inflated Alpha’s overall average. The lesson: whenever a contingency table aggregates data that could be split into meaningful subgroups, the overall numbers can tell a misleading story. If a lurking third variable could be influencing the results, breaking the table into separate layers for each subgroup is the safer approach.
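The reversal can be reproduced in a few lines of Python. The group sizes and averages are the ones given in the example; each school’s overall average is a size-weighted average of its group averages, which is exactly where the uneven mix enters. The helper name is illustrative.

```python
# (number of students, average score) for each group, from the school example.
schools = {
    "Alpha": {"male": (80, 84), "female": (20, 80)},
    "Beta": {"male": (20, 85), "female": (80, 81)},
}

def overall_average(groups):
    """Size-weighted average of the group averages."""
    total_students = sum(n for n, _ in groups.values())
    return sum(n * mean for n, mean in groups.values()) / total_students

# Aggregated: Alpha looks better (prints 83.2 for Alpha, 81.8 for Beta).
for name, groups in schools.items():
    print(f"{name} overall: {overall_average(groups):.1f}")

# Stratified by gender: Beta is ahead in every subgroup.
for group in ("male", "female"):
    alpha_mean = schools["Alpha"][group][1]
    beta_mean = schools["Beta"][group][1]
    print(f"{group}: Alpha {alpha_mean}, Beta {beta_mean}")
```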

Practical Tips for Reading Any Table

Always check the grand total. A table based on 50 observations tells a very different story than one based on 50,000. Small samples produce unstable percentages.
Decide which direction to calculate percentages. If you want to compare how groups differ on an outcome, calculate percentages within each group (divide by the row or column total for that group). Calculating percentages in the wrong direction is one of the most common errors in interpreting crosstabs.
Look at both the counts and the percentages. A cell might show a dramatic percentage (80%!) but represent only 4 out of 5 people. The percentage alone can be misleading without the underlying count.
Don’t assume causation. A contingency table shows association. Two variables can be strongly linked in a table because a third, unmeasured variable drives both of them.
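One way to follow the tip about counts and percentages is to never report a percentage without the count behind it. A tiny formatting helper does this; `pct_with_count` is a hypothetical name, not a library function.

```python
def pct_with_count(count, total, digits=1):
    """Format a percentage alongside its raw count, e.g. '80.0% (4/5)'."""
    if total == 0:
        return "n/a (0/0)"
    return f"{100 * count / total:.{digits}f}% ({count}/{total})"

print(pct_with_count(4, 5))     # 80.0% (4/5)
print(pct_with_count(70, 200))  # 35.0% (70/200)
```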