A dichotomous variable is a variable with exactly two possible values. Think yes/no, true/false, pass/fail, alive/dead. It’s the simplest type of categorical variable, and one of the most common in statistics, medicine, and survey research. You’ll also hear it called a binary variable, since “dichotomous” just means “divided into two parts.”
How Dichotomous Variables Work
A dichotomous variable places every observation into one of two mutually exclusive categories. There’s no overlap and no middle ground. A patient either has diabetes or doesn’t. A coin lands heads or tails. A student passes or fails. The two categories can be inherently binary (biological sex at birth) or created by drawing a line through a continuous measurement (blood pressure above or below 140 mmHg).
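For analysis purposes, the two categories are treated as having no natural ranking. “Yes” isn’t higher than “no” in any mathematical sense, and even where one category sounds better (pass vs. fail), ordering just two categories adds no information beyond which group an observation falls into. This makes dichotomous variables a special case of nominal variables, which are categories without an inherent order. Hair color is nominal with many categories. A dichotomous variable is nominal with exactly two.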
Coding With 0 and 1
In statistical software, dichotomous variables are almost always coded as 0 and 1. One category gets a 0 (the reference group) and the other gets a 1. Which category gets the 0 is up to you, but using 0 and 1 isn’t arbitrary. Coding this way lets the variable slot directly into regression equations and other models, because the math treats 0 and 1 as numbers while still representing categories.
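As a minimal sketch in Python, assuming the raw answers arrive as text labels, the coding step is a simple mapping. The helper name `code_binary` and the choice of “no” as the reference level are illustrative, not a standard API. One handy side effect of 0/1 coding is that the mean of the coded variable equals the proportion of observations in the category coded 1.

```python
from statistics import mean


def code_binary(values, positive):
    """Map a two-category variable to 0/1 (hypothetical helper).

    `positive` is the category coded 1; the other category becomes 0,
    the reference group. Raises ValueError if more than two categories appear.
    """
    categories = set(values)
    if len(categories) > 2:
        raise ValueError(f"expected at most two categories, got {sorted(categories)}")
    return [1 if v == positive else 0 for v in values]


answers = ["yes", "no", "yes", "yes", "no"]
coded = code_binary(answers, positive="yes")  # "no" is the reference group
print(coded)        # [1, 0, 1, 1, 0]
print(mean(coded))  # 0.6, the proportion answering "yes"
```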
When a dichotomous variable coded 0/1 enters a linear regression model, its coefficient has a straightforward interpretation: it’s the difference in the average outcome between the two groups, holding any other predictors in the model constant. If you’re predicting test scores and your dichotomous predictor is “attended tutoring” (1) versus “did not attend” (0), the coefficient tells you the average score difference between the two groups. The group coded 0 is the reference level, the baseline everything else is compared against.
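A small sketch can confirm this with the tutoring example, using made-up scores. It fits a one-predictor ordinary least squares line by hand (the helper `ols_slope_intercept` is hypothetical, written out so no third-party library is needed) and compares the result with the two group means.

```python
def ols_slope_intercept(x, y):
    """Ordinary least squares fit of y = intercept + slope * x."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


# Hypothetical data: 1 = attended tutoring, 0 = did not attend
tutoring = [1, 1, 1, 0, 0, 0, 0]
scores = [82, 90, 86, 75, 70, 78, 73]

slope, intercept = ols_slope_intercept(tutoring, scores)


def group_mean(g):
    """Average score for the group coded g."""
    group = [s for t, s in zip(tutoring, scores) if t == g]
    return sum(group) / len(group)


group_means = {g: group_mean(g) for g in (0, 1)}
print(group_means)           # {0: 74.0, 1: 86.0}
print(round(slope, 6))       # 12.0, the difference 86 - 74
print(round(intercept, 6))   # 74.0, the mean of the reference group
```

The intercept lands on the reference group’s mean and the slope on the gap between groups, which is exactly the interpretation described above.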
This same logic extends to variables with more than two categories. A variable like race or region gets broken into multiple dichotomous variables through a process called dummy coding. If a variable has four categories, you create three new 0/1 variables, each representing one category compared to a reference group. Every non-reference category gets its own comparison to the baseline.
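Here is one way to sketch dummy coding in plain Python, assuming a four-level region variable with made-up values. The helper `dummy_code` is hypothetical; it builds k − 1 indicator columns and leaves the reference category without a column of its own.

```python
def dummy_code(values, reference):
    """Turn a k-category variable into k-1 indicator (0/1) columns.

    The `reference` category gets no column: an observation in the
    reference group has 0 in every indicator.
    """
    levels = sorted(set(values))
    if reference not in levels:
        raise ValueError(f"reference {reference!r} not among {levels}")
    others = [lvl for lvl in levels if lvl != reference]
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in others}


region = ["North", "South", "East", "West", "South", "North"]
dummies = dummy_code(region, reference="North")
for name, column in dummies.items():
    print(name, column)
# East [0, 0, 1, 0, 0, 0]
# South [0, 1, 0, 0, 1, 0]
# West [0, 0, 0, 1, 0, 0]
```

Both “North” rows are all zeros, which is what makes North the baseline. In pandas, `pd.get_dummies(..., drop_first=True)` produces a similar layout, though it drops the first level rather than letting you name the reference.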
Common Examples
Dichotomous variables show up everywhere in research and daily life:
- Medical outcomes: survived/died, disease present/absent, treatment successful/unsuccessful
- Survey questions: yes/no, agree/disagree, true/false
- Demographics: employed/unemployed, smoker/non-smoker, insured/uninsured
- Clinical decisions: normal/abnormal, treat/do not treat, risky/benign
Healthcare surveys rely heavily on dichotomous questions because they demand clarity. “Is this your first pregnancy?” or “Do you have any history of high blood pressure?” leave no room for ambiguity, which matters when the answers guide diagnosis and treatment. The tradeoff is that dichotomous questions can’t capture nuance. Someone might mostly agree or somewhat disagree, but a yes/no format forces them to pick a side.
Statistical Tests for Dichotomous Data
The type of analysis you use depends on whether the dichotomous variable is the thing you’re trying to predict (the outcome) or the thing you’re using to predict (the input).
When the outcome is dichotomous, logistic regression is the standard tool. It predicts the probability of being in one category versus the other. For example, you could predict whether a patient will be readmitted to the hospital (yes/no) based on their age, diagnosis, and length of stay. Logistic regression can handle one predictor or many, and those predictors can be continuous, dichotomous, or a mix of both.
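To show how a fitted logistic model turns predictors into a probability, here is a sketch with hypothetical coefficients. They are invented for illustration, not estimated from any data, and “diagnosis” is simplified to a single 0/1 indicator. The linear predictor is passed through the logistic function 1 / (1 + e^−z). Exponentiating a coefficient gives an odds ratio; for the 0/1 diagnosis indicator, that’s the odds of readmission in the group coded 1 divided by the odds in the reference group, holding the other predictors constant.

```python
import math


def predicted_probability(intercept, coefs, features):
    """Convert a logistic regression's linear predictor into a probability."""
    z = intercept + sum(b * x for b, x in zip(coefs, features))
    # Numerically stable logistic function (avoids overflow for large |z|)
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


# Hypothetical coefficients, for illustration only:
# age (years), heart-failure diagnosis (1 = yes, 0 = no), length of stay (days)
intercept = -4.0
coefs = [0.03, 0.8, 0.1]

p = predicted_probability(intercept, coefs, [70, 1, 5])
print(round(p, 3))                   # 0.354
print(round(math.exp(coefs[1]), 2))  # 2.23, odds ratio for the diagnosis indicator
```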
When you want to test whether two categorical variables are related to each other, a chi-square test of independence is the go-to. It compares the counts you observed in each cell of a cross-tabulation to the counts you’d expect if there were no relationship. One important requirement: at least 80% of the cells should have an expected count of 5 or more, and no cell should have an expected count below 1. In practice, this means you need a reasonably large sample. In a simple 2×2 table (two dichotomous variables crossed), three cells out of four is only 75%, so all four expected counts need to reach 5. That takes at least 20 observations even when the groups are perfectly balanced, and more when they aren’t.
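A minimal sketch of the 2×2 case, using hypothetical counts: each expected count is (row total × column total) / n, the statistic sums (observed − expected)² / expected over the four cells, and with 1 degree of freedom the p-value can be computed from `math.erfc`. The helper names are illustrative. No continuity correction is applied; some software applies Yates’s correction to 2×2 tables by default, which gives a smaller statistic.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table of counts.

    Returns (statistic, p_value, expected). No continuity correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    expected = [[r * col / n for col in col_totals] for r in row_totals]
    statistic = sum(
        (table[i][j] - expected[i][j]) ** 2 / expected[i][j]
        for i in range(2)
        for j in range(2)
    )
    # With 1 degree of freedom, P(chi-square > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value, expected


def expected_counts_ok(expected):
    """Rule of thumb: >= 80% of cells with expected count >= 5, none below 1."""
    cells = [e for row in expected for e in row]
    share_at_least_5 = sum(e >= 5 for e in cells) / len(cells)
    return share_at_least_5 >= 0.8 and min(cells) >= 1


# Hypothetical counts: rows = smoker / non-smoker, columns = disease present / absent
observed = [[10, 20], [30, 40]]
stat, p, expected = chi_square_2x2(observed)
print(expected)                      # [[12.0, 18.0], [28.0, 42.0]]
print(round(stat, 3))                # 0.794
print(round(p, 2))
print(expected_counts_ok(expected))  # True
```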
A less common but useful test is the McNemar test, which applies when the same people are measured twice on the same dichotomous variable. Did a patient test positive before treatment and negative after? McNemar’s test looks only at the people who switched categories and evaluates whether switches in one direction outnumber switches in the other by more than chance would explain.
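A sketch of the basic McNemar statistic, assuming paired 0/1 results with hypothetical data. If b counts people who went from 1 to 0 and c counts people who went from 0 to 1, the statistic is (b − c)² / (b + c), compared against a chi-square distribution with 1 degree of freedom. People who stayed in the same category don’t enter the calculation at all. The helper name `mcnemar` is illustrative.

```python
import math


def mcnemar(before, after):
    """McNemar test for paired 0/1 measurements (no continuity correction).

    Only discordant pairs count: b = 1 before and 0 after,
    c = 0 before and 1 after.
    """
    if len(before) != len(after):
        raise ValueError("before and after must be paired (same length)")
    b = sum(1 for x, y in zip(before, after) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(before, after) if x == 0 and y == 1)
    if b + c == 0:
        raise ValueError("no discordant pairs; the test is undefined")
    statistic = (b - c) ** 2 / (b + c)
    p_value = math.erfc(math.sqrt(statistic / 2))  # chi-square, 1 df
    return b, c, statistic, p_value


# Hypothetical data: 1 = tested positive, 0 = tested negative, one pair per patient
before = [1] * 10 + [0] * 3 + [1] * 5 + [0] * 12
after = [0] * 10 + [1] * 3 + [1] * 5 + [0] * 12
b, c, stat, p = mcnemar(before, after)
print(b, c, round(stat, 3))  # 10 3 3.769
```

With few discordant pairs, an exact binomial version of the test is often preferred over this chi-square approximation.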
The Problem With Artificially Creating Them
Sometimes researchers take a continuous variable like blood pressure, age, or BMI and split it into two groups at some cutoff. Above 140 mmHg becomes “high blood pressure.” A BMI of 30 or above becomes “obese.” This is called dichotomization, and while it simplifies clinical decisions, it comes with real statistical costs.
Splitting a continuous variable at its median throws away roughly a third of the information it carries. That figure has a precise basis: for a normally distributed variable, the loss of power from dichotomizing at the median is equivalent to deleting about 36% of your dataset, because efficiency falls to 2/π (about 64%) of what the continuous version achieves. In one analysis of liver disease patients, a model using bilirubin as a continuous measurement explained 31% more of the variation in outcomes than a model that split bilirubin into just two groups.
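The one-third figure can be checked with a quick simulation, assuming a normally distributed predictor with a linear relationship to the outcome (both assumptions of this sketch, not of every real dataset). It compares the squared correlation with the outcome before and after a median split; the share of explained variance that survives the split should come out close to 2/π, though the exact value varies with the random seed.

```python
import math
import random
from statistics import median


def pearson_r(x, y):
    """Pearson correlation coefficient."""
    x_bar = sum(x) / len(x)
    y_bar = sum(y) / len(y)
    sxy = sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
    sxx = sum((a - x_bar) ** 2 for a in x)
    syy = sum((b - y_bar) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


rng = random.Random(42)  # fixed seed so the run is reproducible
n = 20_000
x = [rng.gauss(0, 1) for _ in range(n)]        # continuous predictor
y = [0.5 * xi + rng.gauss(0, 1) for xi in x]   # outcome linearly related to x

cut = median(x)
x_split = [1 if xi > cut else 0 for xi in x]   # median split

r_cont = pearson_r(x, y)
r_split = pearson_r(x_split, y)
ratio = r_split ** 2 / r_cont ** 2             # share of explained variance retained

print(round(ratio, 2))
print(round(2 / math.pi, 3))  # 0.637, the theoretical value for a median split
```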
The problems go beyond lost power. Dichotomization lumps together people who are very different. Someone with a BMI of 30.1 and someone with a BMI of 45 both land in the “obese” category, masking enormous variation within each group. It also hides non-linear relationships. If the risk of a bad outcome accelerates sharply at a certain value rather than increasing steadily, a simple two-group split won’t reveal that pattern. And when the cutoff is chosen by trying several splits and keeping the one that gives the most striking result, the differences detected tend to be exaggerated, with p-values that are too small and confidence intervals that look misleadingly narrow.
The general rule: if a variable is naturally continuous, keep it continuous in your analysis. Dichotomize only when clinical decision-making genuinely requires a binary choice, like treat or don’t treat.
Visualizing Dichotomous Data
Bar charts are the most effective way to display dichotomous variables. A simple bar chart showing the count or percentage in each category is immediately readable. If 68% of survey respondents said “yes” and 32% said “no,” two bars communicate that instantly.
Pie charts can also work here, since a dichotomous variable naturally represents a part-to-whole relationship. With only two slices, the chart stays clean and easy to read, which isn’t always true of pie charts with many categories. For examining the relationship between two dichotomous variables, stacked or grouped bar charts let you compare proportions across groups side by side.

