What Is a Cohort: Definition, Types, and Studies

A cohort is a group of people who share a common characteristic or experience within a defined period. The word comes from the Latin “cohors,” the name for a unit of soldiers in the Roman army. Today it’s used across medicine, science, and business to describe any defined group that researchers or analysts follow over time to see what happens to its members.

Cohorts in Medical Research

In health and science, a cohort most often refers to a group of people enrolled in a study based on whether they’ve been exposed to something specific, like a chemical, a medication, a lifestyle habit, or an environmental condition. None of the participants have the outcome the researchers are looking for at the start. The whole point is to follow the group forward in time and see who develops the outcome and who doesn’t.

This design is called a cohort study, and it’s one of the most important tools in epidemiology. Because exposure is documented before the outcome happens, researchers can establish a clear timeline: the exposure came first, the outcome came second. That sequence is essential for building a case that something causes (or protects against) a disease, which is why cohort studies are considered some of the strongest observational evidence in medicine.

How Cohort Studies Work in Practice

Researchers start by recruiting a group of people and sorting them based on their exposure status. Some participants have the exposure, and some don’t. Everyone is then followed at scheduled intervals, where data is collected through interviews, questionnaires, physical exams, or lab tests. At each check-in, researchers look for new cases of the disease or outcome they’re studying.

Because the study tracks new cases as they appear, cohort studies measure what epidemiologists call incidence: how many people develop a condition over a specific period. This is different from prevalence, which counts everyone who currently has a condition, including people who’ve had it for years. Incidence reflects the risk of developing a condition, which is what matters for understanding cause and effect. Prevalence also depends on how long people live with a condition, so it can rise or fall for reasons unrelated to what caused it.
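The difference between the two measures can be sketched in Python. This is a minimal illustration, assuming a condition that people keep once diagnosed (no recovery, deaths, or dropouts) and a hypothetical record type that stores only the year each person was diagnosed. The names `Person`, `cumulative_incidence`, and `point_prevalence` are made up for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Person:
    # Year the condition was first diagnosed, or None if never diagnosed.
    # Assumes the condition is lifelong once diagnosed (no recovery).
    diagnosed_year: Optional[int]


def cumulative_incidence(people: List[Person], start: int, end: int) -> float:
    """New cases from start to end (inclusive) among people free of the condition before start."""
    at_risk = [p for p in people if p.diagnosed_year is None or p.diagnosed_year >= start]
    new_cases = [p for p in at_risk if p.diagnosed_year is not None and p.diagnosed_year <= end]
    return len(new_cases) / len(at_risk)


def point_prevalence(people: List[Person], year: int) -> float:
    """Share of everyone who has the condition in a given year, however long they've had it."""
    cases = [p for p in people if p.diagnosed_year is not None and p.diagnosed_year <= year]
    return len(cases) / len(people)


# Hypothetical population of 10: three long-standing cases, two new cases
# diagnosed in 2001 and 2003, and five people who are never diagnosed.
population = (
    [Person(1990), Person(1995), Person(1998), Person(2001), Person(2003)]
    + [Person(None) for _ in range(5)]
)

print(cumulative_incidence(population, 2000, 2004))  # 2 new cases / 7 at risk, about 0.286
print(point_prevalence(population, 2004))  # 5 current cases / 10 people = 0.5
```

The three people diagnosed before 2000 count toward prevalence but are left out of the incidence calculation, because they were never at risk of becoming new cases during the follow-up window.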

One of the most famous examples is the Framingham Heart Study, launched in 1948. Researchers recruited 5,209 adults between the ages of 30 and 62 from Framingham, Massachusetts, and followed them with repeated examinations for decades. The study produced groundbreaking evidence linking high blood pressure, high cholesterol, smoking, and other factors to heart disease. Much of what we now consider common knowledge about heart health traces back to this single cohort.

Prospective vs. Retrospective Cohorts

Not all cohort studies run in the same direction through time. The two main types differ in when the data was collected relative to when the researcher gets involved.

Prospective cohort studies recruit participants in the present and follow them into the future. Researchers control what gets measured and how, which means the data tends to be more complete and more closely matched to the research question. The tradeoff is time and cost. Following thousands of people for years or decades is expensive, and some participants inevitably drop out along the way, creating gaps in the data.

Retrospective cohort studies (sometimes called historical cohort studies) work with data that already exists. The participants’ baseline measurements and follow-ups happened in the past, and the researcher analyzes them in the present. This makes the study faster and cheaper to conduct. The downside is that the researcher has no control over how the data was originally collected. Medical records or administrative databases may be incomplete, inaccurate, or missing key variables the researcher needs.

How Cohort Studies Differ From Case-Control Studies

Both are observational, meaning researchers watch what happens rather than assigning treatments. But they start from opposite ends. A cohort study begins with exposure and looks forward for outcomes. A case-control study begins with outcomes and looks backward for exposures.

In a case-control study, researchers first identify people who already have a disease (the cases) and people who don’t (the controls), then dig into their histories to see what exposures might explain the difference. This approach is faster and works well for rare diseases, but the backward-looking design makes it harder to establish a reliable timeline between exposure and outcome. Cohort studies, by documenting exposure before the outcome occurs, provide stronger evidence for cause-and-effect relationships.

Measuring Risk in a Cohort

The signature measurement that comes out of a cohort study is relative risk: the probability of developing an outcome in the exposed group divided by the probability in the unexposed group. If a cohort study finds that smokers develop lung cancer at 10 times the rate of nonsmokers, the relative risk is 10.

A relative risk of 1 means the exposure made no difference. Above 1 means the exposed group had a higher risk. Below 1 means the exposure was associated with lower risk, which is how researchers identify protective factors like exercise or certain dietary patterns. The calculation requires knowing what proportion of each exposure group went on to develop the outcome, which a cohort study measures directly. In a case-control study, researchers decide how many cases and controls to enroll, so the share of participants with the disease reflects the study design rather than the true risk. That is why relative risk comes naturally out of cohort studies but can’t be calculated directly from case-control designs.
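The calculation can be sketched in a few lines of Python, assuming the study has already counted how many participants in each exposure group were followed and how many developed the outcome. The function names and all counts below are hypothetical, chosen so that the first example reproduces a relative risk of 10.

```python
import math


def relative_risk(exposed_cases: int, exposed_total: int,
                  unexposed_cases: int, unexposed_total: int) -> float:
    """Risk (cumulative incidence) in the exposed group divided by risk in the unexposed group."""
    if unexposed_cases == 0:
        raise ValueError("relative risk is undefined when the unexposed group has no cases")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    return risk_exposed / risk_unexposed


def interpret(rr: float) -> str:
    """Classify a relative risk as described in the text (1 = no difference)."""
    if math.isclose(rr, 1.0):
        return "no difference"
    return "higher risk in exposed group" if rr > 1 else "lower risk in exposed group"


# Hypothetical: 50 of 1,000 smokers vs. 10 of 2,000 nonsmokers develop the disease.
harmful = relative_risk(50, 1000, 10, 2000)  # 0.05 / 0.005, about 10
# Hypothetical protective factor: 20 of 1,000 exercisers vs. 40 of 1,000 non-exercisers.
protective = relative_risk(20, 1000, 40, 1000)  # 0.02 / 0.04, about 0.5

print(interpret(harmful))     # higher risk in exposed group
print(interpret(protective))  # lower risk in exposed group
```

Computing each group’s risk separately, rather than dividing raw case counts, matters because the two groups can be different sizes, as they are in the smoking example.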

Strengths and Limitations

Cohort studies offer several clear advantages. They establish the time sequence between exposure and outcome. They can track multiple outcomes from a single exposure. And they directly measure how often new cases arise in a population, so incidence is observed rather than inferred.

The limitations are practical. Prospective cohorts are expensive and time-consuming, sometimes spanning decades. Participants drop out, move away, or become unreachable, a problem known as loss to follow-up that can skew results if the people who leave differ meaningfully from those who stay. Retrospective cohorts avoid the time problem but inherit whatever flaws exist in the original data. Neither type provides evidence of causation as strong as a randomized controlled trial, because exposed and unexposed people may differ in other ways that also affect the outcome. But for exposures that would be unethical to assign randomly (you can’t ask people to start smoking), cohort studies are the best available option.
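A small arithmetic sketch shows how loss to follow-up can bias an incidence estimate. All numbers are invented for illustration: 1,000 people are enrolled, 100 of them would develop the outcome if everyone could be followed, and 150 drop out. The only thing that changes between the two scenarios is how many of the dropouts are future cases. The function name `observed_incidence` is hypothetical.

```python
import math


def observed_incidence(enrolled: int, true_cases: int,
                       dropouts: int, cases_among_dropouts: int) -> float:
    """Incidence computed only from participants who stayed in the study.

    Cases that occur in people who dropped out are never observed.
    """
    completers = enrolled - dropouts
    observed_cases = true_cases - cases_among_dropouts
    return observed_cases / completers


true_risk = 100 / 1000  # 0.10 if nobody were lost to follow-up

# Dropouts resemble those who stay: 15 of the 150 (10%) are future cases.
unbiased = observed_incidence(1000, 100, 150, 15)  # 85 / 850 = 0.10

# Dropouts differ: 40 of the 150 are future cases (e.g., people too sick to attend visits).
biased = observed_incidence(1000, 100, 150, 40)  # 60 / 850, about 0.071

print(math.isclose(unbiased, true_risk))  # True
print(biased < true_risk)  # True: the risk is underestimated
```

When the people who leave look like the people who stay, the estimate holds up despite the smaller sample; when they don’t, the study underestimates risk even though the remaining participants were measured accurately.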

Cohorts in Business and Analytics

Outside of medicine, the term cohort has become standard in business analytics, particularly for understanding how users or customers behave over time. A cohort in this context is a group of users who share a common starting point, like the same sign-up month or the date they first made a purchase.

Cohort analysis lets companies separate growth from engagement. Aggregate numbers might show that total users are climbing, but a cohort breakdown could reveal that people who signed up after a recent product change have 20% lower retention after one week than earlier groups. Without splitting users into cohorts, that problem stays hidden behind the overall growth curve.
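A minimal sketch of that breakdown in Python, assuming each user is reduced to a hypothetical `(signup_month, active_after_week_one)` pair. The data are invented: sign-ups grow from January to February, and so does the number of retained users, but February’s one-week retention is 20% lower in relative terms.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def retention_by_cohort(users: List[Tuple[str, bool]]) -> Dict[str, float]:
    """Map each sign-up month (acquisition cohort) to its one-week retention rate."""
    totals: Dict[str, int] = defaultdict(int)
    retained: Dict[str, int] = defaultdict(int)
    for signup_month, active_after_week_one in users:
        totals[signup_month] += 1
        retained[signup_month] += int(active_after_week_one)
    return {month: retained[month] / totals[month] for month in sorted(totals)}


# Hypothetical data: 100 January sign-ups (50 retained), 150 February sign-ups (60 retained).
users = (
    [("2024-01", True)] * 50 + [("2024-01", False)] * 50
    + [("2024-02", True)] * 60 + [("2024-02", False)] * 90
)

print(len(users))                          # 250 sign-ups in total
print(sum(active for _, active in users))  # 110 retained users in total
print(retention_by_cohort(users))          # {'2024-01': 0.5, '2024-02': 0.4}
```

The aggregate counts only go up, while the per-cohort rates show the drop from 50% to 40%.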

Businesses typically work with three types of cohorts. Acquisition cohorts group users by when they first showed up, such as weekly or monthly sign-up groups. Behavioral cohorts group users by actions they took, like completing onboarding or using a specific feature. Revenue cohorts group users by spending patterns, such as free versus paid subscribers. For example, if January sign-ups retained at 60% but February sign-ups retained at only 45%, something changed between those months, and comparing the two acquisition cohorts tells you where to start investigating.
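The three cohort types differ only in which attribute is used to group users, which a short sketch can make explicit. The `User` fields, the cohort labels, and the six sample users below are hypothetical; a real product would pull these attributes from its own event data.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class User:
    signup_month: str           # used for acquisition cohorts
    completed_onboarding: bool  # used for behavioral cohorts
    plan: str                   # used for revenue cohorts ("free" or "paid")
    retained: bool              # still active at the retention checkpoint


# Each cohort type is just a different grouping key over the same users.
COHORT_KEYS: Dict[str, Callable[[User], str]] = {
    "acquisition": lambda u: u.signup_month,
    "behavioral": lambda u: "onboarded" if u.completed_onboarding else "not onboarded",
    "revenue": lambda u: u.plan,
}


def retention(users: List[User], cohort_type: str) -> Dict[str, float]:
    """Retention rate for each cohort under the chosen cohort definition."""
    key = COHORT_KEYS[cohort_type]
    totals: Dict[str, int] = defaultdict(int)
    kept: Dict[str, int] = defaultdict(int)
    for user in users:
        cohort = key(user)
        totals[cohort] += 1
        kept[cohort] += int(user.retained)
    return {cohort: kept[cohort] / totals[cohort] for cohort in sorted(totals)}


users = [
    User("2024-01", True, "paid", True),
    User("2024-01", True, "free", True),
    User("2024-01", False, "free", False),
    User("2024-02", True, "paid", True),
    User("2024-02", False, "free", False),
    User("2024-02", False, "free", False),
]

for cohort_type in COHORT_KEYS:
    print(cohort_type, retention(users, cohort_type))
```

Keeping the grouping rule separate from the retention calculation means adding a new cohort definition is a one-line change to `COHORT_KEYS`.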