What Are Cohorts and How Do Cohort Studies Work?

A cohort is a group of people who share a defined characteristic, such as birth year, occupation, or a specific exposure, and are followed over time to see how their health changes. The concept is foundational to medical research: by tracking thousands (or even hundreds of thousands) of people for years or decades, researchers can identify what increases or decreases the risk of diseases ranging from heart disease to cancer. Some of the most important discoveries in public health have come from cohort studies.

How Cohort Studies Work

The core idea is straightforward. Researchers identify a group of people who don’t yet have the disease or outcome being studied. They record baseline information, including potential exposures like smoking, diet, pollution, or workplace chemicals. Then they follow the group over time, checking periodically to see who develops the outcome and who doesn’t.

Because participants are selected before the outcome occurs, researchers can establish a timeline: the exposure came first, then the disease followed (or didn’t). This makes cohort studies especially powerful for understanding cause and effect compared to study designs that work backward from a diagnosis. Cohort studies are also observational, meaning participants simply live their lives. No one is assigned to smoke or eat a particular diet. Researchers watch and measure rather than intervene.

Participants are typically divided into at least two subgroups: those exposed to a factor of interest and those not exposed. Ideally, the two groups are similar in every other way. Over years of follow-up, researchers compare the rate of disease in the exposed group to the rate in the unexposed group, producing a measure called relative risk: the risk in the exposed group divided by the risk in the unexposed group. A relative risk of 1 means both groups developed the disease at the same rate. A relative risk of 2 means the exposed group was twice as likely to develop the disease. A relative risk below 1 means the exposure is associated with lower risk and may be protective.
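The comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not any particular study's method: the function name and the counts are hypothetical, and it assumes every participant was followed for the same length of time, so simple proportions stand in for risk.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk of disease in the exposed group divided by risk in the unexposed group."""
    if exposed_total <= 0 or unexposed_total <= 0:
        raise ValueError("group sizes must be positive")
    risk_exposed = exposed_cases / exposed_total
    risk_unexposed = unexposed_cases / unexposed_total
    if risk_unexposed == 0:
        # Division by zero: the ratio has no meaning without unexposed cases.
        raise ValueError("relative risk is undefined with zero unexposed cases")
    return risk_exposed / risk_unexposed


# Hypothetical counts, for illustration only:
# 40 of 1,000 exposed people and 20 of 1,000 unexposed people develop the disease.
rr = relative_risk(exposed_cases=40, exposed_total=1000,
                   unexposed_cases=20, unexposed_total=1000)
print(f"Relative risk: {rr:.2f}")  # prints "Relative risk: 2.00"
```

With these made-up numbers the exposed group's risk is 4% and the unexposed group's is 2%, which gives a relative risk of 2. Real analyses usually also account for differing follow-up times and report a confidence interval, which this sketch leaves out.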

Prospective vs. Retrospective Cohorts

Not all cohort studies run in the same direction through time. In a prospective cohort, researchers recruit participants today and follow them into the future, collecting data as events unfold. This generally gives the most reliable data because measurements can be standardized and little is left to memory or old records. The downside is time and cost: if a disease takes 20 years to develop, the study takes at least 20 years to produce results.

In a retrospective cohort, the outcomes have already happened. Researchers use existing records, such as medical charts, employment files, or insurance databases, to reconstruct who was exposed and who developed the disease. The basic logic is the same as a prospective study, but the data come from the past rather than being collected fresh. Retrospective designs are faster and cheaper, though they depend entirely on how well the original records were kept.

Landmark Cohort Studies

The Framingham Heart Study is the most famous cohort in medicine. It enrolled 5,209 adults between the ages of 30 and 62 in 1948 in Framingham, Massachusetts, with the goal of identifying risk factors for cardiovascular disease. At the time, cardiovascular disease was the leading cause of death in the United States, and doctors understood little about its causes. Framingham research established that high blood pressure, high cholesterol, smoking, and diabetes all predict heart disease. The study is now tracking a third generation: children of the original participants, along with those children’s spouses, joined in 1971, and grandchildren of the original participants enrolled starting in 2002.

The Nurses’ Health Study, launched in 1976, assembled 121,700 female nurses and has generated a substantial share of what we know about women’s health and chronic disease prevention. It has produced findings on breast cancer, colorectal cancer, cardiovascular disease, type 2 diabetes, obesity, neurodegenerative diseases, reproductive health, eye conditions, kidney disorders, and mental health, among others. A second wave enrolled 116,430 younger nurses in 1989 to study oral contraceptives and reproductive risk factors, and a third wave in 2010 added nearly 40,000 more, with greater racial and ethnic diversity.

The UK Biobank, one of the largest modern cohorts, recruited 502,000 volunteers aged 40 to 69 between 2006 and 2010 across England, Wales, and Scotland. Beyond questionnaires, participants underwent physical measurements including blood pressure, lung function, hand grip strength, vision tests, and body composition scans. Blood and urine samples were stored for future analysis. Subsets of participants also completed bone density scans, hearing tests, retinal imaging, and cardiorespiratory fitness tests. Whole-genome sequencing data are now available for the entire cohort, making it one of the richest datasets in biomedical science.

Birth Cohorts and Childhood Exposures

Some cohort studies start at birth, or even before it, to capture how early-life conditions shape health decades later. Research from birth cohorts has shown that chronic obstructive lung disease is tied not only to fetal growth and gestational age but also to lifetime air pollution exposure and smoking behaviors. The Children’s Health Study followed participants into their mid-40s, focusing on how air pollution affects lung development from childhood through young adulthood. Other birth cohorts, like the Wayne County Health, Environment, and Allergy/Asthma Longitudinal Study, track how home allergens and environmental exposures in infancy contribute to asthma and allergy risk, sometimes enrolling only children whose parents have a history of those conditions.

Why Cohorts Are Preferred for Certain Questions

Cohort studies shine in situations where other study designs fall short. They are particularly useful for examining rare exposures. If researchers want to know whether a specific industrial chemical increases cancer risk, they can recruit workers exposed to that chemical and follow them, rather than starting with cancer patients and trying to trace their exposure history backward.

Cohorts also let researchers examine multiple outcomes at once. A single study of smokers can simultaneously track rates of lung cancer, heart disease, stroke, and emphysema. And because researchers measure disease rates directly in exposed and unexposed groups, they can calculate how much a given exposure actually changes someone’s risk over time.
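One way to express how much an exposure changes risk is the risk difference: the risk in the exposed group minus the risk in the unexposed group. The sketch below applies it to several outcomes at once. All counts are invented for illustration, the names are hypothetical, and equal follow-up for everyone is assumed.

```python
def risk_difference(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Absolute difference in risk between the exposed and unexposed groups."""
    return exposed_cases / exposed_total - unexposed_cases / unexposed_total


# Hypothetical cohort sizes (not real data).
SMOKERS = 10_000
NON_SMOKERS = 10_000

# Hypothetical case counts per outcome: (cases among smokers, cases among non-smokers).
cases_by_outcome = {
    "lung cancer": (30, 5),
    "heart disease": (120, 80),
    "stroke": (50, 40),
}

# Excess cases per 1,000 people associated with the exposure, one value per outcome.
excess_per_1000 = {
    outcome: 1000 * risk_difference(smoker_cases, SMOKERS, non_smoker_cases, NON_SMOKERS)
    for outcome, (smoker_cases, non_smoker_cases) in cases_by_outcome.items()
}

for outcome, excess in excess_per_1000.items():
    print(f"{outcome}: {excess:.1f} extra cases per 1,000 people")
```

The risk difference complements relative risk: an exposure can double the risk of a rare disease while adding only a handful of cases per 1,000 people, which is why both measures are often reported together.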

The Challenge of Keeping Participants

The biggest practical problem with long-running cohorts is attrition: people move, lose interest, become too ill to participate, or die. In studies of older adults, attrition rates as high as 77% over 10 years have been reported, and even shorter studies lasting just two years can lose up to 40% of participants. One six-year study of osteoporosis lost 30% of its participants to death or dropout.

This matters because the people who leave are often different from the people who stay. Sicker, older, or more disadvantaged participants tend to drop out at higher rates, which skews the remaining sample toward healthier individuals. This selective attrition can distort results in ways that are difficult to fully correct after the fact, even with statistical adjustments. It is one reason why large initial enrollment numbers are so important: starting with 500,000 people, as the UK Biobank did, builds in a buffer against the inevitable losses.
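The arithmetic behind selective attrition can be sketched as follows. The baseline counts and retention rates are assumptions chosen only to show the mechanism, and the function name is hypothetical.

```python
def poor_health_share_after_attrition(n_poor, n_good, retention_poor, retention_good):
    """Share of the remaining sample in poor health after unequal dropout."""
    remaining_poor = n_poor * retention_poor
    remaining_good = n_good * retention_good
    return remaining_poor / (remaining_poor + remaining_good)


# Hypothetical baseline: 300 participants in poor health, 700 in good health.
baseline_share = 300 / (300 + 700)

# Assumed retention: half of the poor-health group stays, versus 80% of the good-health group.
followup_share = poor_health_share_after_attrition(
    n_poor=300, n_good=700, retention_poor=0.5, retention_good=0.8
)

print(f"Poor health at baseline: {baseline_share:.1%}")
print(f"Poor health among those still enrolled: {followup_share:.1%}")
```

Under these assumptions, people in poor health make up 30% of the starting sample but only about 21% of those who remain, so any estimate based on the remaining participants describes a healthier group than the one originally enrolled.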

Built-In Biases to Watch For

One well-documented bias in occupational cohorts is the healthy worker effect. When researchers compare disease rates among factory workers to disease rates in the general population, the workers almost always appear healthier. The reason is simple: the general population includes children, elderly retirees, and people too sick to hold a job. Workers, by definition, were healthy enough to be hired and stay employed. Anyone who became seriously ill likely left the workforce, dropping out of the study but remaining in the general population statistics.

The result is a systematic underestimation of workplace hazards. Some analyses have estimated that the healthy worker effect masks roughly 25% of the true association between a harmful occupational exposure and death. A toxic chemical might look less dangerous than it actually is because the comparison group includes many people who were never well enough to be exposed in the first place. Researchers address this by comparing exposed workers to unexposed workers in the same company or industry, rather than to the general population.

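Cohort studies also carry the risk of confounding, where a third factor linked to both the exposure and the outcome, rather than the exposure itself, accounts for some or all of a difference in outcomes. If exposed participants are older on average than unexposed participants, for example, age alone could produce a higher disease rate in the exposed group. Long follow-up periods compound this problem because people’s behaviors and environments change over decades. Careful collection of data on potential confounders at multiple time points helps, but no observational study can eliminate this concern entirely, because some confounders go unmeasured.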
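A common check for a measured confounder is to compare the overall (crude) relative risk with the relative risk inside each level of the suspected confounder. The sketch below uses invented counts and hypothetical names, with age group as the assumed confounder. It illustrates the idea only and is not a full adjustment method.

```python
def relative_risk(exposed_cases, exposed_total, unexposed_cases, unexposed_total):
    """Risk in the exposed group divided by risk in the unexposed group."""
    return (exposed_cases / exposed_total) / (unexposed_cases / unexposed_total)


# Hypothetical counts per age stratum:
# (exposed_cases, exposed_total, unexposed_cases, unexposed_total)
strata = {
    "younger": (4, 200, 16, 800),   # 2% risk in both exposure groups
    "older": (80, 800, 20, 200),    # 10% risk in both exposure groups
}

# Crude analysis: add the counts across strata, ignoring age entirely.
crude_counts = tuple(sum(values) for values in zip(*strata.values()))
crude_rr = relative_risk(*crude_counts)

# Stratified analysis: compute the relative risk within each age group.
stratum_rr = {name: relative_risk(*counts) for name, counts in strata.items()}

print(f"Crude relative risk: {crude_rr:.2f}")
for name, rr in stratum_rr.items():
    print(f"Relative risk among {name} participants: {rr:.2f}")
```

In this made-up example the exposed group is mostly older and older people have higher risk, so the crude relative risk is about 2.3 even though the exposure makes no difference within either age group. When crude and stratum-specific estimates diverge like this, the stratifying factor is likely confounding the result.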