What Is Allele Frequency and How Is It Calculated?

Allele frequency is the proportion of a specific gene variant (allele) found in a population, expressed as a number between 0 and 1. If you have a population of 100 people, each carrying two copies of a given gene, that’s 200 total copies. If 60 of those copies are allele A, the allele frequency of A is 60/200, or 0.30 (30%). It’s one of the most fundamental measurements in genetics, used to track how populations evolve, assess disease risk, and understand human diversity.

How Allele Frequency Is Calculated

The formula is straightforward: count how many copies of a particular allele appear in the population, then divide by the total number of gene copies. For autosomal genes in diploid species like humans, each person carries two copies (one from each parent), so the total is the number of individuals multiplied by two. When counting, a person with two copies of the allele (a homozygote) contributes two, and a person with one copy (a heterozygote) contributes one.
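The counting rule can be written as a small Python function. This is a minimal sketch that assumes one autosomal locus with two alleles, A and a, with genotype counts as input. The function name `allele_frequency` is illustrative and not taken from any library. The usage line reproduces the 100-person example from the opening paragraph.

```python
def allele_frequency(n_AA: int, n_Aa: int, n_aa: int) -> float:
    """Frequency of allele A from genotype counts at one diploid, autosomal locus."""
    total_copies = 2 * (n_AA + n_Aa + n_aa)  # two gene copies per individual
    if total_copies == 0:
        raise ValueError("need at least one individual")
    # Homozygotes (AA) carry two copies of A; heterozygotes (Aa) carry one.
    copies_of_A = 2 * n_AA + n_Aa
    return copies_of_A / total_copies


# 100 people: 10 AA, 40 Aa, 50 aa -> 2*10 + 40 = 60 copies of A out of 200
print(allele_frequency(10, 40, 50))  # 0.3
```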

When a gene has only two alleles, geneticists conventionally write their frequencies as p and q. The two frequencies always add up to 1: p + q = 1. So if one allele has a frequency of 0.7, the other must be 0.3. This relationship is the foundation of the Hardy-Weinberg equation. The equation predicts how common each combination of alleles should be in a large, randomly mating population with no selection, migration, or mutation. The equation, p² + 2pq + q² = 1, is simply (p + q)² expanded. It gives the expected proportion of people who are homozygous for the dominant allele (p²), heterozygous (2pq), or homozygous for the recessive allele (q²).
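The Hardy-Weinberg expansion turns one allele frequency into three expected genotype frequencies. Below is a minimal Python sketch of that step. It assumes the equilibrium conditions above hold, and it uses `AA`, `Aa`, and `aa` as illustrative genotype labels. The function name `hardy_weinberg` is hypothetical.

```python
def hardy_weinberg(p: float) -> dict[str, float]:
    """Expected genotype frequencies for a two-allele locus at Hardy-Weinberg equilibrium."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    q = 1.0 - p  # the two allele frequencies sum to 1
    return {"AA": p * p, "Aa": 2 * p * q, "aa": q * q}


# p = 0.7, q = 0.3, rounded to two decimals to hide floating-point noise
print({g: round(f, 2) for g, f in hardy_weinberg(0.7).items()})
# {'AA': 0.49, 'Aa': 0.42, 'aa': 0.09}
```

The three values always sum to 1, because they are the terms of (p + q)².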

Allele Frequency vs. Genotype Frequency

A common point of confusion: allele frequency and genotype frequency are related but not the same thing. Allele frequency counts individual allele copies. Genotype frequency is the fraction of individuals carrying a specific pair of alleles. A person with one copy of allele A and one of allele B has a heterozygous genotype, but each of those alleles is counted separately when calculating allele frequency. Allele frequencies can always be calculated from genotype counts. Genotype frequencies can be predicted from allele frequencies using the Hardy-Weinberg equation, but only when its assumptions hold. The two concepts are not interchangeable.

Phenotype frequency, the fraction of individuals showing a particular trait, adds another layer. Because some alleles are dominant over others, two people with different genotypes can look the same. A person carrying two copies of a dominant allele and a person carrying one dominant and one recessive copy both express the dominant trait, but their contributions to allele frequency differ.
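Dominance hides heterozygotes, so allele frequency cannot be read directly from the dominant phenotype. The recessive phenotype, however, corresponds to a single genotype. A common textbook workaround relies on this fact: under Hardy-Weinberg proportions and complete dominance, the fraction of people showing the recessive trait equals q². The sketch below rests on those two assumptions. The function name and the 1-in-10,000 figure are illustrative.

```python
import math


def recessive_allele_from_phenotype(recessive_phenotype_freq: float) -> tuple[float, float]:
    """Estimate (q, heterozygote carrier frequency 2pq) from the recessive phenotype frequency.

    Assumes complete dominance and Hardy-Weinberg proportions, so the
    recessive phenotype frequency equals q**2.
    """
    if not 0.0 <= recessive_phenotype_freq <= 1.0:
        raise ValueError("frequency must be between 0 and 1")
    q = math.sqrt(recessive_phenotype_freq)
    p = 1.0 - q
    return q, 2 * p * q


# Hypothetical trait seen in 1 of every 10,000 people
q, carriers = recessive_allele_from_phenotype(1 / 10_000)
# q is about 0.01; carriers is about 0.0198 (roughly 2% are heterozygous)
```

The carrier fraction is far larger than the fraction of affected people. This gap illustrates why phenotype counts alone understate how common a recessive allele is.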

What Changes Allele Frequencies

In a very large, randomly mating population where nothing interfered, allele frequencies would stay constant generation after generation. In reality, four main forces push them around: natural selection, genetic drift, gene flow, and mutation.

Natural selection shifts allele frequencies when certain variants help individuals survive and reproduce more successfully. Over time, beneficial alleles become more common while harmful ones decline. When selection consistently favors one allele over the alternatives, it is called directional selection. Directional selection is the most intuitive driver of evolutionary change.
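The effect of directional selection can be made concrete with the standard single-locus recursion. Each genotype gets a relative fitness, and the new allele frequency is the share of A copies among the survivors' contribution. This is a minimal deterministic sketch. It assumes genotypes start each generation in Hardy-Weinberg proportions and that the population is large enough to ignore drift. The fitness values are hypothetical.

```python
def select_one_generation(p: float, w_AA: float, w_Aa: float, w_aa: float) -> float:
    """Frequency of allele A after one round of selection.

    w_AA, w_Aa, w_aa are relative fitnesses (survival and reproduction)
    of each genotype; genotypes start in Hardy-Weinberg proportions.
    """
    q = 1.0 - p
    mean_fitness = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # A copies after selection: all copies in AA plus half the copies in Aa.
    return (p * p * w_AA + p * q * w_Aa) / mean_fitness


# Hypothetical fitnesses: each copy of A adds 0.05 to relative fitness.
p = 0.01
for generation in range(500):
    p = select_one_generation(p, w_AA=1.0, w_Aa=0.95, w_aa=0.90)
print(p)  # climbs from 1% toward 1.0
```

With all three fitnesses equal, the function returns p unchanged. That result is the "nothing interfered" case described above.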

Genetic drift is the random fluctuation that happens because real populations aren’t infinite. Every generation, only some individuals pass on their genes, and chance alone can cause an allele to become more or less common. Drift has a larger effect in small populations, where a few lucky or unlucky births can dramatically change the numbers. Over enough time, drift alone can push an allele all the way to a frequency of 1.0 (fixation) or 0.0 (loss), eliminating variation at that gene entirely.
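Genetic drift is usually modeled with the Wright-Fisher model, a standard textbook simplification that this article does not name. In that model, each generation's 2N gene copies are drawn at random from the previous generation's gene pool. The sketch below makes several simplifying assumptions: non-overlapping generations, constant population size, and no selection, migration, or mutation. The function name and parameters are illustrative. It uses only Python's standard `random` module.

```python
import random


def wright_fisher(p0: float, n_individuals: int, generations: int, seed: int = 0) -> list[float]:
    """Simulate neutral drift of one allele's frequency in a diploid population."""
    rng = random.Random(seed)
    copies = 2 * n_individuals  # two gene copies per individual
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each copy in the next generation is drawn at random from the
        # current gene pool, so the new count is binomial(2N, p).
        count = sum(1 for _ in range(copies) if rng.random() < p)
        p = count / copies
        trajectory.append(p)
        if p in (0.0, 1.0):  # lost or fixed: nothing left to drift
            break
    return trajectory


small = wright_fisher(0.5, n_individuals=10, generations=200, seed=1)
large = wright_fisher(0.5, n_individuals=1000, generations=200, seed=1)
print(small[-1], large[-1])
```

Across different seeds, the 10-person population usually reaches 0.0 or 1.0 well before 200 generations. The 1,000-person population tends to stay much closer to its starting value, which matches the point that drift is strongest in small populations.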

Gene flow occurs when individuals migrate between populations and introduce their alleles to a new group. This tends to make separate populations more genetically similar to each other.

Mutation introduces entirely new alleles into a population. While any single mutation is rare, over millions of genetic positions and many generations, mutation provides the raw material that selection and drift act on.
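Gene flow and mutation can each be sketched as a one-line update rule. The gene flow function follows the textbook "continent-island" model. In that model, a fraction m of the island's gene pool is replaced each generation by migrants whose allele frequency stays fixed. The mutation function assumes one-way mutation from A to a at rate mu per copy, with no back mutation. Both models and all parameter values are illustrative assumptions, not figures from this article.

```python
def gene_flow(p_island: float, p_migrants: float, m: float) -> float:
    """One generation of one-way migration into the island population."""
    return (1.0 - m) * p_island + m * p_migrants


def mutation(p: float, mu: float) -> float:
    """One generation of one-way mutation A -> a at rate mu per copy."""
    return p * (1.0 - mu)


# Hypothetical island at p = 0.2 receiving 10% migrants per generation from p = 0.8
p = 0.2
for _ in range(50):
    p = gene_flow(p, p_migrants=0.8, m=0.1)
# The gap to the migrant frequency shrinks by a factor (1 - m) each generation,
# so the two populations converge.
print(p)
```

The mutation update changes p only in proportion to mu, which is typically tiny. This is why mutation matters mainly as a source of new variation over long time spans.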

Common vs. Rare Variants

Geneticists classify alleles by how common they are. The less common version at any gene position is called the minor allele, and its proportion is the minor allele frequency (MAF). A variant with a MAF of 1% or higher is considered polymorphic, meaning it’s common enough to be a regular feature of the population. Below 1%, it’s classified as a rare variant.

These thresholds matter in research. Large-scale genetic studies like the HapMap project focused on variants with a MAF of 5% or above. Many genome-wide association studies also discard variants below a minimum MAF, in some cases a cutoff as high as 10%. They do so because statistical power drops sharply for rare variants, which makes meaningful associations harder to detect. This filtering is practical but comes with a cost: rare variants that genuinely contribute to disease risk can be missed entirely.
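Computing MAF and applying a cutoff takes only a few lines. The sketch below assumes each variant is described by the frequency of one of its two alleles, so the minor allele frequency is whichever of that frequency and its complement is smaller. It uses the 1% polymorphic/rare boundary from the text. The variant IDs and frequencies are made up for illustration, and the function names are hypothetical.

```python
def minor_allele_frequency(allele_freq: float) -> float:
    """MAF for a two-allele site, given the frequency of either allele."""
    return min(allele_freq, 1.0 - allele_freq)


def classify_variant(maf: float, rare_cutoff: float = 0.01) -> str:
    """Label a variant 'rare' below the cutoff, 'polymorphic' at or above it."""
    return "rare" if maf < rare_cutoff else "polymorphic"


def filter_by_maf(variants: dict[str, float], min_maf: float) -> dict[str, float]:
    """Keep only variants whose MAF is at least min_maf."""
    return {vid: af for vid, af in variants.items() if minor_allele_frequency(af) >= min_maf}


# Hypothetical variants: ID -> frequency of one allele
variants = {"var1": 0.30, "var2": 0.97, "var3": 0.004}
for vid, af in variants.items():
    print(vid, classify_variant(minor_allele_frequency(af)))
print(filter_by_maf(variants, min_maf=0.05))  # a HapMap-style 5% cutoff keeps only var1
```

The variant `var2` has an allele at 97%, so its minor allele sits at 3%. It passes a 1% cutoff but not a 5% one. This shows how the choice of threshold decides which variants a study can see.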

Why Allele Frequency Matters for Health

When geneticists discover a variant linked to a disease, one of the first things they check is its allele frequency across different populations. A variant that's common in one group but absent in another can create blind spots in medical genetics. Most large genetic studies have historically used European-descent cohorts. As a result, disease associations found in one population can over- or underestimate genetic risk when applied to others.

The pattern is striking. Risk alleles that are ancestral (shared with other primates) tend to be found at higher frequency in African populations, with a mean difference of about 9.5% compared to non-African populations. Derived risk alleles (those arising from newer mutations) show the opposite trend, appearing at higher frequency in non-African populations by about 5.4%. These systematic differences mean that genetic risk scores built from European studies can misestimate disease risk when applied to African populations. Research published in Genome Biology found that studies using African cohorts produced disease risk predictions that generalized well to all global populations, while studies from non-African cohorts did not translate as accurately to African groups.

Clinicians also use allele frequency databases to determine whether a genetic variant is likely to cause disease. If a variant suspected of causing a severe condition turns up at a frequency of 5% in the general population, it’s probably not the culprit, since a truly harmful variant would be kept rare by natural selection. The Genome Aggregation Database (gnomAD), one of the largest reference databases, now includes data from over 800,000 exomes and genomes, reporting allele frequencies broken down by genetic ancestry group. Its latest version flags variants that show significantly different frequencies between datasets, helping researchers catch technical artifacts.
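The frequency check described above can be sketched as a simple filter. Reference databases such as gnomAD describe a variant by its allele count (the number of copies observed) and allele number (the total number of copies genotyped); allele frequency is count divided by number. Because frequencies are reported per ancestry group, the sketch tests each group and flags the variant if any group reaches the threshold. A variant can be rare overall yet common in one group. The group names, counts, and function name are hypothetical, and the 5% default comes from the example in the text. Treat it as a rule of thumb, not a clinical standard. Alleles behind recessive conditions, for instance, can be carried at higher frequencies by healthy heterozygotes.

```python
def too_common_for_severe_disease(
    counts_by_group: dict[str, tuple[int, int]], threshold: float = 0.05
) -> bool:
    """Return True if the variant's frequency in any group is at or above threshold.

    counts_by_group maps a group label to (allele_count, allele_number).
    Groups with no genotyped copies are skipped.
    """
    frequencies = [ac / an for ac, an in counts_by_group.values() if an > 0]
    return bool(frequencies) and max(frequencies) >= threshold


# Hypothetical counts: rare in one group, about 9.4% in another
print(too_common_for_severe_disease({"group_1": (12, 20000), "group_2": (1500, 16000)}))  # True
print(too_common_for_severe_disease({"group_1": (3, 20000), "group_2": (1, 16000)}))  # False
```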

How Allele Frequencies Are Measured

Early methods required physically counting alleles in small samples, often using protein markers as proxies for genetic variation. Modern approaches are far more powerful. Next-generation sequencing can process hundreds of individual samples in a single run, reading DNA directly to identify single nucleotide polymorphisms (SNPs) at millions of positions across the genome. SNP arrays, which test for known variants at predetermined locations, offer a faster and cheaper alternative when full sequencing isn’t needed.

For populations where individual sampling is impractical, researchers have developed methods to estimate allele frequencies from pooled DNA samples. Quantitative PCR techniques can measure relative allele abundance in a mixed sample, then use statistical models to estimate the underlying population frequency. These pooled approaches sacrifice some precision but make it feasible to survey allele frequencies in organisms that are difficult to sample individually, from invasive insect species to microbial communities.
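The statistical step in pooled designs can be illustrated with the simplest possible estimator. This sketch assumes the pooled measurement can be expressed as counts of observations of each allele, such as sequencing reads from a pool. The text's qPCR methods report relative abundance instead, so this is an analogous simplification, not those methods. It returns the observed proportion with a normal-approximation 95% interval. It ignores the sources of error that dedicated pooled-sample models try to handle, such as unequal DNA contributions from individuals and the limited number of chromosomes in the pool. The function name is hypothetical.

```python
import math


def pooled_estimate(alt_count: int, total_count: int, z: float = 1.96) -> tuple[float, float, float]:
    """Naive allele frequency estimate and approximate 95% interval from pooled counts."""
    if total_count <= 0:
        raise ValueError("need at least one observation")
    p_hat = alt_count / total_count
    # Binomial standard error; the normal approximation is poor near 0 or 1.
    se = math.sqrt(p_hat * (1.0 - p_hat) / total_count)
    return p_hat, max(0.0, p_hat - z * se), min(1.0, p_hat + z * se)


# Hypothetical pool: 120 of 400 observations carry the allele of interest
print(pooled_estimate(120, 400))  # about (0.30, 0.255, 0.345)
```

A real pooled analysis would widen this interval to reflect the extra uncertainty. This is one reason pooled designs trade away some precision.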

Fixation and Loss

An allele that reaches a frequency of 1.0 is said to be “fixed” in the population. At that point, every individual carries it and no alternative exists at that gene position. The opposite, an allele dropping to 0.0, means it has been lost entirely. Both outcomes eliminate genetic variation at that locus.

Fixation can happen through strong natural selection favoring one allele, or through genetic drift in small populations where chance events compound over generations. In deterministic models that assume an infinitely large population, an allele favored by selection can approach fixation asymptotically without ever technically reaching it. In real, finite populations, the endpoint is concrete: absent new mutation, the allele eventually either reaches 100% or disappears. The speed at which this happens depends heavily on population size and the strength of selection acting on the variant.
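A standard population-genetics result holds that a neutral allele (one not affected by selection) eventually fixes with probability equal to its starting frequency. The sketch below checks that result by simulation. It runs many replicate populations under random (Wright-Fisher) sampling until the allele is fixed or lost, then reports the fraction that fixed. It assumes a diploid population of constant size with no selection, migration, or mutation, so every run must end at 0 or 1. The function name and parameter values are illustrative.

```python
import random


def fixation_fraction(p0: float, n_individuals: int, replicates: int, seed: int = 0) -> float:
    """Fraction of neutral drift simulations in which the allele reaches frequency 1.0."""
    rng = random.Random(seed)
    copies = 2 * n_individuals
    fixed = 0
    for _ in range(replicates):
        count = round(p0 * copies)
        # Keep drawing generations until the allele is lost (0) or fixed (all copies).
        while 0 < count < copies:
            p = count / copies
            count = sum(1 for _ in range(copies) if rng.random() < p)
        if count == copies:
            fixed += 1
    return fixed / replicates


# Starting at 25% in a population of 10 people, roughly a quarter of runs should fix.
print(fixation_fraction(0.25, n_individuals=10, replicates=2000))
```

Raising `n_individuals` leaves the fixation probability unchanged but makes each run take longer to reach 0 or 1. This fits the point that population size governs how fast the endpoint arrives.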