How to Match DNA: Forensics, Paternity, and More

DNA matching is the process of comparing genetic markers between two or more samples to determine whether they came from the same person, a related individual, or a compatible donor. The specific method depends on the goal: forensic investigations use short tandem repeats (STRs) to link a suspect to a crime scene, paternity tests calculate the probability of a biological relationship, ancestry services use single nucleotide changes to trace lineage, and transplant programs compare immune system markers to find compatible donors.

How Forensic DNA Matching Works

Forensic DNA matching relies on regions of your genome called short tandem repeats, or STRs. These are short sequences of DNA, two to six letters long, that repeat back to back. The number of repeats at each location varies from person to person. By measuring the repeat count at many locations across the genome, a lab builds a numerical profile that is essentially unique to you.
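
As a toy illustration of what a "repeat count" means, the sketch below finds the longest uninterrupted run of a repeat unit in a DNA string. The function name, the sequence, and the GATA motif are invented for the example. Real STR typing infers the count from fragment length rather than by scanning a sequence like this.

```python
def longest_tandem_run(sequence: str, motif: str) -> int:
    """Return the largest number of back-to-back copies of motif found in sequence."""
    if not motif:
        raise ValueError("motif must be a non-empty string")
    best = 0
    for start in range(len(sequence) - len(motif) + 1):
        run = 0
        pos = start
        # Keep stepping one motif length at a time while the motif repeats.
        while sequence.startswith(motif, pos):
            run += 1
            pos += len(motif)
        best = max(best, run)
    return best


# Made-up sequence: flanking DNA, seven GATA repeats, a gap, then two more repeats.
sample = "TTC" + "GATA" * 7 + "CCG" + "GATA" * 2 + "TT"
repeat_count = longest_tandem_run(sample, "GATA")
print(repeat_count)
```

A person who inherited 7 repeats from one parent and 9 from the other would be recorded as "7, 9" at that location.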

The process follows a standard sequence: extract DNA from the sample, measure how much is present, amplify the STR locations with a chemical copying process called the polymerase chain reaction (PCR), then separate and read the resulting fragments on a genetic analyzer. Each amplified fragment carries a fluorescent dye and is sized precisely, so the lab can tell one location from another by the combination of dye color and fragment length, even when many locations are analyzed simultaneously. The final output is a string of numbers representing your repeat counts at each tested location.

Since January 2017, labs participating in the FBI’s national DNA database (CODIS) must test a minimum of 20 core STR locations. Using that many markers makes the probability of two unrelated people sharing the same profile astronomically small. When there’s no suspect, crime scene profiles are compared against the millions of profiles already stored in CODIS, looking for a “hit.”
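
A database search can be pictured as comparing repeat counts location by location. The sketch below is a deliberate simplification and not how CODIS itself is implemented: the data layout, the function names, and the sample IDs are hypothetical, only three locations are shown instead of 20, and the repeat counts are made up. The location names D3S1358, TH01, and FGA are real STR markers.

```python
from typing import Dict, List, Tuple

Genotype = Tuple[float, float]   # two repeat counts, one per inherited copy
Profile = Dict[str, Genotype]    # location name -> genotype


def normalize(profile: Profile) -> Profile:
    """Sort each pair so (17, 15) and (15, 17) compare as equal."""
    return {locus: tuple(sorted(alleles)) for locus, alleles in profile.items()}


def find_hits(evidence: Profile, database: Dict[str, Profile]) -> List[str]:
    """Return IDs of stored profiles that agree with the evidence at every tested location."""
    wanted = normalize(evidence)
    hits = []
    for sample_id, stored in database.items():
        candidate = normalize(stored)
        if all(candidate.get(locus) == genotype for locus, genotype in wanted.items()):
            hits.append(sample_id)
    return hits


evidence = {"D3S1358": (15, 17), "TH01": (9.3, 6), "FGA": (22, 22)}
database = {
    "sample-A": {"D3S1358": (17, 15), "TH01": (6, 9.3), "FGA": (22, 22)},
    "sample-B": {"D3S1358": (15, 17), "TH01": (6, 7), "FGA": (22, 22)},
}
hits = find_hits(evidence, database)
print(hits)
```

Only sample-A agrees at all three locations; sample-B differs at TH01 and is not reported.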

How Match Probability Is Calculated

A DNA match isn’t simply declared as “yes” or “no.” Labs calculate the random match probability: the chance that a randomly selected, unrelated person from the population would happen to share the same profile. This is done using the product rule. The frequency of the observed result at each STR location is estimated from population data, and all those individual frequencies are multiplied together. Because the tested locations are treated as being inherited independently of one another, multiplying across 20 locations produces an extraordinarily small number, often less than one in a trillion.
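
The arithmetic of the product rule can be sketched in a few lines. The frequencies below are hypothetical round numbers chosen for illustration, not real population data, and the function name is an assumption of this example.

```python
from math import prod


def random_match_probability(genotype_frequencies):
    """Multiply per-location genotype frequencies together (the product rule)."""
    return prod(genotype_frequencies)


# Hypothetical case: at each of 20 locations, the observed result is shared by 10% of people.
frequencies = [0.10] * 20
rmp = random_match_probability(frequencies)
print(f"about 1 in {1 / rmp:.1e}")
```

Even with a fairly common result at every location, 20 independent locations give a probability on the order of 1 in 10^20, far below one in a trillion.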

For locations where someone carries two different versions of a marker, the frequency is calculated as twice the product of the two individual frequencies. For locations where both copies appear identical, a slightly more conservative formula is used in place of the simple squared frequency, to account for the possibility of shared ancestry within a population. This adjustment uses a factor called theta, typically set at 0.01, or 0.03 for smaller, more isolated populations like some Native American communities. The correction nudges the estimated frequency upward, which errs in favor of the person being compared and makes the final statistic more reliable, not less.
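
The per-location calculation might look like the sketch below. The text does not spell out the conservative formula; the homozygote expression used here, p² + p(1 − p)θ, is an assumption taken from the form recommended in the 1996 National Research Council report, and the allele frequencies are made up.

```python
from typing import Optional


def genotype_frequency(p: float, q: Optional[float] = None, theta: float = 0.01) -> float:
    """Estimate how common a genotype is at one STR location.

    p     -- population frequency of the first version (allele)
    q     -- frequency of the second version, or None if both copies look identical
    theta -- shared-ancestry correction (assumed default 0.01; 0.03 for isolated populations)
    """
    if q is None:
        # Both copies identical: squared frequency plus the theta adjustment.
        return p * p + p * (1.0 - p) * theta
    # Two different versions: twice the product of their frequencies.
    return 2.0 * p * q


print(genotype_frequency(0.1, 0.2))         # two different versions
print(genotype_frequency(0.1))              # identical copies, theta = 0.01
print(genotype_frequency(0.1, theta=0.03))  # identical copies, theta = 0.03
```

With a version frequency of 0.1, the uncorrected value for identical copies would be 0.01. The correction raises it to about 0.0109 with theta at 0.01 and about 0.0127 with theta at 0.03, so the match looks slightly less rare.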

DNA Matching for Paternity

Paternity testing uses the same STR technology but asks a different question: does this man’s DNA profile contain the genetic markers a child would need to have inherited from a biological father? At each STR location, a child gets one copy from the mother and one from the father. The lab identifies which markers came from the mother, then checks whether the alleged father carries the remaining ones.
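
The check at a single location can be sketched as follows. This is a simplified illustration with hypothetical function names and made-up repeat counts. It ignores mutations, which real casework has to allow for, and looks at one location only.

```python
from typing import Set, Tuple

Genotype = Tuple[int, int]  # two repeat counts at one STR location


def possible_paternal_alleles(child: Genotype, mother: Genotype) -> Set[int]:
    """Return the child's markers that could have come from the father."""
    first, second = child
    candidates = set()
    if first in mother:      # mother could have supplied the first copy...
        candidates.add(second)   # ...so the second must be paternal
    if second in mother:     # or she supplied the second copy
        candidates.add(first)
    return candidates


def father_is_consistent(child: Genotype, mother: Genotype, alleged_father: Genotype) -> bool:
    """True if the alleged father carries at least one marker the child needs."""
    needed = possible_paternal_alleles(child, mother)
    return any(allele in needed for allele in alleged_father)


child, mother = (12, 14), (12, 15)
print(possible_paternal_alleles(child, mother))        # the 14 must be paternal
print(father_is_consistent(child, mother, (14, 16)))   # carries 14
print(father_is_consistent(child, mother, (13, 16)))   # does not carry 14
```

Here the child's 12 is accounted for by the mother, so the father must have contributed the 14. A man with 14 and 16 is consistent at this location; a man with 13 and 16 is not.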

The result is expressed as a probability of paternity. A value of 99% or higher means the tested man is overwhelmingly likely to be the biological father. This probability is calculated using a combined paternity index, which weighs how much more likely the genetic evidence is if the man is the father versus if a random unrelated man is the father. Courts generally treat results above 99% as strong evidence of biological parentage.
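
To show how a combined paternity index becomes a percentage, here is a minimal sketch. It assumes the combined index is the product of per-location indices and that the calculation starts from a neutral prior of 0.5, a common convention the text does not state. The index values are invented.

```python
from math import prod


def probability_of_paternity(paternity_indices, prior: float = 0.5) -> float:
    """Convert per-location paternity indices into a probability of paternity."""
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must be strictly between 0 and 1")
    combined_index = prod(paternity_indices)
    # Bayes' rule: weigh "he is the father" against "a random unrelated man is".
    return combined_index * prior / (combined_index * prior + (1.0 - prior))


# Two hypothetical locations with indices 9 and 11 give a combined index of 99.
probability = probability_of_paternity([9.0, 11.0])
print(f"{probability:.2%}")
```

With a prior of 0.5 the formula reduces to CPI / (CPI + 1), so a combined index of 99 corresponds to 99%. Real tests use many more locations, which pushes the combined index far higher.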

STRs vs. SNPs: Two Approaches

Forensic labs favor STRs because they are highly variable. Each STR location can have many possible versions, which means even a handful of locations can distinguish between individuals with near certainty. The trade-off is that STR analysis requires a dedicated amplification and fragment-sizing run for a small panel of markers, so it costs more per marker tested.

Consumer ancestry tests and some research applications use single nucleotide polymorphisms (SNPs) instead. A SNP is a single-letter change in the DNA code. Each individual SNP is less informative than an STR because it usually has only two possible versions, so at most about 50% of people carry two different versions at that position (a maximum heterozygosity of 50%). But SNP tests compensate by checking hundreds of thousands or even millions of locations at once, and they’re cheaper to run at scale. This makes them ideal for ancestry estimation and for identifying distant relatives who share small stretches of DNA.
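
The 50% ceiling for a two-version marker can be checked with expected heterozygosity, one common way to quantify variability. It is the chance that a person's two copies differ, assuming the copies are drawn independently from the population. The function name and the frequencies below are illustrative assumptions.

```python
def expected_heterozygosity(version_frequencies) -> float:
    """Chance that two randomly drawn copies of a marker are different versions."""
    return 1.0 - sum(p * p for p in version_frequencies)


snp = expected_heterozygosity([0.5, 0.5])   # two versions, equally common
skewed_snp = expected_heterozygosity([0.9, 0.1])
strr = expected_heterozygosity([0.1] * 10)  # hypothetical STR with ten equally common versions
print(snp, skewed_snp, strr)
```

A two-version marker peaks at 0.5 when both versions are equally common and drops to about 0.18 at a 90/10 split, while the hypothetical ten-version STR reaches about 0.9. That gap is why one STR location carries far more identifying information than one SNP.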

Tracing Maternal and Paternal Lines

Two specialized types of DNA are used to trace deep ancestry along specific family lines. Mitochondrial DNA (mtDNA) is passed from mother to child with little change, making it useful for tracking maternal lineage across many generations. Y-chromosome DNA (Y-DNA) passes from father to son and is used to trace paternal lineage. Both are valuable in ancestry research and forensic cases involving degraded remains, where standard STR testing may fail.

These markers have clear limitations. Mitochondrial DNA changes slowly, so many unrelated people can share the same mitochondrial profile. It’s useful for ruling someone out or confirming a maternal line but not for uniquely identifying an individual. Y-DNA shows greater variation between populations, which helps with geographic ancestry estimates, but the mutation rates used to date common ancestors are still debated, making precise timelines uncertain. Neither mtDNA nor Y-DNA can match the identification power of a full autosomal STR profile.

DNA Matching for Transplants

In organ and bone marrow transplants, DNA matching focuses on a completely different set of markers: the human leukocyte antigen (HLA) system. These are proteins on cell surfaces that your immune system uses to distinguish your own cells from foreign ones. The closer the HLA match between donor and recipient, the lower the risk of the body rejecting the transplant.

For a matched sibling bone marrow donor, the standard is a 6 out of 6 match at three key HLA locations. When a sibling isn’t available, unrelated donors are matched at four HLA locations, aiming for 8 out of 8. If a perfect match can’t be found, a 7 out of 8 match (one mismatch) is considered acceptable, though it carries somewhat higher risk. For umbilical cord blood transplants, the threshold is lower: a minimum of 4 out of 6 at three locations, partly because cord blood cells are immunologically less mature and more tolerant of mismatches.

Half-matched (haploidentical) family donors are also used when no better option exists. In these cases, the donor and recipient share at least 4 out of 8 markers, with no more than one mismatch at any single location. All HLA matching today is done through DNA-based methods rather than older antibody tests, providing far greater precision.
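
The "n out of m" scores above are simple counts: each HLA location contributes two markers, one inherited from each parent. The sketch below assumes the three-location score uses HLA-A, HLA-B, and HLA-DRB1 and that the four-location score adds HLA-C; the text does not name the locations. The donor and recipient typings are made up, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Sequence, Tuple

Typing = Dict[str, Tuple[str, str]]  # location -> the two markers a person carries


def hla_match_score(donor: Typing, recipient: Typing, loci: Sequence[str]) -> Tuple[int, int]:
    """Return (matched markers, total markers) across the chosen HLA locations."""
    matched = 0
    for locus in loci:
        # Counter intersection pairs off identical markers without double counting.
        shared = Counter(donor[locus]) & Counter(recipient[locus])
        matched += sum(shared.values())
    return matched, 2 * len(loci)


recipient = {
    "A": ("A*02:01", "A*01:01"),
    "B": ("B*07:02", "B*08:01"),
    "C": ("C*07:01", "C*07:02"),
    "DRB1": ("DRB1*15:01", "DRB1*03:01"),
}
donor = dict(recipient, C=("C*07:01", "C*04:01"))  # one mismatch at HLA-C

print(hla_match_score(donor, recipient, ["A", "B", "DRB1"]))        # three locations
print(hla_match_score(donor, recipient, ["A", "B", "C", "DRB1"]))   # four locations
```

This donor would score 6 out of 6 on the three-location scale but 7 out of 8 once HLA-C is included, which shows why the two scales are not interchangeable.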

Where the DNA Sample Comes From

The two most common sources for DNA matching are blood draws and cheek (buccal) swabs. Blood has traditionally been the gold standard, but cheek swabs are nearly as reliable and far easier to collect. In controlled comparisons, blood samples achieved genotyping accuracy of about 98.4%, while cheek swabs averaged 97.8%. When researchers compared results from blood and cheek swabs taken from the same person, the two methods agreed 98.8% of the time, just 0.4 percentage points lower than when comparing two blood samples from the same individual.

Cheek swabs stored frozen for as long as seven years still yielded enough DNA for successful testing. For forensic cases, DNA can also be extracted from hair, skin cells, saliva, or any biological material left at a scene, though degraded or tiny samples are more prone to incomplete profiles.

Identical Twins and the Limits of Standard Testing

Standard STR profiling cannot distinguish between identical (monozygotic) twins. Because they develop from a single fertilized egg, their STR profiles are the same across all routinely tested markers, including both autosomal and Y-chromosome STRs. Unless other evidence, such as an alibi, rules one twin out, a DNA match to a crime scene cannot show which of the two left the sample.

Whole genome sequencing can sometimes solve this problem by detecting rare mutations that occurred after the twins’ embryo split into two. In one study, researchers found between 1 and 9 single-base differences between members of three twin pairs by sequencing their entire genomes. Epigenetic differences, which are chemical modifications to DNA that accumulate differently over each twin’s lifetime, are another emerging avenue for telling twins apart.

Public DNA Databases and Genetic Genealogy

Consumer DNA testing has created a new form of DNA matching: investigative genetic genealogy. Services like GEDmatch allow users to upload raw DNA data and find genetic relatives in the database. Law enforcement agencies have used this approach to identify suspects in cold cases by finding distant relatives of an unknown suspect, then building a family tree to narrow down the individual.

GEDmatch now separates law enforcement access from personal use. Police must upload DNA through a dedicated portal called GEDmatch PRO, and searches are restricted to violent crimes (murder, manslaughter, aggravated rape, robbery, or aggravated assault) or identification of human remains. Users who select the “Personal Research” privacy option will have their kits compared against the database for their own matches, but their profiles won’t appear in results generated for law enforcement kits. This opt-in structure means your DNA data is only available to investigators if you’ve actively chosen to allow it.