What Is Positional Cloning and How Does It Work?

Positional cloning is a method for finding a gene responsible for a disease based purely on its location in the genome, without needing to know anything about what the gene actually does. The term emerged in the late 1980s to describe a strategy that was, at the time, revolutionary: instead of working backward from a known protein to its gene, researchers could start with families affected by a disease, narrow down the region of the genome where the culprit gene must sit, and zero in on it step by step. The cystic fibrosis gene, identified in 1989, was one of the first major successes of this approach.

Positional Cloning vs. Functional Cloning

To understand why positional cloning matters, it helps to know what came before it. The older approach, functional cloning, required researchers to already understand something about a disease’s biology. The hemophilia gene, for instance, was found because scientists knew the disease involved a defective blood clotting factor. They could isolate that protein, work backward to its genetic code, and identify the gene.

The problem is that for most genetic diseases, nobody knew which protein was involved. Conditions like cystic fibrosis, Huntington’s disease, and spinal muscular atrophy were clearly inherited, but the underlying biology was a mystery. Positional cloning solved this by flipping the strategy entirely. Instead of asking “what does the broken protein do?”, researchers asked “where in the genome is the broken gene sitting?” That question could be answered with genetics alone.

How the Process Works

Positional cloning is a funnel. It starts with the entire genome and progressively narrows the search area until a single mutation is found. Each stage uses different tools, but the logic is consistent: go from a larger view to a narrower one.

Linkage Analysis

The first step is studying families where the disease runs across multiple generations. Researchers collect DNA from affected and unaffected family members and scan it for genetic markers, which are known, variable spots in the genome. The goal is to find markers that are consistently inherited alongside the disease. When a marker and a disease travel together through a family tree, it suggests that the marker sits close to the disease gene on the same chromosome, because nearby stretches of DNA are rarely separated by recombination and so tend to be inherited as a unit.

The strength of this co-inheritance is measured with a statistic called a LOD score (logarithm of the odds). A LOD score of 3 or higher is conventionally taken as strong evidence that a marker is genuinely linked to the disease gene, not just coincidentally traveling with it. Because the score is a base-10 logarithm, a score of 3 means the observed inheritance pattern is 1,000 times more likely if the marker and gene are linked than if they are not. Scores between 2 and 3 are considered suggestive but not definitive. This initial mapping typically defines a candidate region of about 10 million base pairs, which sounds large but is only about 0.3 percent of the 3-billion-base-pair human genome.
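The arithmetic behind a LOD score can be sketched for the simplest possible case. The Python below is a minimal illustration, assuming every meiosis is fully informative and phase-known, so that each child can simply be counted as recombinant or non-recombinant between marker and disease. Real pedigree analysis uses dedicated linkage software and has to handle missing data, unknown phase, and incomplete penetrance. The function names and the example counts are illustrative and do not come from any particular study.

```python
import math

def lod_score(recombinants, meioses, theta):
    """LOD at recombination fraction theta for phase-known, informative meioses."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("theta must be between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Likelihood of the data if marker and gene are linked at distance theta.
    linked = (theta ** recombinants) * ((1 - theta) ** non_recombinants)
    # Likelihood if they are unlinked: every meiosis is a coin flip.
    unlinked = 0.5 ** meioses
    if linked == 0.0:
        return float("-inf")  # the data rule out this value of theta
    return math.log10(linked / unlinked)

def max_lod(recombinants, meioses, steps=50):
    """Scan theta from 0 to 0.5 and return (best_theta, best_lod)."""
    grid = [0.5 * i / steps for i in range(steps + 1)]
    best = max(grid, key=lambda t: lod_score(recombinants, meioses, t))
    return best, lod_score(recombinants, meioses, best)

# Hypothetical family: 10 informative meioses, none recombinant.
print(f"LOD at theta=0: {lod_score(0, 10, 0.0):.2f}")
```

Under these simplifying assumptions, ten informative meioses with no recombinants give a LOD of about 3.01, just over the conventional threshold, while a single recombinant among the ten lowers the best achievable score to about 1.6.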

Narrowing the Region

Ten million base pairs can still contain dozens or hundreds of genes, so the region needs to be refined. Researchers add more genetic markers within the candidate zone and, when possible, study additional affected families to gather more data. Every additional recombination event (where paired chromosomes swap segments during meiosis, the cell division that produces eggs and sperm) helps tighten the boundaries: if an affected person has lost part of the disease-linked stretch through recombination, the gene cannot lie in the part that was lost. Chromosome abnormalities visible under a microscope, such as deletions or translocations, can also help define the edges of the critical region.
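The boundary-tightening logic can be expressed as an interval intersection. The sketch below assumes that, for each affected individual, the stretch of the disease-linked chromosome segment they still carry has already been worked out from marker data. The function name and all coordinates are hypothetical.

```python
def critical_region(shared_segments):
    """Return the interval carried by every affected individual.

    shared_segments: list of (start_bp, end_bp) tuples, one per affected
    person, giving the part of the disease-linked segment they inherited.
    """
    if not shared_segments:
        raise ValueError("need at least one segment")
    # The gene must lie inside every segment, so take the overlap of all.
    left = max(start for start, _ in shared_segments)
    right = min(end for _, end in shared_segments)
    if left >= right:
        raise ValueError("segments do not overlap; recheck diagnoses or marker calls")
    return left, right

# Hypothetical positions in base pairs along one chromosome.
segments = [
    (1_000_000, 11_000_000),  # inherited the full 10 Mb candidate region
    (3_500_000, 11_000_000),  # recombinant: left boundary moves inward
    (1_000_000, 8_200_000),   # recombinant: right boundary moves inward
]
start, end = critical_region(segments)
print(f"critical region: {start:,}-{end:,} ({(end - start) / 1e6:.1f} Mb)")
```

In this made-up example, two recombinant individuals shrink a 10 Mb candidate region to 4.7 Mb. An empty overlap is treated as an error because it usually points to a misdiagnosis or a genotyping mistake, not to a real biological result.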

In the era before genome databases, physically traversing these large stretches of DNA required laborious techniques. Chromosome walking involved starting at a known marker and cloning overlapping segments of DNA, each one extending a little further along the chromosome. Chromosome jumping leapfrogged over large stretches by deleting the middle portions of big DNA fragments and keeping only their ends, allowing researchers to cover hundreds of thousands of base pairs more efficiently. These techniques were essential for bridging the gap between genetic markers and the actual gene.

Identifying Candidate Genes

Once the region is small enough, researchers examine which genes sit within it. Today this step is largely a database search, since the human genome has been sequenced and annotated. Before genome databases existed, it required painstaking laboratory work to identify genes within a stretch of DNA.

Not all genes in the region are equally likely to be the culprit. Several criteria help prioritize candidates. A gene is a strong candidate if it plays a role in a biological pathway relevant to the disease, if mutations in that gene are already known to cause similar conditions, or if it is active in the tissues affected by the disease. Gene expression profiling, which shows where and when a gene is turned on, can highlight genes that are active in the right place at the right time. The most convincing candidates are those where a specific DNA change clearly alters the structure or amount of the protein the gene produces.
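One way to picture this prioritization is as a simple scoring scheme. The sketch below is only an illustration: the gene names are placeholders, the yes/no fields stand in for evidence that would really come from databases and experiments, and the weights (including counting a protein-altering change double) are assumptions chosen to mirror the criteria above, not an established scoring method.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    gene: str
    in_relevant_pathway: bool
    causes_similar_condition: bool
    active_in_affected_tissue: bool
    has_protein_altering_change: bool

def priority(candidate):
    """Add up the evidence; the weights are illustrative assumptions."""
    return (
        2 * candidate.has_protein_altering_change  # most convincing evidence
        + candidate.in_relevant_pathway
        + candidate.causes_similar_condition
        + candidate.active_in_affected_tissue
    )

def rank(candidates):
    """Return candidates ordered from most to least promising."""
    return sorted(candidates, key=priority, reverse=True)

# Placeholder genes inside a hypothetical critical region.
candidates = [
    Candidate("GENE_A", True, False, False, False),
    Candidate("GENE_B", True, False, True, True),
    Candidate("GENE_C", False, True, True, False),
]
for c in rank(candidates):
    print(c.gene, priority(c))
```

A ranking like this only decides which genes to sequence first. It does not replace finding the mutation itself.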

Confirming the Mutation

The final step is sequencing the candidate gene in affected individuals to find the mutation. Positional cloning culminates when a specific DNA change is found that tracks perfectly with the disease in affected families and is absent in healthy individuals. From there, cell-based or animal experiments confirm that the mutation actually disrupts the gene’s function in a way that explains the disease.
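The "tracks perfectly" test amounts to a pair of set comparisons. The sketch below assumes the strictest case, in which every carrier develops the disease and no affected person has it for another reason. Real analyses often relax this to allow for incomplete penetrance. The function name and family labels are hypothetical.

```python
def segregates_with_disease(carriers, affected, unaffected, control_carriers=0):
    """True if the variant is in every affected person and no healthy one.

    carriers: IDs of family members who carry the variant
    affected / unaffected: IDs of family members by disease status
    control_carriers: number of unrelated healthy controls carrying it
    """
    carriers = set(carriers)
    in_all_affected = set(affected) <= carriers
    absent_from_unaffected = carriers.isdisjoint(unaffected)
    return in_all_affected and absent_from_unaffected and control_carriers == 0

# Hypothetical family of six.
affected = {"I-1", "II-2", "III-1"}
unaffected = {"I-2", "II-1", "III-2"}

print(segregates_with_disease({"I-1", "II-2", "III-1"}, affected, unaffected))
print(segregates_with_disease({"I-1", "II-2", "II-1"}, affected, unaffected))
```

The first variant passes because it is carried by all three affected relatives and nobody else. The second fails on both counts: one affected person lacks it and one unaffected person carries it.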

Landmark Discoveries

The identification of the cystic fibrosis gene in 1989 by Lap-Chee Tsui, Francis Collins, and colleagues was one of the earliest and most celebrated positional cloning successes. It was a remarkable achievement given that no detailed physical map or reference sequence of the human genome existed at the time. The team identified the gene now known as CFTR (cystic fibrosis transmembrane conductance regulator), and pinpointed a specific three-base-pair deletion, which removes a single amino acid from the protein, as the most common disease-causing mutation. Before this discovery, the molecular basis of cystic fibrosis was completely unknown.

By the mid-1990s, positional cloning had identified more than 70 human disease genes. The list reads like a catalog of previously mysterious conditions: spinal muscular atrophy, hereditary nonpolyposis colon cancer, Werner syndrome (a premature aging disorder), spinocerebellar ataxia, and Barth syndrome, among many others. The same approach also uncovered the gene encoding leptin, a hormone involved in regulating body weight. In each case, the gene was found without prior knowledge of what protein was involved, purely by tracing its position in the genome.

How Modern Sequencing Changed the Process

The core logic of positional cloning still holds, but the tools have changed dramatically. Whole-exome sequencing reads the protein-coding portions of all of a person’s genes, and whole-genome sequencing reads essentially the entire genome, each in a matter of days, compressing months of laboratory work into a computational analysis. In practice, many modern gene discovery projects combine the traditional linkage approach with next-generation sequencing.

A study of a large Chinese family with inherited acute myeloid leukemia illustrates how the hybrid approach works. Researchers first ran a genome-wide linkage scan using a high-density array of genetic markers, identifying a candidate region on chromosome 20 with a LOD score of 3.56. They then used targeted sequencing to read every gene within that region, plus whole-exome sequencing for comparison. This combination revealed a single missense mutation in a gene called TGM6 that was present in every affected family member and absent in 530 healthy controls. Traditional Sanger sequencing confirmed the finding. The entire workflow, from linkage scan to confirmed mutation, was far faster than classical positional cloning would have been.

Even with powerful sequencing technology, linkage analysis remains valuable. Sequencing a genome produces millions of variants, most of which are harmless. Linkage analysis narrows the search space so that only variants within the linked region need to be evaluated, dramatically reducing the number of false leads. Positional cloning, in this sense, hasn’t been replaced. It has been accelerated.
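The filtering step that linkage makes possible is simple to express. The sketch below assumes variants have already been called and annotated with a chromosome, a position, and a gene name. The record layout, gene labels, and coordinates are all hypothetical and are not taken from the study described above.

```python
from typing import NamedTuple

class Variant(NamedTuple):
    chrom: str
    pos: int
    gene: str

def variants_in_region(variants, chrom, start, end):
    """Keep only variants that fall inside the linked interval."""
    return [v for v in variants if v.chrom == chrom and start <= v.pos <= end]

# A handful of made-up variants standing in for a whole-exome call set.
calls = [
    Variant("1", 5_200_000, "GENE_X"),
    Variant("20", 900_000, "GENE_Y"),
    Variant("20", 2_400_000, "GENE_Z"),
    Variant("20", 30_000_000, "GENE_W"),
]

# Hypothetical linked region on chromosome 20.
linked = variants_in_region(calls, "20", 500_000, 9_000_000)
print([v.gene for v in linked])
```

Here four candidate variants shrink to two. With a real call set the same filter can cut the list by orders of magnitude before any segregation or functional checks are run.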