What Is a Copy Number Variant? From Biology to Disease

A copy number variant (CNV) is a stretch of DNA that has been deleted or duplicated, so different people carry different numbers of copies of that segment. These segments range from about 50 base pairs to several million base pairs in length, and collectively, CNVs overlap roughly 29% of the human genome. That makes them one of the largest sources of genetic difference between individuals, covering far more of the genome than the single-letter DNA changes (called SNPs, short for single nucleotide polymorphisms) that get most of the attention.

How CNVs Differ From Other Genetic Variants

Most people think of genetic variation as a single “letter” in the DNA code being swapped for another. CNVs are fundamentally different. Instead of changing the spelling of a gene, they change the quantity. You might carry one copy of a particular DNA segment where someone else carries three or four. The segment could contain an entire gene, part of a gene, or a stretch of DNA between genes.

The term “copy number variant” is really an umbrella that covers several types of changes: deletions (where a segment is missing), duplications (where a segment is repeated), and insertions of new material. Because there’s no single “standard” human genome, scientists describe these changes relative to an agreed-upon reference genome. A deletion in one person might actually be the ancestral state, while someone else’s extra copy could be the newer development. The labels “gain” and “loss” are always relative.
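
To make the point about relative labels concrete, here is a minimal Python sketch. The function name and the copy counts are illustrative assumptions, not part of any real genomics tool; the only idea it demonstrates is that the same person can be labeled a "gain" or a "loss" depending on which reference is chosen.

```python
def describe_copy_number(sample_copies: int, reference_copies: int) -> str:
    """Label a segment as a gain, loss, or neutral relative to a chosen reference.

    Hypothetical helper for illustration only.
    """
    if sample_copies > reference_copies:
        return "gain"
    if sample_copies < reference_copies:
        return "loss"
    return "neutral"


# The same individual, carrying 2 copies of a segment, compared against
# two different (made-up) references.
print(describe_copy_number(2, reference_copies=1))  # gain
print(describe_copy_number(2, reference_copies=3))  # loss
```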

What Causes Them

CNVs form through errors in how cells copy and repair DNA. The best-understood mechanism happens during cell division, when similar-looking stretches of DNA on paired chromosomes misalign and swap material unevenly. This process, called non-allelic homologous recombination, tends to produce predictable, recurring CNVs at the same locations across unrelated people because the same repeated sequences keep tripping up the cellular machinery.

Other CNVs form more randomly. When a DNA replication fork stalls, the copying machinery can jump to a nearby template and resume from the wrong spot, creating duplications or deletions. Broken DNA strands that get stitched back together imprecisely can also produce CNVs. These repair-based mechanisms often create CNVs with irregular breakpoints, making each one somewhat unique. Environmental exposures matter too: radiation tends to cause duplications and deletions at equal rates, while chemical mutagens produce more deletions than duplications.

How an Extra or Missing Copy Changes Biology

The reason CNVs matter so much is gene dosage. When a deletion removes one copy of a gene, your cells typically produce less of that gene’s protein. When a duplication adds an extra copy, protein production goes up. Unlike a subtle single-letter variant that might slightly alter how a protein works, a CNV can cut output in half or boost it by 50% or more. For genes where the exact amount of protein matters, that’s a big deal.
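
The arithmetic behind "cut in half" and "boost by 50%" can be sketched in a few lines of Python. This assumes, as a simplification, that protein output scales in direct proportion to copy number from a two-copy baseline. Real genes do not always behave this neatly, and the function name is purely illustrative.

```python
def relative_dosage(copies: int, baseline_copies: int = 2) -> float:
    """Expected output relative to baseline, ASSUMING output scales linearly
    with copy number (a simplification; real genes can compensate)."""
    if copies < 0 or baseline_copies <= 0:
        raise ValueError("copy counts must be non-negative, baseline positive")
    return copies / baseline_copies


for copies in (1, 2, 3, 4):
    print(f"{copies} copies: {relative_dosage(copies):.0%} of baseline output")
# 1 copies: 50% of baseline output
# 2 copies: 100% of baseline output
# 3 copies: 150% of baseline output
# 4 copies: 200% of baseline output
```

Under that assumption, a single-copy deletion halves output, a single extra copy raises it by 50%, and each further copy adds another 50 percentage points.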

Not all genes are equally sensitive to dosage changes. Some tolerate having only one working copy just fine. Others are “haploinsufficient,” meaning one copy isn’t enough to do the job. Interestingly, genes that are sensitive to losing a copy aren’t always sensitive to gaining one, and vice versa. This asymmetry helps explain why a deletion and a duplication at the same location can cause completely different conditions. One classic example: a duplication at a specific spot on chromosome 17 causes Charcot-Marie-Tooth disease type 1A (a nerve disorder), while a deletion at that same spot causes a different nerve condition called hereditary neuropathy with liability to pressure palsies.

CNVs and Neurodevelopmental Conditions

Hundreds of CNVs have been linked to neurological and psychiatric conditions, including autism, schizophrenia, and bipolar disorder. Some of these CNVs are surprisingly non-specific: the same variant can show up across multiple diagnoses. Duplications involving a gene called VIPR2 on chromosome 7 and deletions of NRXN1 on chromosome 2, for instance, have been identified in people with autism, schizophrenia, and bipolar disorder.

Other CNVs show stronger ties to particular conditions. Duplications at the 17q21.31-17q21.32 region, which contains genes involved in a key developmental signaling pathway, have been reported in 25 separate genetic studies of autism. A deletion on chromosome 3 that removes the FOXP1 gene, important for language development, has also been linked to autism specifically. These findings suggest that the brain is especially sensitive to gene dosage changes during development, and that disrupting certain pathways at certain times can push neurodevelopment in different directions depending on the individual’s broader genetic background.

The Role of CNVs in Cancer

CNVs that arise in body cells after birth, rather than being inherited, play a major role in cancer. Tumors frequently amplify oncogenes (genes that drive cell growth) and delete tumor suppressors (genes that keep growth in check). A large analysis across many cancer types found 24 consistently amplified regions containing known oncogenes, including several that are targets of commonly used cancer therapies. On the deletion side, 70 recurring regions of loss were identified, with 12 containing known tumor suppressor genes like CDKN2A, PTEN, and ATM.

Cancers from similar tissue types tend to share similar patterns of gains and losses. Squamous cell cancers of the head, neck, lung, and bladder cluster together in their CNV profiles, as do reproductive cancers like ovarian and endometrial cancer. Mutations in the tumor suppressor gene TP53 correlate with whole-genome duplication, a dramatic event where the entire genome is copied. This association holds across multiple cancer types, suggesting it reflects a fundamental relationship between DNA damage tolerance and tumor evolution.

How CNVs Are Detected

The technology for finding CNVs has improved dramatically. Chromosomal microarray analysis (CMA) has been a standard clinical tool for years. It works by comparing a patient’s DNA against a reference using thousands of probes spread across the genome. The limitation is that probe coverage isn’t uniform, so CNVs in poorly covered regions can be missed, and very small CNVs may fall below the detection threshold.
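
A small Python sketch can show why uneven probe coverage matters more than raw size. Everything here is a made-up illustration: the probe layout, the coordinates, and the rule that a CNV needs at least five probes inside it to be called are assumptions, not the specification of any real array.

```python
from bisect import bisect_left, bisect_right


def probes_in_interval(probe_positions, start, end):
    """Count probes falling inside [start, end]; probe_positions must be sorted."""
    return bisect_right(probe_positions, end) - bisect_left(probe_positions, start)


def array_can_call(probe_positions, start, end, min_probes=5):
    """Hypothetical rule: a CNV is callable only if enough probes land inside it."""
    return probes_in_interval(probe_positions, start, end) >= min_probes


# Made-up layout: one probe every 20 kb, with an uncovered gap from 1.0 Mb to 1.5 Mb.
probes = list(range(0, 1_000_001, 20_000)) + list(range(1_500_000, 2_000_001, 20_000))

small_cnv_in_covered_region = (100_000, 190_000)   # 90 kb
large_cnv_in_gap = (1_100_000, 1_400_000)          # 300 kb

print(array_can_call(probes, *small_cnv_in_covered_region))  # True
print(array_can_call(probes, *large_cnv_in_gap))             # False
```

In this toy layout the 90-kilobase CNV is visible to the array while the 300-kilobase one is not, simply because no probes sit inside it.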

Newer approaches based on genome sequencing are closing those gaps. Low-pass genome sequencing, which reads the genome at relatively shallow depth, has been shown to catch everything CMA catches plus additional clinically significant CNVs. In a study of over 1,000 prenatal samples, sequencing identified all 124 pathogenic CNVs that microarray found, plus 17 additional ones the microarray missed. Some of those missed variants were small (under 100 kilobases), but others were nearly 300 kilobases, missed not because of size but because of gaps in probe coverage. Sequencing also performed better at detecting mosaicism, where only a fraction of cells carry the CNV, and required far less DNA to run the test (50 nanograms versus 300 nanograms).
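
Sequencing-based CNV detection generally rests on read depth: a region with fewer copies yields proportionally fewer reads. The Python sketch below illustrates that idea and how a mosaic CNV shows up as an in-between depth. It is a simplified illustration with hypothetical function names and made-up depth values, not the method used in the study above. Real pipelines also correct for biases such as GC content and combine evidence across many genomic windows.

```python
def copy_number_estimate(sample_depth, reference_depth, baseline_copies=2):
    """Estimate copies from read depth, assuming depth is proportional to copy number."""
    return baseline_copies * sample_depth / reference_depth


def mosaic_fraction(depth_ratio, variant_copies, baseline_copies=2):
    """Estimate the fraction of cells carrying a CNV from the observed depth ratio.

    Assumes a mix of two cell populations: one with baseline_copies and one
    with variant_copies, so that
        depth_ratio = 1 + fraction * (variant_copies - baseline_copies) / baseline_copies
    """
    if variant_copies == baseline_copies:
        raise ValueError("variant_copies must differ from baseline_copies")
    return baseline_copies * (depth_ratio - 1) / (variant_copies - baseline_copies)


# Made-up normalized depths for one genomic window.
print(copy_number_estimate(sample_depth=15, reference_depth=30))  # 1.0 (one copy lost)
print(copy_number_estimate(sample_depth=45, reference_depth=30))  # 3.0 (one copy gained)

# A depth ratio of 0.85 for a one-copy deletion implies about 30% of cells carry it.
print(f"{mosaic_fraction(0.85, variant_copies=1):.2f}")  # 0.30
```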

Benign Versus Pathogenic CNVs

Most CNVs are harmless. Everyone carries hundreds of them, and the vast majority either fall in non-critical regions of the genome or affect genes that tolerate dosage changes without consequence. Some are so common in the population that they’re considered normal variation, no different from having brown eyes instead of blue.

A smaller number of CNVs are clearly pathogenic. These tend to be larger, overlap genes that are dosage-sensitive, and occur in regions of the genome with known disease associations. Between these extremes is a frustrating gray zone: variants of uncertain significance, where the CNV overlaps a gene but there isn’t enough evidence to say whether it causes problems. As databases grow and more genomes are sequenced, many of these uncertain variants get reclassified, but the process is slow. If you or a family member receives a genetic test result showing a CNV, the classification (benign, pathogenic, or uncertain) depends heavily on its size, location, gene content, and whether it’s been seen before in people with similar symptoms.
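
The gene-content part of that reasoning can be sketched as a toy triage step in Python. This is not a clinical classifier. The gene names, coordinates, and the dosage-sensitivity table are hypothetical placeholders, and real interpretation weighs far more evidence. The sketch only shows how a CNV that overlaps a gene with no dosage evidence lands in the uncertain middle.

```python
from dataclasses import dataclass


@dataclass
class CNV:
    chrom: str
    start: int
    end: int
    kind: str  # "deletion" or "duplication"


def overlapping_genes(cnv, genes):
    """Return names of genes whose interval overlaps the CNV.

    genes is a list of (name, chrom, start, end) tuples.
    """
    return [name for name, chrom, g_start, g_end in genes
            if chrom == cnv.chrom and g_start < cnv.end and cnv.start < g_end]


def triage(cnv, genes, dosage_sensitive):
    """Toy triage: dosage_sensitive maps gene name -> set of CNV kinds it is sensitive to."""
    hits = overlapping_genes(cnv, genes)
    if not hits:
        return "no gene overlap"
    if any(cnv.kind in dosage_sensitive.get(gene, set()) for gene in hits):
        return "overlaps dosage-sensitive gene"
    return "uncertain: overlaps gene with no dosage evidence"


# Hypothetical annotation: GENE_A is sensitive to losing a copy but not to gaining one.
genes = [("GENE_A", "chr1", 1_000, 5_000), ("GENE_B", "chr1", 20_000, 30_000)]
dosage_sensitive = {"GENE_A": {"deletion"}}

print(triage(CNV("chr1", 2_000, 3_000, "deletion"), genes, dosage_sensitive))
# overlaps dosage-sensitive gene
print(triage(CNV("chr1", 2_000, 3_000, "duplication"), genes, dosage_sensitive))
# uncertain: overlaps gene with no dosage evidence
print(triage(CNV("chr1", 8_000, 12_000, "deletion"), genes, dosage_sensitive))
# no gene overlap
```

Keying sensitivity by CNV type reflects the asymmetry described earlier: a gene can be sensitive to a deletion without being sensitive to a duplication.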