When an extra nucleotide is added to a DNA sequence, the result is an insertion mutation. If the insertion falls within the protein-coding part of a gene, the consequences can be severe. Because the cell reads that coding sequence in groups of three nucleotides (called codons), adding even one extra nucleotide shifts the entire reading frame downstream of the insertion. Every codon after that point is misread, typically producing a completely wrong protein or no functional protein at all. This type of disruption is called a frameshift mutation, and it’s one of the most damaging things that can happen to a gene.
How the Reading Frame Breaks
Your cells read genetic instructions three nucleotides at a time. Each three-letter group codes for one amino acid, and amino acids are the building blocks of proteins. When an extra nucleotide gets inserted, every three-letter group after the insertion point shifts by one position. Imagine a sentence made of three-letter words: “THE CAT ATE THE RAT.” Insert one extra letter after the first word and it becomes “THE XCA TAT ETH ERA T.” Every word after the insertion is garbled.
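The sentence analogy can be reproduced in a few lines of Python. This is a minimal sketch: `codons` and `insert_at` are hypothetical helper names, and the letters are the analogy's words rather than real nucleotides.

```python
def codons(seq: str) -> list[str]:
    """Split a string into consecutive three-letter groups, starting at position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def insert_at(seq: str, pos: int, extra: str) -> str:
    """Return seq with `extra` inserted before index `pos`."""
    return seq[:pos] + extra + seq[pos:]


sentence = "THECATATETHERAT"
shifted = insert_at(sentence, 3, "X")  # one extra letter after the first word

print(codons(sentence))  # ['THE', 'CAT', 'ATE', 'THE', 'RAT']
print(codons(shifted))   # ['THE', 'XCA', 'TAT', 'ETH', 'ERA', 'T']
```

The first word is untouched; only the group containing the insertion and every group after it change, which is why the damage is confined to the part of the gene downstream of the insertion.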
In genetic terms, this means every amino acid downstream of the insertion is likely to be wrong, so from that point on the protein bears no resemblance to the one the gene was supposed to produce. Worse, the shifted reading frame usually runs into a premature stop codon within a short distance, because 3 of the 64 possible codons are stop signals that tell the cell to stop building the protein early. The result is a truncated, nonfunctional protein fragment, if the cell even allows it to be made at all.
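To see a premature stop codon appear, the sketch below translates a short DNA fragment with the standard genetic code and stops at the first stop codon. The 21-nucleotide "gene" and the `translate` helper are hypothetical and chosen for illustration; real genes are far longer, and the cell translates mRNA rather than reading DNA directly.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order at each position.
BASES = "TCAG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))  # '*' marks a stop codon


def translate(dna: str) -> str:
    """Translate from position 0 into one-letter amino acid codes, halting at the first stop."""
    protein = []
    for i in range(0, len(dna) - 2, 3):  # any incomplete codon at the end is ignored
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical gene fragment: ATG AAA GCT AAG TGG CAC TAA
original = "ATGAAAGCTAAGTGGCACTAA"
# One extra C after the second codon gives ATG AAA CGC TAA ...
mutant = original[:6] + "C" + original[6:]

print(translate(original))  # MKAKWH
print(translate(mutant))    # MKR  (the shifted frame hits TAA at codon 4)
```

The original fragment runs for six amino acids before its natural stop; the mutant is cut off after three because the shifted frame exposes a TAA that was previously split across two codons.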
Why the Extra Nucleotide Gets Added
The most common cause is a phenomenon called replication slippage. When the cell copies DNA, the copying machinery (DNA polymerase) occasionally pauses, especially in regions where short sequences repeat, like AAAA or CAGCAGCAG. During that pause, the polymerase can detach from the DNA strand it’s building. The newly made strand then briefly separates from the template and reattaches in a slightly misaligned position. When the polymerase reloads and resumes copying, the misalignment means one or more extra nucleotides get incorporated.
Research published in The EMBO Journal mapped this process in three distinct steps: the polymerase stalls within a repeated sequence, it detaches from the DNA, and then the tip of the new strand “breathes” (partially separates) and re-pairs at the wrong position. Repeats built from longer units actually make slippage less efficient, not more, because the strand has to shift farther to find the misaligned pairing spot. Repeats whose unit is one, two, or three nucleotides long are the most vulnerable to this kind of error.
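The sketch below scans a sequence for the short repeats described above and models only the outcome of forward slippage: one extra copy of the repeat unit. The function names, the example sequence, and the four-copy minimum are hypothetical choices; the code does not simulate the polymerase's stalling, detachment, or strand breathing.

```python
import re


def find_repeat_tracts(seq: str, max_unit: int = 3, min_copies: int = 4) -> list[tuple[int, str, int]]:
    """Return (start, unit, copies) for tandem repeats of 1- to max_unit-nucleotide units."""
    tracts = []
    for unit_len in range(1, max_unit + 1):
        # A unit of unit_len bases followed by at least (min_copies - 1) more copies of itself.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_copies - 1))
        for match in pattern.finditer(seq):
            unit = match.group(1)
            if unit_len > 1 and len(set(unit)) == 1:
                continue  # e.g. repeated "AA" is just a single-base run, already reported
            tracts.append((match.start(), unit, len(match.group(0)) // unit_len))
    return tracts


def slip(seq: str, start: int, unit: str) -> str:
    """Model the result of forward slippage: one extra copy of the unit at the tract."""
    return seq[:start] + unit + seq[start:]


seq = "GGCAAAAAATCCAGCAGCAGCAGTT"  # hypothetical sequence
print(find_repeat_tracts(seq))  # [(3, 'A', 6), (11, 'CAG', 4)]

for start, unit, _ in find_repeat_tracts(seq):
    print(unit, len(slip(seq, start, unit)) - len(seq))
# A 1
# CAG 3
```

Slipping within the single-base run adds one nucleotide, which would shift a reading frame; slipping within the CAG tract adds three, which would not.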
How Cells Catch and Fix the Mistake
Cells have a dedicated repair system called mismatch repair that, in addition to mispaired bases, looks for extra nucleotides. When an insertion happens, the extra nucleotide bulges out from the double helix, forming what’s called an insertion-deletion loop. Specialized protein complexes patrol newly copied DNA looking for exactly these distortions.
One complex recognizes small loops of one or two extra nucleotides, while another handles larger loops up to about 17 nucleotides. Once the bulge is detected, the repair proteins lock onto the DNA and slide along it like a clamp, recruiting additional machinery. The system figures out which strand is the new copy (and therefore the one with the error), cuts into it, and removes the section containing the extra nucleotide. A polymerase then fills in the gap correctly, and the strand is sealed. This repair system reduces mutation rates by 100- to 10,000-fold.
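The division of labor and the quoted range of improvement can be written down as a toy calculation. The function names, the string labels, and the figure of 1,000 pre-repair errors are hypothetical; the loop-size limits are the approximate values given in the paragraph, and real recognition ranges overlap rather than splitting cleanly.

```python
def loop_recognizer(extra_nucleotides: int) -> str:
    """Toy routing of an insertion-deletion loop by size, using approximate limits."""
    if extra_nucleotides <= 0:
        raise ValueError("a loop needs at least one extra nucleotide")
    if extra_nucleotides <= 2:
        return "small-loop complex"
    if extra_nucleotides <= 17:
        return "large-loop complex"
    return "outside the stated range"


def errors_after_repair(errors_before: float, fold_reduction: float) -> float:
    """Errors expected to survive if repair lowers the rate by `fold_reduction`."""
    return errors_before / fold_reduction


hypothetical_errors = 1_000  # illustrative count, not a measured value
for fold in (100, 10_000):
    print(fold, errors_after_repair(hypothetical_errors, fold))
# 100 10.0
# 10000 0.1
```

Even at the low end of the range, only about 1 in 100 errors would slip through; at the high end, about 1 in 10,000.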
But mismatch repair isn’t perfect. When it misses an insertion, the mutation becomes permanent in that cell and all its descendants.
What Happens if the Mutation Survives
When a frameshift mutation makes it past repair, the cell has one more line of defense: a surveillance system called nonsense-mediated mRNA decay (NMD). Before a protein is built, the gene is first copied into an mRNA molecule, and the intervening sequences are spliced out so that the remaining segments (exons) are joined end to end. NMD scans these mRNA copies and flags any that contain a premature stop codon more than 50 to 55 nucleotides upstream of the last exon-exon junction, the final point where two exons were joined. Flagged transcripts are destroyed before they can produce much defective protein.
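The positional rule can be sketched as a small function. The name `predicts_nmd`, the exon lengths, and the stop positions are hypothetical; measuring from the first base of the stop codon is an assumption of this sketch, and the threshold is a parameter so either 50 or 55 can be used.

```python
def predicts_nmd(exon_lengths: list[int], stop_position: int, threshold: int = 50) -> bool:
    """Apply the 50-55 nucleotide rule to a spliced mRNA.

    exon_lengths: lengths of the exons in transcript order, in nucleotides.
    stop_position: 0-based position of the stop codon's first base in the spliced mRNA.
    threshold: how far upstream of the last exon-exon junction counts as "premature".
    """
    if len(exon_lengths) < 2:
        return False  # no exon-exon junction, so this rule cannot apply
    last_junction = sum(exon_lengths[:-1])  # position where the final exon begins
    return last_junction - stop_position > threshold


exons = [200, 150, 300]  # hypothetical three-exon transcript; last junction at 350
print(predicts_nmd(exons, 100))  # True: 250 nt upstream of the junction
print(predicts_nmd(exons, 320))  # False: only 30 nt upstream
print(predicts_nmd(exons, 500))  # False: stop codon lies in the last exon
```

A stop codon in the final exon, or just before the last junction, escapes this rule, which is one reason some truncated proteins are still made.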
This is actually protective. Rather than flooding the cell with broken protein fragments that could interfere with normal function, NMD ensures the faulty message is degraded. In studies of specific frameshift mutations, blocking NMD significantly increased the amount of mutant mRNA present, confirming that cells actively destroy these messages rather than letting truncated proteins accumulate.
The Difference Between One Nucleotide and Many
A single extra nucleotide causes a frameshift. But insertions can range from one nucleotide to thousands. An important distinction: if the number of inserted nucleotides is a multiple of three, the reading frame stays intact. The protein gets extra amino acids at the insertion site, but everything downstream reads correctly. This can still be harmful, but it’s a fundamentally different kind of damage than a frameshift.
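The multiple-of-three rule reduces to a remainder check, sketched below with hypothetical helper names and the same kind of made-up fragment used for illustration. The second part confirms that a three-base insertion leaves every downstream codon intact.

```python
def frame_effect(inserted_length: int) -> str:
    """Classify an insertion by whether it leaves the downstream reading frame intact."""
    return "in-frame" if inserted_length % 3 == 0 else "frameshift"


def codons(seq: str) -> list[str]:
    """Complete codons read from position 0."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


original = "ATGAAAGCTAAGTGGCACTAA"                 # hypothetical: ATG AAA GCT AAG TGG CAC TAA
in_frame = original[:6] + "GGC" + original[6:]  # three extra bases after codon 2

print([frame_effect(n) for n in (1, 3, 4, 6)])
# ['frameshift', 'in-frame', 'frameshift', 'in-frame']
print(codons(in_frame)[3:] == codons(original)[2:])  # True: downstream codons unchanged
```

A four-base insertion leaves a remainder of one after dividing by three, so it shifts the frame just as a single extra base would.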
Trinucleotide repeat expansions are a well-known example. In Huntington’s disease, a three-letter sequence (CAG) repeats too many times within a gene. Because the insertion is always in multiples of three, the reading frame is preserved, and the protein is actually produced. But the extra stretch of repeated amino acids makes the protein toxic, causing it to misfold and aggregate in brain cells. This is a gain-of-function mutation: the protein doesn’t just stop working, it actively causes harm.
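Because each CAG codon encodes one glutamine, counting in-frame CAG repeats is the same as counting the extra glutamines. The sketch below uses hypothetical fragments with 10 and 15 repeats; these counts are illustrative only and are not clinical thresholds.

```python
def longest_cag_run(coding_seq: str) -> int:
    """Length of the longest run of consecutive in-frame CAG codons, reading from position 0."""
    best = current = 0
    for i in range(0, len(coding_seq) - 2, 3):
        if coding_seq[i:i + 3] == "CAG":  # CAG encodes glutamine (Q)
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


# Hypothetical fragments: start codon, a CAG tract, then a CCG codon.
shorter = "ATG" + "CAG" * 10 + "CCG"
expanded = "ATG" + "CAG" * 15 + "CCG"

print(longest_cag_run(shorter), longest_cag_run(expanded))  # 10 15
print((len(expanded) - len(shorter)) % 3)                   # 0: reading frame preserved
```

The expanded fragment carries five more glutamines, yet the codon after the tract is still read correctly, which matches the point that the protein is produced but altered.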
By contrast, most single-nucleotide insertions cause loss of function. The protein either isn’t made (thanks to NMD) or is so truncated it can’t do its job. Loss-of-function mutations tend to be far more structurally disruptive to proteins than gain-of-function mutations, which are often surprisingly mild at the structural level but damaging through other mechanisms like aggregation or altered activity.
Real-World Disease Examples
Tay-Sachs disease provides a textbook case. The most common mutation worldwide in the gene responsible for this condition is a four-nucleotide insertion in one section of the gene. Because four is not a multiple of three, those extra bases shift the reading frame and create a premature stop codon downstream. The mRNA transcript is recognized as defective and degraded, so the enzyme the gene codes for is barely produced. Without that enzyme, fatty substances accumulate in nerve cells, causing progressive neurological damage in infants.
Long QT syndrome, a heart rhythm disorder, offers another example. A specific frameshift mutation in the gene encoding a potassium channel protein triggers NMD, which destroys most of the mutant mRNA. The result is reduced levels of the channel protein, which disrupts the electrical signals controlling heartbeat. Researchers confirmed this was primarily an mRNA destruction problem rather than a truncated protein problem by blocking NMD and watching mutant mRNA levels rise.
How Often Insertions Happen Naturally
Every person carries new mutations not present in either parent. A recent multigenerational study estimated 98 to 206 new mutations per generation, including about 7.4 small insertions or deletions outside of repeat regions and roughly 65 originating within repetitive stretches of DNA. Most previous studies, which focused on easier-to-sequence parts of the genome, had converged on 60 to 70 total new mutations per generation, but that number appears to have been an undercount.
The vast majority of these insertions land in non-coding regions of DNA, where they have no effect on protein production. It’s only when an insertion hits a gene, and specifically falls within the protein-coding portion, that a frameshift can cause disease. Even then, you carry two copies of most genes, so a frameshift in one copy may be compensated by the healthy copy on your other chromosome. Disease typically results when both copies are affected, or when even half the normal protein level isn’t enough.
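A back-of-envelope estimate shows why few new mutations reach protein-coding DNA. The sketch assumes, hypothetically, that mutations land uniformly across the genome and that about 1.5% of it codes for protein; mutations arising in repetitive stretches are not uniformly placed, so the result is only a rough order of magnitude.

```python
def expected_coding_hits(new_mutations: float, coding_fraction: float) -> float:
    """Expected number of mutations in protein-coding DNA if placement were uniform."""
    return new_mutations * coding_fraction


ASSUMED_CODING_FRACTION = 0.015  # rough assumption: ~1.5% of the genome codes for protein

for total in (98, 206):  # range of new mutations per generation from the study cited earlier
    print(total, round(expected_coding_hits(total, ASSUMED_CODING_FRACTION), 2))
# 98 1.47
# 206 3.09
```

Under these assumptions, only one to three of a person's new mutations would be expected to fall in coding sequence, and only some of those would be frameshifting insertions.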

