Is the Genetic Code Overlapping or Non-Overlapping?

The genetic code is non-overlapping. Each three-letter sequence of DNA (called a codon) is read once and codes for exactly one amino acid, with no sharing of letters between adjacent codons. However, this clean rule has important exceptions: many organisms, especially viruses, do contain overlapping genes where the same stretch of DNA encodes more than one protein by being read in different frames.

What Non-Overlapping Means

The genetic message is read in consecutive, non-repeating groups of three nucleotides. If an mRNA sequence (the RNA copy of a gene, which is what the ribosome actually reads) is AUGCGA, the ribosome reads AUG as one codon and CGA as the next. It doesn’t slide forward by just one base to also read UGC and GCG. Each base belongs to one codon only, and each codon selects one amino acid without influencing what comes before or after it.
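The difference between the two reading schemes can be sketched in a few lines of Python. This is only an illustration: the function names are made up for this example, and the overlapping reader describes a hypothetical code that cells do not use.

```python
def codons_non_overlapping(seq, start=0):
    """Split seq into consecutive triplets from `start`, dropping any incomplete tail."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


def codons_overlapping(seq):
    """Hypothetical overlapping reading: slide forward one base at a time."""
    return [seq[i:i + 3] for i in range(len(seq) - 2)]


mrna = "AUGCGA"
print(codons_non_overlapping(mrna))  # ['AUG', 'CGA']
print(codons_overlapping(mrna))      # ['AUG', 'UGC', 'GCG', 'CGA']
```

In the non-overlapping reading, every base appears in exactly one triplet. In the hypothetical overlapping reading, the bases in the middle of the sequence appear in up to three.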

The triplet nature of the code was demonstrated in 1961 by Francis Crick, Sydney Brenner, and colleagues, who showed that three bases of DNA code for one amino acid. Their experiments used a chemical mutagen that causes single-base insertions or deletions in DNA, which let them show that the code is read in fixed triplets from a set starting point.

Why Non-Overlapping Matters for Mutations

The non-overlapping design has a major practical consequence: a single-letter substitution in DNA affects only one codon, so at most one amino acid in the resulting protein is swapped for another. If the code were overlapping, a single substitution could alter two or three amino acids at once, making proteins far more fragile and harder to evolve.
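A small sketch makes the comparison concrete. The sequence below is a made-up toy example, and the helper names are illustrative. The code counts changed codons rather than changed amino acids, since some codon changes are synonymous.

```python
def triplets(seq, step):
    """Read triplets every `step` bases: step=3 is the real code, step=1 a hypothetical overlapping one."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, step)]


def changed_codons(original, mutant, step):
    """Count positions where the two readings give different triplets."""
    return sum(a != b for a, b in zip(triplets(original, step), triplets(mutant, step)))


original = "AUGCGAUUC"
mutant = "AUGCAAUUC"  # single substitution: G -> A at index 4

print(changed_codons(original, mutant, step=3))  # 1
print(changed_codons(original, mutant, step=1))  # 3
```

With the real, non-overlapping reading, the substitution touches one codon (CGA becomes CAA). Under the hypothetical overlapping reading, the same base sits inside three triplets, so all three change.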

Frameshift mutations illustrate the flip side of this design. When a base is inserted or deleted (rather than swapped), it shifts the reading frame for every codon downstream. Because the code has no overlaps or punctuation marks between codons, the ribosome has no way to recover its position. The entire protein sequence after the insertion or deletion is scrambled, which usually destroys the protein’s function.
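The effect of a single inserted base can be shown with a toy sequence. The sequence and the insertion point are invented for illustration and do not come from a real gene.

```python
def codons(seq):
    """Split seq into consecutive triplets from position 0, dropping any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


original = "AUGGCCAAGUUUGAC"
# Insert one extra base (U) after the fourth nucleotide.
mutant = original[:4] + "U" + original[4:]

print(codons(original))  # ['AUG', 'GCC', 'AAG', 'UUU', 'GAC']
print(codons(mutant))    # ['AUG', 'GUC', 'CAA', 'GUU', 'UGA']
```

Only the first codon survives. Every codon from the insertion onward is different, and in this example the shifted frame happens to run into UGA, a stop codon, which would also cut the protein short.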

Overlapping Genes Are Real

While the code itself is read without overlap, many genomes contain overlapping genes, where two different proteins are encoded by the same physical stretch of DNA read in different reading frames. This is not the same as an overlapping code. The reading machinery still processes three bases at a time without sharing, but a second reading frame starts at a different position within the same region, so the same nucleotides participate in two separate, non-overlapping readings.
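The distinction between overlapping genes and an overlapping code can be sketched as follows. The sequence is a made-up toy example, not a real viral gene, and the second start position is chosen by hand to fall in the +1 frame.

```python
def read_frame(seq, start):
    """Read non-overlapping triplets beginning at `start`, dropping any incomplete tail."""
    return [seq[i:i + 3] for i in range(start, len(seq) - 2, 3)]


seq = "AUGGAUGCCAAGUGA"

gene_a = read_frame(seq, 0)  # first reading starts at the AUG at position 0
gene_b = read_frame(seq, 4)  # second reading starts at the AUG at position 4 (+1 frame)

print(gene_a)  # ['AUG', 'GAU', 'GCC', 'AAG', 'UGA']
print(gene_b)  # ['AUG', 'CCA', 'AGU']
```

Each reading on its own is strictly non-overlapping: no base is used twice within gene_a or within gene_b. The overlap exists only between the two readings. The U at position 5, for instance, is the last letter of GAU in one frame and the middle letter of AUG in the other.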

Viruses use this trick extensively. Because viral genomes are small, overlapping reading frames let them pack more protein-coding information into less space. The proportion of viruses carrying overlapping coding sequences ranges from fewer than a quarter of double-stranded RNA viruses to almost three-quarters of retroviruses and single-stranded DNA viruses. Bluetongue virus, for example, was long thought to encode just one protein per genome segment. Researchers later discovered that segment 9 and segment 10 each contain a second protein encoded in a +1 reading frame overlapping the main gene, a feature conserved across more than 300 strains and 27 different serotypes.

Overlapping genes also exist in complex organisms. Recent estimates suggest that roughly 26% of all protein-coding genes in the human genome have some overlapping feature, a number much higher than scientists previously assumed.

How Overlapping Genes Cope With Evolutionary Pressure

Overlapping genes face a fundamental tension: a single nucleotide change can simultaneously affect two proteins. Mathematical modeling of the bacteriophage ΦX174 found that gene overlap reduces the number of tolerable amino acid changes by 40 to 50% compared to non-overlapping genes. That’s a steep evolutionary cost, because it limits how quickly either protein can adapt.

Organisms compensate for this constraint in a few ways. Overlapping coding regions are enriched in amino acids like arginine, serine, and proline, which have many synonymous codons (multiple DNA spellings that produce the same amino acid). This redundancy gives the DNA more room to mutate in one reading frame without disrupting the protein in the other frame. Overlapping regions also tend to be depleted in amino acids like tyrosine and isoleucine, which have fewer synonymous codons and would leave less wiggle room. Some researchers have also proposed that proteins encoded by overlapping genes are more likely to have disordered (flexible) structures, which tolerate amino acid changes more easily than rigid, tightly folded proteins.
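The number of synonymous codons per amino acid can be counted directly from the standard codon table. The sketch below builds that table from the conventional 64-letter amino acid string (bases ordered U, C, A, G, with * marking stop codons). The variable names are illustrative.

```python
from collections import Counter
from itertools import product

BASES = "UCAG"
# Standard genetic code, one letter per codon in UUU, UUC, UUA, UUG, UCU, ... order.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

# How many codons spell each amino acid?
degeneracy = Counter(CODON_TABLE.values())

for name, letter in [("arginine", "R"), ("serine", "S"), ("proline", "P"),
                     ("tyrosine", "Y"), ("isoleucine", "I")]:
    print(name, degeneracy[letter])
# arginine 6
# serine 6
# proline 4
# tyrosine 2
# isoleucine 3
```

The counts line up with the pattern described above: the amino acids enriched in overlapping regions have four to six spellings each, while tyrosine and isoleucine have only two and three.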

Programmed Ribosomal Frameshifting

Some organisms deliberately exploit the reading frame to regulate protein production. In programmed ribosomal frameshifting, the ribosome intentionally slips backward or forward by one base at a specific location in the mRNA, switching into a new reading frame and producing a completely different protein from that point onward.

Coronaviruses, including SARS-CoV, depend on this mechanism. A fixed portion of ribosomes translating the first large gene (orf1a) shift into a new frame at a seven-letter “slippery” sequence to decode a second gene (orf1b). This second gene encodes the virus’s RNA-copying machinery and lacks its own start signal for translation, so frameshifting is the only way it gets made. In SARS-CoV, frameshifting efficiency runs about 15% in human cells, meaning roughly one in seven ribosomes makes the shift. This ratio controls how much copying machinery the virus produces relative to its other proteins.
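The arithmetic behind "one in seven" is quick to check. The sketch below makes a simplifying assumption that is not from the measurements themselves: every ribosome that shifts goes on to finish the long orf1a plus orf1b product, and every ribosome that does not shift makes only the orf1a product.

```python
frameshift_efficiency = 0.15  # fraction of ribosomes that shift, from the figure above

# On average, one ribosome in this many makes the shift.
ribosomes_per_shift = 1 / frameshift_efficiency

# Under the simplifying assumption stated above: orf1a-only products per frameshifted product.
unshifted_per_shifted = (1 - frameshift_efficiency) / frameshift_efficiency

print(round(ribosomes_per_shift, 2))    # 6.67
print(round(unshifted_per_shifted, 2))  # 5.67
```

So a 15% efficiency means about one shift per 6.7 ribosomes, and under this simplified model the virus makes between five and six copies of the orf1a-only product for each copy that includes the RNA-copying machinery.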

The slippage happens at a specific motif where two transfer RNAs sitting on the ribosome can re-pair with the mRNA one position back. Mutations to either the slippery sequence or the RNA structure just downstream significantly reduce frameshifting, confirming it’s a tightly controlled process rather than a random error.
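A backward slip of one base can be sketched as a change in where triplet reading resumes. The mRNA below is a made-up toy sequence built around the heptamer UUUAAAC, which serves here as the example slippery site. The function name and the slip position are illustrative assumptions, and the sketch ignores the downstream RNA structure entirely.

```python
def codons(seq):
    """Split seq into consecutive triplets from position 0, dropping any incomplete tail."""
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def read_with_minus_one_shift(mrna, slip_end):
    """Read in frame 0 up to `slip_end`, then step back one base and keep reading triplets."""
    upstream = codons(mrna[:slip_end])
    downstream = codons(mrna[slip_end - 1:])  # the last base before slip_end is read twice
    return upstream + downstream


# Toy mRNA: AUG GCU UUA AAC GGG UUU GAA, with the heptamer U UUA AAC at positions 5-11.
mrna = "AUGGCUUUAAACGGGUUUGAA"

print(codons(mrna))
# ['AUG', 'GCU', 'UUA', 'AAC', 'GGG', 'UUU', 'GAA']
print(read_with_minus_one_shift(mrna, slip_end=12))
# ['AUG', 'GCU', 'UUA', 'AAC', 'CGG', 'GUU', 'UGA']
```

Both readings agree through the slippery site. After it, the shifted ribosome re-reads the final C of the heptamer as the first base of its next codon, so everything downstream is decoded in the new frame.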

The Short Answer

The standard genetic code is non-overlapping: each nucleotide belongs to one codon, and each codon specifies one amino acid. But nature layers complexity on top of this rule. Overlapping genes, programmed frameshifting, and alternative reading frames all allow the same DNA to encode multiple proteins. The code’s reading rules stay the same, but the genome finds ways to read the same text more than once.