Where Do New Genes Come From? Duplication to De Novo

New genes arise through several distinct mechanisms: duplication of existing genes, birth from previously non-coding DNA, transfer of genes between unrelated species, and reshuffling of protein-building segments. These processes have been operating for billions of years and continue today, including in the human lineage. An estimated 18 human genes arose from scratch since our ancestors diverged from chimpanzees.

Gene Duplication: Copying What Already Works

The most common source of new genes is duplication, where a stretch of DNA gets copied so that an organism ends up with two versions of the same gene. Once a spare copy exists, it’s free to accumulate mutations without threatening the organism’s survival, because the original copy still does its job. Over time, one copy can drift into a new function entirely.

Duplication happens at several scales. The smallest and most frequent type is tandem duplication, where a nearby stretch of DNA gets copied and placed right next to the original. Tandem duplications account for 75 to 90 percent of all sub-chromosomal duplications in some mammalian genomes. They typically occur when the cell’s DNA repair machinery rejoins broken ends incorrectly, accidentally duplicating a segment in the process.

At the largest scale, entire genomes can be duplicated at once. Vertebrate evolution was shaped by two rounds of whole-genome duplication early in the lineage’s history, a pattern known as the 2R hypothesis. That means the ancestors of all fish, amphibians, reptiles, birds, and mammals had their entire genetic toolkit doubled twice, providing raw material for the complex body plans vertebrates eventually developed. Additional whole-genome duplications keep turning up across the animal and plant kingdoms, and they appear to be a recurring engine of evolutionary change.

Genes Born From Scratch

Perhaps the most surprising source of new genes is “de novo” gene birth, where a stretch of DNA that previously had no function gradually becomes a working gene. This was once considered nearly impossible, because the odds of random DNA encoding a useful protein seemed vanishingly small. But genomic studies have now documented it repeatedly across species.

For a non-coding sequence to become a protein-coding gene, two things need to happen. The DNA must start being transcribed (read into RNA), and it must acquire an open reading frame, the specific start-and-stop signals that allow it to be translated into a protein. These two events can occur in either order. In some cases, a region is already being transcribed as RNA before it picks up the ability to code for protein. In others, the coding sequence comes first and a promoter region (the “on switch” for transcription) evolves afterward. Arctic codfish, for example, evolved an antifreeze protein gene where the coding sequence appeared before the machinery to turn it on.
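To make the idea of "acquiring an open reading frame" concrete, here is a minimal Python sketch. It scans the forward strand of a DNA string for a start codon (ATG) followed in the same frame by a stop codon, and shows how a single base change can create a reading frame where none existed. The function name `find_orfs`, the `min_codons` threshold, and both toy sequences are assumptions made up for illustration; real gene-finding tools also check the reverse strand, transcription evidence, and much more.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(dna, min_codons=3):
    """Return (start, end, sequence) for each ATG...stop run on the forward strand.

    min_codons counts the start and stop codons too. This is a toy scanner,
    not a real gene predictor.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):                      # try all three reading frames
        start = None
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i                       # first start codon opens a frame
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i + 3 - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i + 3, dna[start:i + 3]))
                start = None                    # look for the next ORF in this frame
    return orfs

# Hypothetical sequences that differ by one base (C -> G at position 4).
before = "CCATCGCTGCATAACC"
after = "CCATGGCTGCATAACC"

print(find_orfs(before))  # []
print(find_orfs(after))   # [(2, 14, 'ATGGCTGCATAA')]
```

In this toy case the mutation creates an ATG that happens to sit in frame with a pre-existing TAA, which is the kind of accident that can turn non-coding DNA into a candidate coding sequence. Whether that sequence is ever transcribed is a separate question, as the paragraph above notes.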

One model proposes that new genes go through a “proto-gene” stage, where they’re only weakly expressed and produce short, somewhat disordered proteins. If any of this activity proves useful, natural selection gradually refines the gene into a mature, stable form. A competing model suggests gene birth can be more sudden: as soon as a new coding sequence produces something that helps the organism, selection locks it in.

Orphan Genes and What They Tell Us

Every species has genes found nowhere else in the tree of life. These are called orphan genes, and they lack any detectable similarity to genes in other organisms. They show up in every genome that gets sequenced, sometimes making up a significant fraction of an organism’s gene catalog. Because they have no known relatives, their functions are often hard to predict using standard tools that compare genes across species.

Orphan genes are strong candidates for de novo origins, and they appear to play important roles in helping organisms adapt quickly to new environments. In sugarcane, for instance, orphan genes are associated with responses to both biological threats like pathogens and physical stresses like drought. Their rapid emergence and functional integration suggest that evolution’s ability to invent entirely new genes is not rare but routine.

Borrowing Genes From Other Species

Bacteria swap genes constantly through horizontal gene transfer, passing DNA between unrelated species rather than inheriting it from a parent. For a long time, scientists assumed this was mostly a bacterial trick. Complex organisms like plants and animals were thought to be largely immune. That assumption has eroded considerably.

Horizontal transfer has now been documented across a wide range of multicellular life. Grasses have swapped nuclear genes with other grass lineages on multiple occasions. Centipede venom arsenals were repeatedly stocked with genes acquired horizontally. Nematode worms picked up genes for digesting plant cell walls, allowing them to expand their diets. Among fish, herring appear to have acquired an antifreeze protein gene through lateral transfer and then passed a copy to smelt. DNA has moved between organisms even across kingdom boundaries, from bacteria into animals and from fungi into plants. While less frequent than in microbes, these transfers have meaningfully shaped the genomes of complex organisms.

Remixing Existing Parts

Not every new gene requires building from scratch or copying a whole gene. Sometimes evolution assembles new genes by rearranging the modular pieces of existing ones.

Proteins are often built from distinct functional domains, each encoded by one or more exons (the segments of a gene that actually code for protein). Through a process called exon shuffling, a domain from one gene can be inserted into another gene, creating a protein with a novel combination of functions. This is driven by recombination events that occur within the non-coding stretches (introns) between exons. The result is a kind of molecular mix-and-match, where evolution builds new tools from a library of pre-tested components. This domain-level remixing is thought to be a major factor in the evolution of human biological complexity.

Gene fusion works on a similar principle but at a larger scale. Two separate genes that already work together in the same pathway can merge into a single gene encoding one protein with both functions. The advantage is efficiency: fusing two steps of a biochemical process into one protein can speed up the reaction and ensures both components are always produced together in matching amounts. The accumulation of multiple domains through fusion appears to be one of the key routes through which multicellular organisms evolved greater functional complexity.

A Case Study: Antifreeze in Antarctic Fish

One of the best-documented examples of new gene evolution comes from Antarctic notothenioid fish, which dominate the freezing waters of the Southern Ocean. These fish produce antifreeze glycoproteins that bind to ice crystals in their blood and prevent them from growing. The gene for this protein evolved from a completely different gene: one that codes for trypsinogen, a digestive enzyme made in the pancreas.

The process was a creative act of molecular recycling. The beginning and end of the old trypsinogen gene were kept, providing the signal that tells the cell to secrete the protein and the regulatory sequences at the tail end. But the middle of the gene, the part encoding the digestive enzyme itself, was deleted. In its place, a tiny nine-nucleotide segment from the original gene that coded for just three amino acids (threonine-alanine-alanine) was amplified over and over through a DNA copying error called replication slippage. This repeated copying created an entirely new protein-coding region that produces the repetitive backbone of the antifreeze molecule.
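The amplification step described above can be sketched in a few lines of Python. The nine-nucleotide unit used here, `ACAGCAGCA`, is a hypothetical stand-in chosen only because it translates to threonine-alanine-alanine under the standard genetic code; it is not claimed to be the actual notothenioid sequence. The copy number and the helper names `amplify` and `translate` are likewise illustrative.

```python
# Only the two codons this example needs (standard genetic code).
CODONS = {"ACA": "Thr", "GCA": "Ala"}

def amplify(unit, copies):
    """Model repeated copying as simple end-to-end repetition of the unit."""
    return unit * copies

def translate(dna):
    """Translate a DNA string codon by codon using the small table above."""
    return [CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3)]

unit = "ACAGCAGCA"              # hypothetical 9-nt unit encoding Thr-Ala-Ala
region = amplify(unit, 4)       # 4 copies is an arbitrary choice for display
print(len(region))              # 36
print("-".join(translate(region)))
# Thr-Ala-Ala-Thr-Ala-Ala-Thr-Ala-Ala-Thr-Ala-Ala
```

Because the unit is a multiple of three nucleotides long, every added copy stays in the same reading frame, so the repeat extends the protein's Thr-Ala-Ala backbone rather than scrambling it.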

So a gene that helped digest food was partially dismantled, had a small fragment massively duplicated, and became a gene that prevents blood from freezing. The whole process combined gene recruitment, deletion, and de novo amplification into a single evolutionary innovation, illustrating how multiple mechanisms can work together to produce something genuinely new.

How Long New Genes Take To Spread

A new gene doesn’t become a permanent part of a species overnight. Once a new gene variant appears in a single individual, it has to spread through the entire population, a process called fixation. How long this takes depends on three factors: the size of the population, how often new mutations arise, and the probability that any given mutation will actually make it to fixation rather than disappearing by chance.

Most new mutations are lost. Even beneficial ones can vanish through sheer bad luck in early generations. For a mutation to have a shot at spreading, it needs to survive that initial period of vulnerability and then gradually increase in frequency over many generations. In large populations, this process takes longer because the variant has to spread through more individuals. The expected number of generations before a successful mutant reaches fixation scales with population size, meaning new genes can take anywhere from thousands to millions of generations to become universal in a species, depending on the organism’s population dynamics and how strongly selection favors the new gene.
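For readers who want numbers, the three factors above can be combined in a short Python sketch based on standard population-genetics approximations. It assumes an idealized diploid population of constant size `N` (with effective size equal to census size), a per-generation mutation rate `mu`, and a selection coefficient `s` for the heterozygote. The fixation probability uses Kimura's diffusion approximation for a single new copy, and the roughly 4N-generation fixation time applies only to neutral mutations. The function names and the example values are assumptions chosen for illustration, not estimates for any real species.

```python
import math

def fixation_probability(N, s):
    """Kimura's approximation for one new copy among 2N gene copies.

    Intended for s >= 0; very negative 4*N*s would overflow math.exp.
    """
    if s == 0:
        return 1.0 / (2 * N)    # neutral case: pure chance
    return (1 - math.exp(-2 * s)) / (1 - math.exp(-4 * N * s))

def substitution_rate(N, mu, s):
    """New mutations per generation (2*N*mu) times the chance each one fixes."""
    return 2 * N * mu * fixation_probability(N, s)

def neutral_fixation_time(N):
    """Approximate generations to fixation for a neutral mutation that does fix."""
    return 4 * N

N = 1000                                        # illustrative population size
print(fixation_probability(N, 0.0))             # 0.0005
print(round(fixation_probability(N, 0.01), 4))  # 0.0198, close to 2*s
print(neutral_fixation_time(N))                 # 4000
```

The second printed value shows the point made in the text: even a mutation with a 1 percent fitness advantage is lost about 98 percent of the time under these assumptions.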

In the human lineage, the 18 or so genes estimated to have arisen de novo since our split from chimpanzees had roughly six to seven million years to emerge and spread. That timeframe, combined with relatively small ancestral population sizes, was sufficient for entirely novel proteins to go from non-coding DNA to fixed, functional genes expressed across the species.