What Determines the Genome Size of a Species?

Genome size is not determined by how complex an organism is or how many genes it has. Instead, it is shaped primarily by the accumulation of repetitive, non-coding DNA, especially jumping genes called transposable elements, along with events like whole-genome duplication. The forces that expand or shrink a genome are largely neutral or indirect, driven by mutation, genetic drift, and deletion patterns rather than by a need for more biological instructions.

The C-Value Paradox: Why Complexity Doesn’t Predict Size

One of the most counterintuitive facts in biology is that genome size has almost no relationship to an organism’s apparent complexity. A single-celled amoeba can carry hundreds of times more DNA than a human. Among flowering plants that diverged roughly 150 million years ago, genome size ranges from just 63 million base pairs in a tiny carnivorous plant to approximately 150 billion base pairs in the canopy plant Paris japonica, a roughly 2,400-fold difference. Mammals that diversified over a similar timeframe show only about a fivefold range, from 1.6 billion base pairs in a small bat to around 8 billion in a tetraploid rodent.
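The fold ranges here are just the largest genome divided by the smallest. A minimal Python sketch of that arithmetic, using only the figures quoted above (the function name fold_range is illustrative):

```python
def fold_range(smallest_bp: int, largest_bp: int) -> float:
    """Return how many times larger the largest genome is than the smallest."""
    return largest_bp / smallest_bp


# Figures quoted in the text, in base pairs.
plant_fold = fold_range(63_000_000, 150_000_000_000)
mammal_fold = fold_range(1_600_000_000, 8_000_000_000)

print(f"flowering plants: about {plant_fold:,.0f}-fold")
print(f"mammals: about {mammal_fold:.0f}-fold")
```

With these figures the plant ratio comes out near 2,400 and the mammal ratio at 5, which is the contrast the paragraph is drawing.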

This disconnect was named the C-value paradox in 1971 by geneticist C.A. Thomas. He pointed out three puzzles: many “simpler” organisms have far more DNA than complex ones, closely related species with similar body plans can differ by tenfold or more, and up to 98% of a genome can consist of sequences that don’t code for proteins. The number of protein-coding genes across organisms actually varies within a surprisingly narrow range and correlates poorly with either genome size or biological complexity.

The paradox only exists if you assume genomes should be efficiently engineered. Biological organisms are not designed; they are the product of natural selection acting on random events. There is no built-in pressure to keep a genome lean. As long as extra DNA isn’t harmful enough to be weeded out, it persists.

Transposable Elements: The Biggest Driver

The single largest factor explaining genome size differences is the amount of repetitive DNA, and the bulk of that repetitive DNA consists of transposable elements. These are sequences that can copy themselves and insert new copies elsewhere in the genome, gradually inflating total DNA content over generations.

The relationship is strikingly direct. In plants, transposable elements make up as little as 3% of a small genome and as much as 85% of a large one like maize. Genome size in plants is essentially a linear function of transposable element content. The pattern holds broadly across eukaryotes: species with large genomes almost always carry a heavy load of these self-replicating sequences, while species with compact genomes have kept them in check.
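One way to see why transposable element load tracks genome size so closely is a toy model in which the rest of the genome stays fixed and only the transposable element (TE) fraction changes. The sketch below makes that assumption explicit. The 300 million base pair baseline and the function name are hypothetical, chosen only for illustration, and are not measured values.

```python
def genome_size_from_te_fraction(non_te_bp: float, te_fraction: float) -> float:
    """Total genome size when TEs make up `te_fraction` of it.

    Assumes total = non_te_bp + te_bp and te_fraction = te_bp / total,
    which rearranges to total = non_te_bp / (1 - te_fraction).
    """
    if not 0 <= te_fraction < 1:
        raise ValueError("te_fraction must be in [0, 1)")
    return non_te_bp / (1 - te_fraction)


NON_TE_BP = 300e6  # hypothetical fixed amount of non-TE DNA, in base pairs

# 3% and 85% are the extremes quoted above; 50% is an arbitrary midpoint.
for fraction in (0.03, 0.50, 0.85):
    total = genome_size_from_te_fraction(NON_TE_BP, fraction)
    print(f"TE fraction {fraction:.0%}: about {total / 1e6:,.0f} Mb")
```

Under this assumption, going from 3% to 85% transposable elements multiplies genome size more than sixfold with no change to the rest of the genome.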

Transposable elements don’t serve the organism’s immediate needs in most cases. They persist because they are effective replicators in their own right, and because in species with small population sizes, natural selection is too weak to purge mildly harmful extra DNA. This connects to a broader principle in genome evolution: much of what fills a genome is there not because it helps the organism, but because it wasn’t costly enough to remove.

Whole-Genome Duplication

Polyploidy, the duplication of an entire genome, is the most dramatic way a genome can grow in a single event. It instantly doubles both total DNA and the full set of genes. This process has been especially important in plants, where whole-genome duplication has occurred repeatedly over the past 200 million years. Many plant species even contain mixed populations of individuals with different numbers of chromosome sets, illustrating how common polyploidy remains.

In animals, whole-genome duplication is far rarer. The most recent event in the lineage leading to humans happened roughly 450 million years ago. In budding yeast, a fungus, the last one occurred about 100 million years ago. This difference helps explain why plant genomes are so much more variable in size than animal genomes. After a duplication event, the extra copies of genes and non-coding regions can be retained, reshuffled, or gradually lost, but the genome rarely returns to its original size.

Population Size and Genetic Drift

A key theory, advanced by biologist Michael Lynch, argues that genome size is largely shaped by two non-adaptive forces: random genetic drift and mutation pressure. In species with large effective population sizes, natural selection is efficient enough to remove unnecessary DNA. In species with small populations, selection is weaker, and mildly harmful or neutral sequences (like extra transposable elements or bloated intergenic regions) accumulate simply because drift overwhelms selection.

This explains a broad pattern: bacteria and other microorganisms with enormous population sizes tend to have small, tightly packed genomes, while multicellular animals and plants with smaller populations tend to carry far more non-coding DNA. The nuclear genomes of multicellular organisms contain large stretches of non-coding sequence precisely because the disadvantage of carrying that extra DNA is too small to be effectively countered by selection.
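The drift argument can be made concrete with Kimura's diffusion approximation for the probability that a new mutation eventually becomes fixed in a diploid population. This is a standard population-genetics result, not something taken from Lynch's own models, and the selection coefficient and population sizes below are hypothetical values picked for illustration. The sketch also assumes that effective population size equals census size.

```python
import math


def fixation_probability(s: float, n_e: float) -> float:
    """Kimura's approximation for a new mutation at initial frequency 1/(2N).

    s   : selection coefficient (negative means mildly harmful)
    n_e : effective population size, assumed equal to census size
    """
    if s == 0:
        return 1.0 / (2.0 * n_e)  # neutral case
    # (1 - exp(-2s)) / (1 - exp(-4*N*s)), written with expm1 for precision
    return math.expm1(-2.0 * s) / math.expm1(-4.0 * n_e * s)


def relative_to_neutral(s: float, n_e: float) -> float:
    """Fixation probability as a multiple of the neutral expectation 1/(2N)."""
    return fixation_probability(s, n_e) * 2.0 * n_e


S = -1e-5  # hypothetical fitness cost of one extra insertion
for n_e in (1e3, 1e5, 1e7):
    print(f"Ne = {n_e:.0e}: {relative_to_neutral(S, n_e):.3g} x neutral")
```

In the smallest population the slightly harmful insertion fixes almost as readily as a neutral one, while in the largest it essentially never does. That is the sense in which drift overwhelms selection when populations are small.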

How Genomes Shrink

Genome size isn’t a one-way ratchet. Several forces actively reduce it. One is deletion bias: in many organisms, especially prokaryotes, small deletions happen more frequently than insertions during DNA replication and repair. Over millions of years, this tips the balance toward smaller genomes.
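Deletion bias can be pictured as simple bookkeeping: if deletions slightly outnumber insertions of similar size, the expected net change per event is negative. The sketch below assumes every indel has the same length and that events are independent. The probability and length used are hypothetical, not measured rates.

```python
def expected_net_change(n_events: int, p_deletion: float, indel_len_bp: float) -> float:
    """Expected change in genome length (bp) after n_events small indels.

    Each event is a deletion with probability p_deletion, otherwise an
    insertion; both are assumed to have the same length.
    """
    per_event = (1 - p_deletion) * indel_len_bp - p_deletion * indel_len_bp
    return n_events * per_event


# Hypothetical: 60% of indels are deletions, each 5 bp long.
change = expected_net_change(1_000_000, 0.60, 5)
print(f"expected net change: {change:,.0f} bp")
```

Even a modest bias adds up: under these made-up numbers, a million events would be expected to remove about a million base pairs.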

Population size plays a role here too. Computational modeling shows that when population sizes increase, genomes lose a significant fraction of their non-coding sequences while maintaining their protein-coding content. The result is a densely packed genome, similar to what’s seen in streamlined marine bacteria. A different kind of shrinkage happens when mutation rates rise sharply. Under high mutation pressure, organisms that happen to have shorter genomes experience fewer harmful mutations per generation, giving them a survival edge. This “selection for robustness” can drive dramatic reductions in genome size, sometimes at the cost of losing functional genes.
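The robustness argument rests on the fact that, at a fixed per-base mutation rate, the expected number of new mutations per generation scales with genome length. A minimal sketch, assuming mutations arrive independently so that the count per genome is Poisson-distributed. The rate and genome lengths are hypothetical.

```python
import math


def expected_mutations(rate_per_bp: float, genome_len_bp: float) -> float:
    """Expected new mutations per genome per generation (rate x length)."""
    return rate_per_bp * genome_len_bp


def mutation_free_fraction(rate_per_bp: float, genome_len_bp: float) -> float:
    """Poisson probability that an offspring carries zero new mutations."""
    return math.exp(-expected_mutations(rate_per_bp, genome_len_bp))


HIGH_RATE = 1e-6  # hypothetical elevated per-base mutation rate
for length in (1e6, 2e6, 4e6):  # hypothetical genome lengths in bp
    frac = mutation_free_fraction(HIGH_RATE, length)
    print(f"{length / 1e6:.0f} Mb genome: {frac:.1%} of offspring mutation-free")
```

The same scaling applies if only a fixed fraction of sites is harmful when mutated, so under high mutation pressure a shorter genome leaves a larger share of offspring undamaged.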

Environmental stress can also push genomes toward compaction. In soil bacteria studied across gradients of temperature, rainfall, pH, and salinity, stressful environments consistently led to reduced genome content. Populations under harsher conditions shed accessory genes, particularly redundant duplicates, streamlining their genomes as an evolutionary response.

The Smallest and Largest Known Genomes

The range of genome sizes across life is staggering. Among free-living eukaryotes, the smallest known genome belongs to Ostreococcus tauri, a marine alga so tiny it’s the world’s smallest free-living eukaryotic cell. Its nuclear genome is just 12.56 million base pairs, packed with genes and stripped of almost all non-coding space through extreme compaction, including shortened gaps between genes and even fusion of adjacent genes.

At the other extreme, Paris japonica holds the record for the largest known eukaryotic genome at roughly 150 billion base pairs, about 50 times larger than the human genome. The vast majority of that DNA is repetitive sequence. These two organisms are not dramatically different in the number of biological tasks they perform. The difference is almost entirely in how much non-functional or repetitive DNA they carry.

Genome Size Affects the Cell Itself

Genome size isn’t just a bookkeeping detail. It has real physical consequences. There is a strong, well-documented correlation between total DNA content and cell size: more DNA means a larger nucleus, which means a larger cell. This relationship, called the nucleotypic effect, holds across vertebrates, plants, and other groups regardless of how much of that DNA actually codes for anything.

Larger cells divide more slowly. Within a given group of organisms, smaller cells tend to have higher metabolic rates and shorter division times. In mammals and birds, DNA content correlates negatively with metabolic rate even after correcting for body size. In amphibians, direct measurements of red blood cell metabolism confirm the same pattern.

This carries consequences for development. In frogs, genome size correlates with the rate of cell proliferation and the overall speed of embryonic development. Species with larger genomes develop more slowly because their cells take longer to replicate all that DNA and divide. So while a bloated genome may not be immediately lethal, it imposes a real cost in terms of how quickly an organism can grow, reproduce, and respond to its environment.