What Is Genome Size and How Does It Affect Organisms?

Genome size is the total amount of DNA contained in one copy of an organism’s complete set of genetic instructions. In most contexts, this means the DNA in a single haploid cell (such as a sperm or egg cell), which carries one copy of every chromosome. Scientists measure genome size either in base pairs, the individual “letters” of the DNA code, or in picograms, a unit of mass. One picogram of DNA corresponds to roughly 1 billion base pairs (about 978 million, to be more precise).
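The conversion between the two units is a single multiplication. The sketch below uses the commonly cited factor of about 0.978 billion base pairs per picogram; the function names are illustrative, not taken from any particular library.

```python
# Commonly cited conversion factor: 1 pg of double-stranded DNA
# is about 0.978 billion base pairs (often rounded to 1 billion).
BP_PER_PG = 0.978e9


def picograms_to_base_pairs(pg: float) -> float:
    """Convert a DNA mass in picograms to a number of base pairs."""
    return pg * BP_PER_PG


def base_pairs_to_picograms(bp: float) -> float:
    """Convert a number of base pairs to a DNA mass in picograms."""
    return bp / BP_PER_PG


# The human genome figure quoted later in this article.
human_bp = 3.055e9
print(f"{base_pairs_to_picograms(human_bp):.2f} pg")  # prints "3.12 pg"
```

So a haploid human genome weighs a little over 3 picograms.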

The concept sounds straightforward, but genome size turns out to be one of the more surprising numbers in biology. It varies enormously across life, it doesn’t track neatly with how complex an organism appears, and it has real consequences for how cells function.

How Genome Size Is Measured

The most widely used laboratory method is flow cytometry. Researchers stain cell nuclei with a fluorescent dye that binds to DNA, then pass thousands of those nuclei through a laser beam. The more DNA a nucleus contains, the brighter it glows. By running an unknown sample alongside nuclei from a reference species whose genome size is already established, scientists can calculate the unknown genome size from the ratio of the two fluorescence signals. The technique is fast, works on a huge variety of organisms, and processes thousands of nuclei per run.
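The underlying arithmetic is a simple proportion: fluorescence is assumed to scale with DNA content, so the unknown genome size is the reference size multiplied by the ratio of the two fluorescence peaks. The sketch below illustrates that idea only. The function name and the numbers in the example are hypothetical, and real analyses also have to deal with peak detection, replicate runs, and dye-specific biases.

```python
def estimate_genome_size(sample_peak: float,
                         standard_peak: float,
                         standard_size_pg: float) -> float:
    """Estimate genome size from relative fluorescence.

    sample_peak and standard_peak are the mean fluorescence intensities
    of the unknown sample and the reference standard (arbitrary units).
    The result is in the same units as standard_size_pg.
    """
    if sample_peak <= 0 or standard_peak <= 0:
        raise ValueError("fluorescence peaks must be positive")
    return standard_size_pg * sample_peak / standard_peak


# Hypothetical run: the sample glows twice as brightly as a 2.5 pg standard.
print(estimate_genome_size(sample_peak=400.0,
                           standard_peak=200.0,
                           standard_size_pg=2.5))  # prints 5.0
```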

For species with sequencing data, computational approaches offer a second route. Software can analyze raw sequencing reads and estimate total genome size from how often short, fixed-length DNA subsequences (known as k-mers) occur in the data. This method is especially useful for organisms that are difficult to grow in a lab or that produce limited tissue for staining.
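One common way to turn those frequency patterns into a size estimate is to count fixed-length subsequences (k-mers) in the reads, find the most typical sequencing depth, and divide the total number of k-mers by that depth. The sketch below is a deliberately simplified illustration of that idea. The function names and the `min_depth` cutoff are assumptions made for this example, and it ignores complications such as reverse complements, heterozygosity, and repeats that dedicated tools handle.

```python
from collections import Counter


def count_kmers(reads, k):
    """Count every length-k substring (k-mer) across all reads."""
    counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            counts[read[i:i + k]] += 1
    return counts


def estimate_genome_size(kmer_counts, min_depth=2):
    """Estimate genome size as (total k-mers) / (peak k-mer depth).

    K-mers seen fewer than min_depth times are discarded, on the
    assumption that most of them come from sequencing errors.
    """
    # Histogram: depth -> number of distinct k-mers seen that many times.
    histogram = Counter(kmer_counts.values())
    usable = {d: n for d, n in histogram.items() if d >= min_depth}
    if not usable:
        raise ValueError("no k-mers at or above min_depth")
    # The peak depth approximates the average coverage of the genome.
    peak_depth = max(usable, key=usable.get)
    total_kmers = sum(depth * n for depth, n in usable.items())
    return total_kmers / peak_depth


# Toy data: the same 6-letter read sequenced 10 times.
reads = ["ACGTAC"] * 10
counts = count_kmers(reads, k=4)
print(estimate_genome_size(counts))  # 3 distinct 4-mers at depth 10 -> 3.0
```

The estimate is in units of distinct k-mer positions, which for a real genome and a small k is very close to the number of base pairs.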

Genome Size Across the Tree of Life

The range is staggering. At the small end, the bacterium Mycoplasma genitalium carries a circular chromosome of just 580,070 base pairs, often cited as the smallest genome of any organism that can be grown on its own in laboratory culture. At the large end, a New Caledonian fork fern called Tmesipteris oblanceolata holds the current record at roughly 160 billion base pairs, more than 50 times the size of the human genome.

Viruses sit below bacteria, typically carrying genomes of only a few thousand to a few hundred thousand base pairs. Bacteria and archaea generally range from about 500,000 base pairs up to around 13 million in the largest known cases, such as the soil bacterium Sorangium cellulosum. Eukaryotes (organisms with complex cells) span an enormous range. The smallest known eukaryotic genome belongs to a microsporidian fungus at about 2.9 million base pairs, while plants and some amphibians can reach tens or hundreds of billions.

For reference, the human genome contains about 3.055 billion base pairs, as confirmed by the Telomere-to-Telomere Consortium, which completed the first truly gapless human genome sequence in 2022. That project filled in roughly 200 million base pairs that previous efforts had missed, including nearly 2,000 newly predicted genes. The roundworm C. elegans, a workhorse of genetics research, has 97 million base pairs. The pufferfish Takifugu rubripes has about 400 million, one of the smallest genomes among vertebrates.

Why Bigger Doesn’t Mean More Complex

You might expect that more complex organisms would need more DNA, but that’s not what the data show. This mismatch is known as the C-value paradox. The “C-value” is simply the amount of DNA in a haploid cell, and the paradox is that it bears little relationship to how many cell types an organism has, how sophisticated its behavior is, or how many protein-coding genes it carries.

A lungfish genome, for example, dwarfs the human genome. Some single-celled amoebas have been reported to carry genomes far larger than ours, although those older estimates are considered unreliable. The fork fern that holds the size record is a modest plant with no flowers, no seeds, and a simple body plan. Meanwhile, the pufferfish manages vertebrate-level complexity with a genome roughly one-eighth the size of a human’s.

The resolution to this paradox lies in what the extra DNA actually is. Most of it does not code for proteins. In large genomes, the bulk of the sequence consists of repetitive elements, particularly transposable elements, sometimes called “jumping genes.” These are stretches of DNA that can copy themselves and insert new copies elsewhere in the genome. In plants, the relationship is especially clear: transposable elements make up as little as 3% of small plant genomes and as much as 85% of large ones. Genome size in plants is essentially a linear function of how many transposable elements have accumulated.

Earlier explanations dismissed this non-coding DNA as “junk” or “selfish” DNA that persisted simply because it wasn’t harmful enough to be eliminated. That view has largely fallen out of favor. A significant proportion of non-coding DNA plays regulatory roles, controlling when, where, and how intensely genes are switched on. The C-value paradox, in other words, arose partly from the assumption that only protein-coding genes “count.” Once you recognize the regulatory importance of non-coding sequences, the disconnect between genome size and complexity becomes less mysterious, though the sheer volume of repetitive DNA in some genomes still lacks a complete explanation.

How Genome Size Affects Cells and Organisms

Genome size is not just a bookkeeping number. It has measurable effects on cell biology. One of the most consistent patterns across species is that cells with larger genomes tend to be physically larger. More DNA takes up more space in the nucleus, and cells scale up to accommodate it. This in turn affects how quickly cells can divide, how fast an organism develops, and how efficiently it uses energy.

Research on a single-celled green alga, Dunaliella tertiolecta, illustrates the tradeoffs. Across 72 independent lineages of this species, those with relatively smaller genomes had higher fitness: they grew faster and reached larger total populations. Paradoxically, they also had lower energy fluxes than lineages with larger genomes. Over 100 generations in the lab, lineages with the largest genomes shrank by about 11%, suggesting active selection pressure to trim excess DNA. Lineages with already-small genomes didn’t shrink further, implying a lower limit below which essential functions would be compromised.

These findings point to a tug-of-war. Natural selection generally favors leaner genomes because they allow faster cell division and more efficient resource use. But genomes can’t shrink below the size needed to carry out all necessary functions, and mechanisms like transposable element expansion keep pushing size upward.

What Drives Genome Size Changes Over Time

Two major forces shape genome size across evolutionary time: the accumulation of transposable elements and whole-genome duplication.

Transposable elements are the primary driver of gradual genome expansion. When these sequences copy and reinsert themselves, the genome grows. In plants, bursts of transposable element activity can dramatically increase genome size over relatively short evolutionary periods. Removal happens too, through processes like unequal recombination that delete chunks of repetitive DNA, but expansion often outpaces deletion.

Whole-genome duplication, or polyploidy, is a more dramatic event. An organism ends up with two (or more) complete copies of every chromosome, instantly doubling its genome size. This is especially common in plants. What happens afterward is telling. In a large-scale analysis of over 3,000 flowering plant species, genome downsizing turned out to be the most common response to polyploidy. The duplicated genome gradually sheds redundant sequences, trending back toward a more streamlined size. Studies in tobacco species (Nicotiana) show that younger polyploids tend to lose DNA through small deletions and recombination within repetitive elements. In older polyploids, roughly 4.5 million years post-duplication, some lineages showed the opposite pattern: genome upsizing driven by the amplification of new repetitive sequences that replaced the original duplicated material.

Bacteria and archaea evolve genome size much more slowly. Estimates suggest bacterial genomes have increased only about 2.5-fold per billion years, and archaeal genomes about 1.9-fold, based on the largest known representatives in each group. The compact, streamlined nature of prokaryotic genomes reflects intense selective pressure to replicate quickly, with little tolerance for excess DNA.
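To get a feel for how slow that is, the fold-change figures can be converted into doubling times. The sketch below assumes steady exponential growth over the whole interval, which is a simplification introduced here and not a claim from the underlying estimates; the function name is illustrative.

```python
import math


def doubling_time(fold_change: float, interval: float = 1.0) -> float:
    """Time needed to double, given a fold change over `interval`.

    Assumes constant exponential growth. The result is in the same
    units as `interval` (here, billions of years).
    """
    if fold_change <= 1.0:
        raise ValueError("fold_change must be greater than 1")
    return interval * math.log(2) / math.log(fold_change)


# Fold changes per billion years quoted in the text.
print(f"bacteria: {doubling_time(2.5):.2f} billion years")  # about 0.76
print(f"archaea:  {doubling_time(1.9):.2f} billion years")  # about 1.08
```

Under that assumption, the largest bacterial genomes doubled roughly once every three-quarters of a billion years.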

The Human Genome in Context

At 3.055 billion base pairs, the human genome is middling by eukaryotic standards. It is about eight times larger than the fugu pufferfish genome but roughly 50 times smaller than the fork fern’s. Only about 1.5% of human DNA codes for proteins, with the rest consisting of regulatory sequences, transposable element remnants, structural DNA, and sequences whose functions are still being catalogued. Mammals as a group cluster tightly around 3.2 billion base pairs, suggesting that mammalian genome size has been relatively stable over evolutionary time compared to the wild variation seen in plants and amphibians.
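These comparisons are simple ratios, and they can be checked against the genome sizes quoted earlier in this article. The short sketch below does that; the dictionary and function names are illustrative, and the figures are the rounded values from the text.

```python
# Genome sizes in base pairs, as quoted in this article (rounded).
GENOME_SIZES_BP = {
    "M. genitalium": 580_070,
    "C. elegans": 9.7e7,
    "fugu": 4.0e8,
    "human": 3.055e9,
    "fork fern": 1.6e11,
}


def fold_difference(larger: str, smaller: str) -> float:
    """Return how many times bigger one genome is than another."""
    return GENOME_SIZES_BP[larger] / GENOME_SIZES_BP[smaller]


print(f"human vs fugu: {fold_difference('human', 'fugu'):.1f}x")            # 7.6x
print(f"fork fern vs human: {fold_difference('fork fern', 'human'):.1f}x")  # 52.4x
```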