Why Did COVID-19 Start? What Scientists Know

COVID-19 emerged in late 2019 when a coronavirus jumped from animals to humans, most likely in or near the city of Wuhan, China. On December 31, 2019, Chinese health authorities reported a cluster of 27 pneumonia cases of unknown cause, nearly all linked to the Huanan Seafood Wholesale Market. More than five years later, the exact chain of events that sparked the pandemic remains unresolved, with two competing hypotheses still on the table: a natural spillover from wildlife and a laboratory-associated incident.

What Happened in Wuhan in Late 2019

The first recognized cases appeared as a mysterious cluster of severe pneumonia in Wuhan, a city of 11 million people in central China. Investigators quickly noticed that most early patients had visited or worked at the Huanan Seafood Wholesale Market, a large indoor market that sold not just seafood but also live and butchered wild animals. The market was shut down on January 1, 2020, and teams collected 923 environmental samples from surfaces, drains, and stalls.

Of those samples, 74 tested positive for SARS-CoV-2. The positive samples clustered in the southwestern corner of the market’s west zone, where 8 of the 10 stalls selling live wildlife were located. Sales records show that in late December 2019, vendors were selling snakes, bamboo rats, hedgehogs, porcupines, badgers, sika deer, various birds, and crocodiles. Raccoon dogs, fox-like animals known to be susceptible to SARS-like coronaviruses, were also confirmed to have been present at the market before its closure.

Critically, none of the 457 animal samples collected after the market closed tested positive for the virus. The animals had already been removed or disposed of by the time investigators arrived, leaving a gap in the evidence that has never been filled. A large seroprevalence study of nearly 44,000 blood samples from Wuhan donors collected between September and December 2019 found no antibodies against SARS-CoV-2 before 2020, suggesting the virus was not circulating widely in the city before December.
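To see why zero positives among tens of thousands of donors argues against wide circulation, here is a rough sketch of the upper bound such a result places on prevalence. It assumes, purely for illustration, that the donors were a random sample of the city and that the antibody test misses no true positives. Neither assumption comes from the study itself, the sample count is rounded to 44,000, and the function name is hypothetical.

```python
def zero_positive_upper_bound(n_samples: int, confidence: float = 0.95) -> float:
    """Upper bound on true prevalence when 0 of n_samples test positive.

    Solves (1 - p) ** n_samples = 1 - confidence for p, i.e. the largest
    prevalence under which seeing no positives is still plausible.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n_samples)


# Assumption: "nearly 44,000" rounded to 44,000 for illustration.
N_SAMPLES = 44_000

bound = zero_positive_upper_bound(N_SAMPLES)
rule_of_three = 3 / N_SAMPLES  # common shortcut for the same 95% bound

print(f"Exact 95% upper bound: {bound:.4%}")
print(f"Rule-of-three shortcut: {rule_of_three:.4%}")
```

Under those simplifying assumptions, the bound works out to roughly 7 infections per 100,000 people, which is compatible with a handful of early cases but not with widespread transmission.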

How the Virus Relates to Bat Coronaviruses

SARS-CoV-2 belongs to a family of coronaviruses found in horseshoe bats across East and Southeast Asia. One of its closest known wild relatives is a bat virus called RaTG13, sampled from a cave in China’s Yunnan province in 2013. The two viruses share about 96% of their genome sequence. That sounds like a near match, but the remaining 4% represents decades of evolutionary distance, meaning RaTG13 is not the direct ancestor of SARS-CoV-2. The actual precursor virus has never been found.
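A quick back-of-the-envelope calculation shows why a 96% match is less close than it sounds. The sketch below assumes a genome length of about 29,900 nucleotides, a rounded figure for SARS-CoV-2 that does not appear in the text above, and treats the 96% as exact.

```python
# Assumption: approximate SARS-CoV-2 genome length in nucleotides (rounded).
GENOME_LENGTH_NT = 29_900

# Share of the genome that matches RaTG13, as quoted in the text.
IDENTITY = 0.96

# Positions at which the two genomes would differ under these figures.
differing_positions = round(GENOME_LENGTH_NT * (1 - IDENTITY))

print(f"Approximate differing positions: {differing_positions}")
```

On these rounded inputs, a 4% gap corresponds to more than a thousand individual differences spread across the genome, far more than a virus accumulates in a few years of circulation.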

Other closely related coronaviruses have since been discovered in bats in Laos, Japan, Cambodia, and Thailand, painting a picture of a broad family of viruses circulating across the region. No bats were sold at the Huanan market, though, which means if the virus did originate in bats, it needed a stepping stone: an intermediate animal host that caught the virus, allowed it to adapt, and then passed it to people.

The Search for an Intermediate Host

During the original SARS outbreak in 2002 and 2003, the virus traveled from bats to civets (small cat-like mammals sold in Chinese markets) and then to humans. Researchers expected a similar bridge animal for SARS-CoV-2, and early attention focused on pangolins, the world’s most trafficked mammals. Coronaviruses found in smuggled Malayan pangolins in southern China showed a striking similarity to SARS-CoV-2 in the part of the spike protein that latches onto human cells. ACE2, the receptor the virus uses to enter cells, is actually more similar between humans and pangolins (about 85% match) than between humans and bats (about 81%).

However, no pangolins were documented at the Huanan market. Raccoon dogs were present and are known to be susceptible to SARS-like viruses. One analysis proposed that the virus spilled from animals into humans at least twice in November or December 2019, with raccoon dogs suggested as the intermediate host. But no raccoon dog at the market was ever confirmed to carry the virus, leaving this hypothesis supported by circumstantial and genetic evidence rather than direct proof.

Another complication is a feature unique to SARS-CoV-2 among its close relatives: a small insertion in the spike protein called a furin cleavage site, which helps the virus enter human cells efficiently. None of the pangolin coronaviruses identified so far carry this feature. Some researchers flagged this as suspicious, but a detailed analysis published in the Proceedings of the National Academy of Sciences concluded that only four extra amino acids were inserted, that most of the surrounding sequence already exists in related bat viruses, and that engineering the insertion’s genetic structure would have been unnecessarily complex. The authors argued the feature is consistent with natural evolution, though the debate has not fully closed.

The Lab Leak Hypothesis

Wuhan is also home to the Wuhan Institute of Virology, founded in 1956 and host to China’s first biosafety level 4 laboratory, which opened in 2015. The institute had been conducting extensive research on bat coronaviruses for years, and a team led by virologist Shi Zhengli was among the first to identify and sequence the new virus, in early January 2020. The coincidence of a major coronavirus research center being located in the same city where the pandemic began fueled questions about whether the virus could have escaped from a lab through an accident, such as a researcher becoming infected while handling samples or conducting experiments.

Proponents of this hypothesis point to the lack of a confirmed intermediate host, the unique furin cleavage site, and what they describe as insufficient transparency from Chinese authorities. Critics note that Wuhan’s large population, its position as a major transport hub, and its wildlife trade made it a plausible site for natural spillover regardless of the institute’s presence. The virus has never been matched to any sample in the institute’s known collection.

Where Investigations Stand

In 2021, the U.S. Intelligence Community released an assessment stating that all agencies considered both hypotheses plausible. Four intelligence agencies and the National Intelligence Council assessed with low confidence that the virus most likely came from natural animal exposure. One agency assessed with moderate confidence that a laboratory-associated incident was the most likely cause. Three agencies could not settle on either explanation.

The World Health Organization’s Scientific Advisory Group for the Origins of Novel Pathogens (SAGO) has continued reviewing evidence through mid-2025, including published research, intelligence reports, and expert discussions. Its key recommendation has been consistent: all available data, particularly from early cases and the market, need to be shared openly to resolve the question. China has not granted access to key raw data, early patient records, or detailed laboratory logs from the Wuhan Institute of Virology, all of which supporters of both hypotheses agree would be essential to reaching a definitive answer.

What is better established is the timeline. Blood bank screening of tens of thousands of samples indicates that the virus was not circulating widely in Wuhan before December 2019, even if the first human infections occurred a few weeks earlier. Once established, it spread rapidly, with early estimates of each infected person passing the virus to roughly two to five others. By the time the market was identified and closed, the virus was already moving through the city, setting the stage for a pandemic that would eventually reach every country on Earth.
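The effect of each person infecting two to five others compounds quickly. The sketch below is a deliberately simplified illustration, not an epidemiological model: it assumes a single initial case, a fixed number of onward infections per person, and no immunity, isolation, or other interventions. The function name is hypothetical.

```python
def new_infections(per_person: int, generation: int) -> int:
    """New infections in a given generation, starting from one case.

    Assumes every infected person passes the virus to exactly
    `per_person` others, with nothing slowing the spread.
    """
    return per_person ** generation


# Low and high ends of the early estimates quoted in the text.
for per_person in (2, 5):
    counts = [new_infections(per_person, g) for g in range(6)]
    print(f"{per_person} per person, generations 0-5: {counts}")
```

Even at the low end, five rounds of transmission turn one case into dozens of new infections per round; at the high end, the same five rounds produce thousands.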