What Is HARKing in Psychology and Why Does It Matter?

HARKing stands for “Hypothesizing After the Results are Known.” It’s the practice of looking at your data first, finding an interesting pattern, and then writing up your research paper as if you predicted that pattern all along. The term was coined by psychologist Norbert Kerr in a 1998 paper, and it has since become one of the most discussed problems in research integrity, particularly in psychology.

How HARKing Works

In the standard scientific method, you start with a hypothesis, design an experiment to test it, collect data, and then report what you found. HARKing flips this sequence. A researcher runs a study, notices an unexpected result in the data, and then rewrites the introduction of their paper to make it look like that unexpected result was the thing they set out to find. The post hoc hypothesis gets dressed up as an a priori hypothesis.

This might sound like a minor literary offense, but the consequences are real. When a hypothesis is crafted to perfectly match a specific dataset, it’s essentially a description of that dataset rather than a genuine prediction about how the world works. The finding looks clean and confirmed on paper, but it’s been reverse-engineered. That makes it far less likely to hold up when another research team tries to replicate the study with new participants.

Why It Inflates False Positives

Statistical tests are designed to work in one direction: you state what you expect to find, then you check whether the data support it. The math behind a p-value assumes you had one specific prediction before looking at the data. When you instead sift through dozens of possible patterns and then test the one that already looks promising, the p-value becomes meaningless. It no longer reflects the true probability of a fluke result.
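The inflation is easy to see in a small simulation (an illustrative sketch, not from the article; the sample sizes and number of candidate patterns are invented for demonstration). Every outcome below is pure noise, so any "significant" result is a false positive by construction. An honest researcher tests one pre-stated hypothesis; a HARKing researcher scans all twenty patterns and reports whichever looks best.

```python
import math
import random
import statistics

random.seed(42)
ALPHA = 0.05    # significance threshold
N = 50          # participants per group
K = 20          # candidate patterns sifted through after the fact
TRIALS = 1000   # simulated studies

def p_two_sided(a, b):
    """Two-sided p-value for a difference in means
    (z approximation, reasonable at this sample size)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))  # = 2 * (1 - Phi(|z|))

honest_hits = 0  # tested only the single pre-stated hypothesis
harked_hits = 0  # scanned all K patterns and reported the best one

for _ in range(TRIALS):
    # Both groups come from the same distribution for every outcome,
    # so the true effect is zero everywhere.
    ps = []
    for _ in range(K):
        a = [random.gauss(0, 1) for _ in range(N)]
        b = [random.gauss(0, 1) for _ in range(N)]
        ps.append(p_two_sided(a, b))
    honest_hits += ps[0] < ALPHA    # one a priori prediction
    harked_hits += min(ps) < ALPHA  # pick the best-looking pattern

honest_rate = honest_hits / TRIALS
harked_rate = harked_hits / TRIALS
print(f"One a priori hypothesis:   false-positive rate ~ {honest_rate:.2f}")
print(f"Best of {K} noise patterns: false-positive rate ~ {harked_rate:.2f}")
# Analytically, the best-of-20 strategy yields 1 - (1 - 0.05)**20, about 0.64.
```

The honest rate stays near the nominal 5%, while cherry-picking the best of twenty patterns pushes the false-positive rate toward two in three, even though nothing real is there.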

This is the core problem. HARKing increases the risk that false-positive findings, results that look real but aren’t, make their way into the scientific literature. When hypotheses are uniquely tailored to a given sample, the probability that those findings will generalize to the broader population drops significantly. Other researchers try to replicate the result, fail, and the field is left with a published finding that doesn’t hold up.

HARKing vs. P-Hacking

HARKing often gets mentioned alongside p-hacking, another questionable research practice, but they work in opposite directions. With p-hacking, a researcher has a fixed hypothesis they believe in and manipulates the data analysis until they get a statistically significant result to support it. They might try different statistical tests, remove certain participants, or tweak how they measure the outcome until the numbers cooperate. The hypothesis stays the same; the data get bent to fit it.

With HARKing, the data stay the same and the hypothesis gets bent. The researcher is flexible about what they claim to have predicted, reshaping the story of the paper to match whatever the data happened to show. One approach has been described as “procrustean” (forcing data to fit the hypothesis), the other as “opportunistic” (bending the story to fit the data). In practice, the two often overlap. Extreme forms of p-hacking may require additional post hoc theorizing to justify unusual analysis choices, blurring the line between the practices. Both inflate false-positive rates.
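The p-hacking side of the contrast can be sketched the same way (again an invented illustration, not from the article): here the hypothesis and the dataset are fixed, and the researcher instead tries several "defensible" analysis variants, such as outlier removal or subgroup splits, and reports whichever reaches significance.

```python
import math
import random
import statistics

random.seed(7)
ALPHA, N, TRIALS = 0.05, 60, 1000  # illustrative parameters

def p_two_sided(a, b):
    """Two-sided p-value for a difference in means (z approximation)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return math.erfc(abs(z) / math.sqrt(2))

def trim(xs, k=3):
    """Drop the k most extreme values on each end ('outlier removal')."""
    return sorted(xs)[k:-k]

straight_hits = 0  # ran only the planned analysis
hacked_hits = 0    # kept the best of several analysis variants

for _ in range(TRIALS):
    # One dataset, pure noise: both groups share the same distribution.
    a = [random.gauss(0, 1) for _ in range(N)]
    b = [random.gauss(0, 1) for _ in range(N)]
    straight_hits += p_two_sided(a, b) < ALPHA
    # Same fixed hypothesis, several analysis paths through the same data:
    variants = [
        p_two_sided(a, b),                      # planned analysis
        p_two_sided(trim(a), trim(b)),          # after outlier removal
        p_two_sided(a[: N // 2], b[: N // 2]),  # "effect shows in the first half"
        p_two_sided(a[N // 2 :], b[N // 2 :]),  # ...or in the second half
    ]
    hacked_hits += min(variants) < ALPHA

straight_rate = straight_hits / TRIALS
hacked_rate = hacked_hits / TRIALS
print(f"One planned analysis:        false-positive rate ~ {straight_rate:.2f}")
print(f"Best of four analysis paths: false-positive rate ~ {hacked_rate:.2f}")
```

Because the variants reuse the same data, they are correlated and the inflation is milder than in the best-of-twenty case above, but the false-positive rate still climbs well past the nominal 5%: different mechanism, same direction of bias.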

Why Researchers Do It

The incentives in academic science push hard toward positive results. Journals prefer to publish studies that confirm a hypothesis. Funders want to see clear findings. Hiring and promotion committees reward researchers with impressive publication records. This creates a system where admitting “we didn’t find what we expected” feels like career suicide.

The numbers illustrate how skewed the system is. In studies of treatments for depression, 98% of positive antidepressant trials were published compared to only 48% of negative trials. Statistically significant findings in psychiatry receive more than double the citations of nonsignificant studies. In surveys including over 7,000 researchers, publication pressure was the single strongest predictor of engaging in questionable research practices, outweighing factors like career stage or gender.

These pressures produce visible distortions. In a random sample of studies from psychiatry and psychology journals, 96% reported results that supported the hypothesis. Among studies that had preregistered their protocols and hypotheses beforehand, only 44% reported hypothesis-supporting results. That gap suggests widespread selective reporting, hypothesis switching, or both.

The Connection to the Replication Crisis

Psychology’s replication crisis, the discovery that many landmark findings couldn’t be reproduced by independent teams, brought HARKing into the spotlight. When a field’s published literature is populated with findings tailored to specific datasets rather than genuine predictions, replication failures are inevitable. The original results were never really predictions in the first place; they were descriptions of noise in one particular sample, dressed up to look like discoveries.

This is what makes HARKing especially corrosive. A single HARKed study might seem harmless. But across thousands of papers, the practice fills journals with findings that look robust on paper but crumble under scrutiny. Other researchers build theories on top of those findings, design interventions based on them, and cite them as established facts. The downstream costs to the field, and to anyone relying on psychological research for real-world decisions, compound over time.

Preregistration as a Safeguard

The primary tool developed to combat HARKing is preregistration: researchers publicly record their hypotheses, methods, and analysis plans before collecting data. This creates a timestamp. If the final paper matches the preregistered plan, readers can trust that the hypotheses were genuine predictions. If the paper deviates, those deviations are transparent.

A more robust version is the registered report, where a journal reviews and accepts a study based on its design and hypotheses before any data are collected. This removes the temptation to HARK entirely, because the paper’s fate doesn’t depend on whether the results come out positive or negative. Early evidence is encouraging: preregistered studies show a higher rate of null findings being reported, which suggests less selective storytelling.

That said, preregistration isn’t bulletproof. Evidence indicates that authors sometimes deviate from their registered plans without fully disclosing it. There is currently no reliable method for reviewers or journals to detect when a researcher has quietly swapped a preregistered hypothesis for a post hoc one. The safeguard works best when the research community treats it as a cultural norm rather than a box to check.

Why It Matters Beyond Academia

HARKing isn’t just an abstract problem for statisticians. Psychological research informs therapy guidelines, educational policies, workplace practices, and public health campaigns. When a published finding is the product of HARKing, anyone who acts on that finding is making decisions based on something that may not be real. A therapy technique that “worked” in a HARKed study might offer no benefit to actual patients. A parenting strategy promoted in popular media might be built on a statistical artifact.

Understanding HARKing helps you read research more critically. Studies with preregistered hypotheses deserve more confidence than those without. Results that have been independently replicated carry more weight than single dramatic findings. And a paper where every prediction is perfectly confirmed, with no surprises or null results, might be too clean to be true.