P-hacking refers to any practice that manipulates data collection, analysis, or reporting to push a result below the p < 0.05 threshold commonly used to declare statistical significance. Common examples include selectively reporting only the outcomes that “worked,” stopping data collection the moment a significant result appears, adding or removing control variables until the numbers cooperate, testing endless subgroups after your main hypothesis fails, and reporting only a subset of experimental conditions. If you encountered this as a multiple-choice question, virtually any option describing these flexible, after-the-fact analytical choices is a correct example of p-hacking.
What P-Hacking Actually Means
At its core, p-hacking means exploiting flexibility in data collection and analysis until a result crosses the significance threshold. It doesn’t require deliberate fraud. Researchers may not even realize they’re doing it. The key feature is that analytical decisions are made after looking at the data, with the goal (conscious or not) of finding something statistically significant. The significant result is then presented as if it were the plan all along.
A landmark 2011 paper in Psychological Science by Simmons, Nelson, and Simonsohn identified four especially common “researcher degrees of freedom” that drive false positives: flexibility in choosing among dependent variables, flexibility in choosing sample size, selective use of covariates, and reporting only subsets of experimental conditions. Each of these, alone or in combination, can make a meaningless result look real.
Selective Outcome Reporting
Imagine a study measures ten different outcomes but only reports the two that reached significance. That’s p-hacking through selective reporting. The other eight results quietly disappear. Because running more statistical tests increases the odds that at least one will cross the p < 0.05 line by pure chance, cherry-picking the “winners” dramatically inflates the false positive rate. This applies whether you’re choosing among different questionnaire scores, different time points, or different biological markers.
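To see how fast the odds compound, here is a minimal simulation sketch in Python (the sample size, ten outcomes, and simulation count are arbitrary illustrative choices, not taken from any cited study). The treatment has no real effect on any outcome, yet cherry-picking the best of ten tests produces a “finding” about 40% of the time:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sims, n, outcomes = 5000, 40, 10
hits = 0
for _ in range(sims):
    treat = rng.normal(size=(outcomes, n))  # null is true: the treatment
    ctrl = rng.normal(size=(outcomes, n))   # affects none of the ten outcomes
    pvals = [stats.ttest_ind(treat[k], ctrl[k]).pvalue for k in range(outcomes)]
    if min(pvals) < 0.05:                   # "report only the winners"
        hits += 1
print(hits / sims)  # ~0.40, matching 1 - 0.95**10, not the nominal 0.05
```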
Optional Stopping
Optional stopping means peeking at your data as it comes in and deciding whether to keep collecting based on what you see. If the result is significant at 30 participants, you stop. If not, you run 10 more and check again. This practice inflates type I error rates (false positives) beyond their nominal values because each peek is essentially another chance to hit significance by luck. The correct approach is to set your sample size before the study begins and commit to it.
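The inflation is easy to demonstrate with a toy simulation (the start of 30 participants, batches of 10, and a cap of 100 mirror the hypothetical above rather than any real protocol). Each interim look is a fresh chance to cross the threshold by luck:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sims, start, step, cap = 5000, 30, 10, 100
false_positives = 0
for _ in range(sims):
    a = list(rng.normal(size=start))  # null is true: both groups come
    b = list(rng.normal(size=start))  # from the same distribution
    while True:
        p = stats.ttest_ind(a, b).pvalue
        if p < 0.05:                  # peek: stop the moment it's "significant"
            false_positives += 1
            break
        if len(a) >= cap:             # otherwise give up at the cap
            break
        a.extend(rng.normal(size=step))
        b.extend(rng.normal(size=step))
print(false_positives / sims)  # well above the nominal 0.05
```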
Adding or Removing Control Variables
In regression analysis, including a control variable can shift your main result from non-significant to significant, or vice versa. P-hacking through covariate selection means trying different combinations of control variables until one combination produces p < 0.05. Research comparing doctoral dissertations to the journal articles that came from them found that some authors added new control variables in the published version that weren’t in the original analysis, turning previously non-significant findings into significant ones. The published hypothesis might read “X relates to Y when Z is controlled for,” even though that condition was discovered after the fact.
A simulation study in Royal Society Open Science showed how this works in practice: the analyst starts with no covariates, then adds them one at a time, then combines them in different orders. With just three covariates, this strategy produces five additional statistical tests beyond the original, each an extra opportunity to land below 0.05.
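The sketch below is a loose analogue of that strategy, not the paper’s code: with three hypothetical covariates it simply fits every covariate subset (eight models, a slightly broader search than the stepwise one described above) and keeps the smallest p-value on the predictor of interest, even though nothing is truly related to anything:

```python
from itertools import combinations

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
sims, n, n_cov = 2000, 50, 3
hits = 0
for _ in range(sims):
    x = rng.normal(size=n)           # predictor of interest
    z = rng.normal(size=(n, n_cov))  # candidate control variables
    y = rng.normal(size=n)           # null is true: y is pure noise
    pvals = []
    for k in range(n_cov + 1):
        for subset in combinations(range(n_cov), k):
            X = sm.add_constant(np.column_stack([x, *(z[:, j] for j in subset)]))
            pvals.append(sm.OLS(y, X).fit().pvalues[1])  # p-value on x
    if min(pvals) < 0.05:            # keep whichever model "worked"
        hits += 1
print(hits / sims)  # well above 0.05 despite a single underlying hypothesis
```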
Post-Hoc Subgroup Analysis
When a study’s main result is disappointing, researchers sometimes slice the data by age, sex, education, smoking status, or other characteristics to see if the treatment worked in some specific group. This is sometimes called a “fishing expedition,” and it’s based more on hope than on hypotheses. The problem isn’t that subgroup analysis is inherently wrong. It’s that testing many subgroups without adjusting for the number of comparisons makes false positives nearly inevitable, and presenting a post-hoc finding as though it were predicted in advance is misleading.
Planned subgroup analyses, declared before data collection, are a different matter entirely. The distinction between exploratory and confirmatory analysis is what separates legitimate investigation from p-hacking.
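As a rough illustration (the four binary subgroup variables and the sample size are invented for the sketch), the following simulation applies a null treatment to 200 people, then tests the effect separately within both levels of each subgroup variable, just as a fishing expedition would:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sims, n, n_vars = 2000, 200, 4
any_hit = 0
for _ in range(sims):
    treated = rng.integers(0, 2, n).astype(bool)
    y = rng.normal(size=n)                                  # null: no effect
    groups = rng.integers(0, 2, (n, n_vars)).astype(bool)   # e.g. sex, smoker, ...
    found = False
    for j in range(n_vars):
        for level in (True, False):                         # both halves of each split
            m = groups[:, j] == level
            p = stats.ttest_ind(y[m & treated], y[m & ~treated]).pvalue
            found = found or p < 0.05
    any_hit += found
print(any_hit / sims)  # the chance some subgroup "responds" is far above 0.05
```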
Reporting Only Some Experimental Conditions
Studies often include multiple experimental groups or conditions. If a researcher runs a study with three treatment groups and a control, but only reports the one comparison that reached significance, they’ve effectively hidden the failed tests. This inflates the apparent strength of the finding because the reader never learns about the comparisons that didn’t pan out.
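A short simulation makes this concrete (group sizes and the three-treatments-plus-control design are illustrative). Three inert treatments are each compared against a shared control, and only the best comparison is “reported”:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sims, n = 5000, 30
hits = 0
for _ in range(sims):
    control = rng.normal(size=n)
    treatments = rng.normal(size=(3, n))  # null is true: all three are inert
    pvals = [stats.ttest_ind(t, control).pvalue for t in treatments]
    if min(pvals) < 0.05:                 # hide the two failed comparisons
        hits += 1
print(hits / sims)  # well above the nominal 0.05
```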
Why It Matters
P-hacking doesn’t just bend the rules of statistics. It pollutes the scientific literature with findings that can’t be replicated. When a result is significant only because of analytical flexibility, other labs will fail to reproduce it, wasting time and funding. In medical research, it can lead to treatments being adopted based on inflated evidence. The American Statistical Association issued a formal statement emphasizing that a p-value near 0.05, taken by itself, offers only weak evidence against the null hypothesis, and that any effect, no matter how tiny, can produce a small p-value if the sample size is large enough. Context matters far more than crossing an arbitrary threshold.
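The large-sample point is easy to verify numerically. In this sketch the true group difference is a trivial 0.02 standard deviations (a number chosen purely for illustration), yet a million observations per group drives the p-value to essentially zero:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n = 1_000_000
a = rng.normal(0.00, 1.0, n)  # group means differ by a negligible
b = rng.normal(0.02, 1.0, n)  # 0.02 standard deviations
print(stats.ttest_ind(a, b).pvalue)  # vanishingly small despite a tiny effect
```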
How Pre-Registration Prevents P-Hacking
The most effective safeguard is pre-registration: publicly recording your hypotheses, sample size, outcome measures, and analysis plan before collecting data. This locks in your decisions so you can’t retroactively adjust them based on results. A stronger version, called a registered report, goes further. Journals review and provisionally accept the study design before any data exist, removing the temptation to massage results into significance. Both approaches make the line between exploratory and confirmatory analysis explicit, which is exactly the distinction that p-hacking erases.
Exploratory analysis remains valuable for generating new ideas. The problem arises only when exploratory findings are dressed up as confirmatory ones, presented with a p-value that implies a rigorous, pre-planned test when the reality was a search through many possible analyses until something stuck.