A research hypothesis is a specific, testable prediction about the relationship between variables that motivates a study. In statistics, that prediction gets translated into a pair of formal statements, the null hypothesis and the alternative hypothesis, so it can be evaluated with data. Understanding how this translation works is the key to understanding hypothesis testing itself.
From Research Question to Statistical Hypothesis
Every statistical test begins with a conjecture, something a researcher suspects is true. Maybe a new drug lowers blood pressure more than an existing one, or students who sleep eight hours perform better on exams. That conjecture is the research hypothesis. It’s written in plain language and reflects what the researcher actually wants to show.
The problem is that a plain-language prediction isn’t precise enough for math. So the research hypothesis gets restated as a pair of competing statistical hypotheses that can be evaluated with data. The null hypothesis (written H₀) represents the default position: nothing is happening, there’s no difference, there’s no effect. The alternative hypothesis (written Hₐ) represents what the researcher suspects is true. These two statements are always contradictory. You’re essentially putting the “no effect” claim on trial and seeing whether the evidence is strong enough to reject it.
Here’s a concrete example. Suppose your research hypothesis is “caffeine improves reaction time.” The null hypothesis would be: the average reaction time with caffeine equals the average reaction time without it (H₀: μ₁ = μ₂). The alternative hypothesis would be: the average reaction time with caffeine is lower (Hₐ: μ₁ < μ₂). The entire test revolves around whether your data provide enough evidence to reject H₀ in favor of Hₐ.
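As a sketch of how this example plays out in code, here is a one-sided two-sample t-test on simulated reaction times. All numbers, group sizes, and the 0.05 threshold below are illustrative assumptions, not data from the text:

```python
# Caffeine example as a one-sided two-sample t-test.
# The reaction times are simulated, not real measurements.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
caffeine = rng.normal(loc=250, scale=30, size=40)     # ms; lower is faster
no_caffeine = rng.normal(loc=270, scale=30, size=40)  # ms

# H0: mu_caffeine = mu_no_caffeine
# Ha: mu_caffeine < mu_no_caffeine  (caffeine lowers mean reaction time)
t_stat, p_value = stats.ttest_ind(caffeine, no_caffeine, alternative="less")

print(f"t = {t_stat:.2f}, p = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: evidence that caffeine lowers mean reaction time")
else:
    print("Fail to reject H0: not enough evidence")
```

The `alternative="less"` argument encodes the direction of Hₐ; the test itself only ever measures how far the data fall from what H₀ predicts.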
The Null Hypothesis Always Contains Equality
One detail trips people up early: H₀ always includes equality, written with =, ≥, or ≤, while the alternative hypothesis never does, using ≠, >, or < instead. This isn’t arbitrary. The null hypothesis pins down a specific baseline value so that a test statistic can measure how far your observed data fall from it. Without that anchor point, there’s nothing to test against.
When you finish the test, you arrive at one of two decisions: reject H₀ or fail to reject H₀. You never “accept” the null hypothesis. Failing to reject it simply means your data weren’t strong enough to rule it out. And rejecting it means the evidence favors your alternative hypothesis beyond a pre-set threshold of doubt.
One-Tailed vs. Two-Tailed Tests
The direction of your alternative hypothesis determines whether you run a one-tailed or two-tailed test. If your research hypothesis predicts a specific direction (“drug A is better than drug B”), you use a one-tailed test. If you’re simply predicting a difference in either direction (“the two drugs differ”), you use a two-tailed test. A two-tailed test checks for the possibility of an effect in both directions, while a one-tailed test concentrates all its statistical sensitivity in one direction.
This choice matters because it affects how easily you reach statistical significance. A one-tailed test is more sensitive in the predicted direction, which makes it tempting to use. But choosing a one-tailed test purely to make your results significant is considered inappropriate. So is switching from a two-tailed test to a one-tailed test after seeing your results. The direction should be justified by your research question before you collect data.
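One way to see why the choice matters is to run both versions of the test on the same data. The scores and group sizes below are invented for illustration:

```python
# Comparing one- and two-tailed p-values on identical (simulated) data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
drug_a = rng.normal(12.0, 3.0, 50)   # hypothetical outcome scores
drug_b = rng.normal(10.5, 3.0, 50)

_, p_two = stats.ttest_ind(drug_a, drug_b, alternative="two-sided")
_, p_one = stats.ttest_ind(drug_a, drug_b, alternative="greater")

# When the observed difference points in the predicted direction,
# the one-tailed p-value is exactly half the two-tailed one.
print(f"two-tailed p = {p_two:.4f}, one-tailed p = {p_one:.4f}")
```

That halving is precisely the extra sensitivity the text describes, and why the direction must be fixed before looking at the data.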
P-Values and the Significance Threshold
Once you compute a test statistic from your data, you get a p-value. The p-value tells you how likely it would be to see results at least as extreme as yours if the null hypothesis were actually true. A small p-value means your data are hard to explain under the assumption of “no effect,” which pushes you toward rejecting H₀.
The threshold for “small enough” is called the alpha level, and most research sets it at 0.05 (a 5% chance). If your p-value falls below alpha, the result is considered statistically significant. Some fields use stricter thresholds: a p-value below 0.01 is often described as highly significant. Ronald Fisher originally described the 0.05 level as a 1-in-20 chance, and it became the standard convention across most sciences.
That said, the American Statistical Association released a formal statement in 2016 cautioning against treating any single threshold as a bright line. Among the key principles: a p-value does not measure the probability that your hypothesis is true, it does not tell you how large or important an effect is, and scientific conclusions should not rest on whether a p-value crosses 0.05 alone. Effect sizes and confidence intervals should accompany p-values to give a fuller picture of what the data actually show.
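To make the mechanics concrete, here is a one-sample t-test worked by hand and checked against scipy. The eight measurements and the null value of 5.0 are made up for illustration:

```python
# How a p-value comes from a test statistic: one-sample t-test by hand.
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.8, 5.4, 5.0, 5.3, 4.9, 5.2, 5.5])
mu0 = 5.0  # the baseline value specified by H0

# Test statistic: distance of the sample mean from mu0, in standard errors.
t = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(len(sample)))
df = len(sample) - 1
p_two_sided = 2 * stats.t.sf(abs(t), df)  # area in both tails beyond |t|

# Cross-check against the library implementation.
t_ref, p_ref = stats.ttest_1samp(sample, mu0)
print(f"t = {t:.3f}, p = {p_two_sided:.3f}")
```

The p-value is just tail area under the t-distribution beyond the observed statistic: the probability of data at least this extreme if H₀ were true.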
Type I and Type II Errors
Two kinds of mistakes are possible in hypothesis testing. A Type I error (false positive) happens when you reject a null hypothesis that is actually true. You conclude there’s an effect when there isn’t one. The probability of this error is your alpha level, so setting alpha at 0.05 means you accept a 5% chance of a false positive whenever the null hypothesis is true. Note that alpha is not the chance that a given significant result is wrong: that also depends on how plausible the tested hypothesis was in the first place, and analyses of results significant near p = 0.05 suggest the proportion that are actually false positives can be 23% or higher.
A Type II error (false negative) happens when you fail to reject a null hypothesis that is actually false. You miss a real effect. The probability of this error is called beta. Researchers try to minimize beta by designing studies with adequate statistical power.
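A small Monte Carlo simulation makes both error types visible: run many two-sample t-tests where H₀ is true and count the false rejections, then repeat with a genuine effect and count the misses. The effect size, sample size, and trial count below are arbitrary choices for illustration:

```python
# Monte Carlo sketch of Type I and Type II error rates for a two-sample t-test.
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
alpha, n, trials = 0.05, 30, 2000

def rejection_rate(true_diff):
    """Fraction of simulated studies that reject H0 at the given true difference."""
    rejections = 0
    for _ in range(trials):
        a = rng.normal(0, 1, n)
        b = rng.normal(true_diff, 1, n)
        if stats.ttest_ind(a, b).pvalue < alpha:
            rejections += 1
    return rejections / trials

type1 = rejection_rate(0.0)  # H0 true: every rejection is a Type I error
power = rejection_rate(0.8)  # H0 false: rejections are correct detections
print(f"Type I rate ≈ {type1:.3f} (alpha = {alpha})")
print(f"Power ≈ {power:.3f}, so Type II rate (beta) ≈ {1 - power:.3f}")
```

The simulated Type I rate lands near alpha, as the theory promises, while beta depends on the effect size and sample size, which is the subject of the next section.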
Statistical Power and Sample Size
Power is the probability of correctly detecting a real effect when one exists; it equals 1 − beta, the complement of the Type II error rate. If a new treatment genuinely works, power tells you how likely your study is to pick that up. Researchers generally aim for at least 80% power, meaning the study has an 80% chance of finding a true effect.
Three factors drive power. First, sample size: larger samples give greater power because they produce more precise estimates. Second, effect size: bigger real-world differences are easier to detect. Third, the alpha level: a stricter threshold (say, 0.01 instead of 0.05) makes it harder to reach significance, which reduces power unless you compensate with a larger sample. One important caveat is that a very large sample can make tiny, practically meaningless differences show up as statistically significant. Statistical significance and real-world importance are not the same thing.
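These three factors can be tied together with the standard normal-approximation sample-size formula for a two-sample comparison, n per group ≈ 2·((z₁₋α⁄₂ + z₁₋β)/d)², where d is the standardized effect size. This is a sketch, not a substitute for dedicated power-analysis software:

```python
# Normal-approximation sample size for a two-sample test at a given power.
import math
from scipy import stats

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Required n per group: n = 2 * ((z_{1-alpha/2} + z_{1-beta}) / d)^2."""
    z_alpha = stats.norm.ppf(1 - alpha / 2)  # critical value for two-sided alpha
    z_beta = stats.norm.ppf(power)           # quantile corresponding to power
    return math.ceil(2 * ((z_alpha + z_beta) / effect_size) ** 2)

print(n_per_group(0.5))               # medium effect
print(n_per_group(0.8))               # large effect: fewer subjects needed
print(n_per_group(0.5, alpha=0.01))   # stricter alpha: more subjects needed
```

Note how a larger effect shrinks the required sample while a stricter alpha grows it, exactly the trade-offs described above.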
The Five-Step Testing Procedure
Most introductory statistics courses teach hypothesis testing as a five-step process:
- Check assumptions and write hypotheses. Verify that your data meet the requirements for the test you’re using (sample size, distribution shape, independence) and formally state H₀ and Hₐ.
- Compute the test statistic. This single number summarizes how far your sample data fall from what the null hypothesis predicts.
- Determine the p-value. Using the test statistic, calculate the probability of observing data this extreme under H₀.
- Make a decision. Compare the p-value to your alpha level. If p < alpha, reject H₀. If not, fail to reject H₀.
- State a real-world conclusion. Translate the statistical decision back into plain language that addresses the original research question.
That last step closes the loop. You started with a research hypothesis in everyday language, converted it into statistical notation, ran the numbers, and now you translate the verdict back into a statement about the real world.
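The whole loop can be sketched as a single script, using the sleep-and-exams example from the introduction; the scores, group sizes, and alpha level are made up for illustration:

```python
# The five-step procedure end to end, on simulated exam scores.
import numpy as np
from scipy import stats

# Step 1: hypotheses. H0: mu_8h = mu_less ; Ha: mu_8h > mu_less.
# (Assumption checks such as normality and independence are skipped here.)
rng = np.random.default_rng(7)
scores_8h = rng.normal(78, 10, 35)    # students sleeping ~8 hours
scores_less = rng.normal(72, 10, 35)  # students sleeping less

# Steps 2-3: compute the test statistic and its p-value.
t_stat, p_value = stats.ttest_ind(scores_8h, scores_less, alternative="greater")

# Step 4: decision at alpha = 0.05.
alpha = 0.05
decision = "reject H0" if p_value < alpha else "fail to reject H0"

# Step 5: translate back into plain language.
print(f"t = {t_stat:.2f}, p = {p_value:.4f} -> {decision}")
```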
What Makes a Hypothesis Testable
Not every prediction qualifies as a good research hypothesis. To hold up in a statistical framework, a hypothesis needs several qualities. It should be grounded in existing evidence, ideally based on a thorough review of published literature rather than a random guess. It must be testable with available methods and data. It should be specific enough to generate clear predictions, and it needs to be falsifiable, meaning there must be a possible outcome that would prove it wrong. A hypothesis that can explain any result explains nothing. Finally, the study needed to test it must be ethically acceptable, since no statistical technique can rescue a hypothesis that requires harmful or deceptive research to evaluate.

