Critiquing a research paper means systematically evaluating whether the study’s design, execution, and conclusions hold up to scrutiny. It’s not about finding fault for its own sake. The goal is to determine how much you can trust the findings and whether they actually mean what the authors claim. Whether you’re a student, a professional reviewing literature, or someone trying to make sense of a study you found online, the process follows the same core steps.
Start With the Research Question
Before diving into methods or data, get clear on what the study is trying to answer. Read the title, abstract, and introduction with one question in mind: is the purpose of this study stated clearly enough that you could explain it to someone in a sentence or two? A well-written paper makes its research question obvious early on, and everything that follows should connect back to it.
Check whether the abstract accurately represents what the paper actually contains. Abstracts sometimes overstate findings or omit key limitations. Think of the abstract as a promise and the full paper as the delivery. If they don’t match, that’s your first red flag.
Evaluate the Study Design
The study design is the backbone of any paper, and it needs to fit the research question. A study asking whether a treatment works should ideally use a controlled trial with random assignment to groups. A study exploring people’s experiences with a disease might use interviews or surveys. The design isn’t inherently good or bad. What matters is whether it’s appropriate for what the researchers are trying to learn.
Look for specific details about how participants were selected, how many were included, and whether anyone dropped out along the way. Small sample sizes are one of the most common weaknesses in published research. A study with too few participants may simply lack the statistical power to detect a real effect, even if one exists. The conventional target for statistical power is 0.8 (80%), meaning the study has an 80% chance of detecting a true effect of the size the researchers anticipated, if one exists. When researchers don’t report a power analysis, or when the sample is obviously small relative to the question being asked, treat the findings with extra caution.
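If you want to gut-check a sample size yourself, power calculations are straightforward to run. Here’s a minimal sketch using the statsmodels library, assuming a two-group comparison; the medium effect size (d = 0.5) is an illustrative choice, not a universal standard:

```python
# Minimal power-analysis sketch (requires statsmodels).
# The effect size d = 0.5 is an illustrative assumption.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
n_per_group = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"Participants needed per group: {n_per_group:.0f}")  # roughly 64
```

A trial that enrolled, say, 20 participants per arm while expecting a medium effect would fall well short of this benchmark.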
Also consider who was included and who was excluded. Studies that restrict participants to narrow demographics, exclude people with other health conditions, or run for only a short period have limited generalizability. A blood pressure drug tested only in middle-aged men without any other health problems tells you little about how it performs in the broader population.
Check for Sources of Bias
Bias is systematic error that skews results in one direction. It can creep in at every stage of a study, and spotting it is one of the most important skills in critiquing research.
- Selection bias: Were participants chosen or assigned to groups in a way that could tilt the results? Proper randomization is the main defense here (see the sketch after this list). If the paper describes randomization, look for details on how it was done.
- Performance bias: Did participants or researchers know who was getting the treatment and who wasn’t? Lack of blinding can unconsciously influence behavior and outcomes on both sides.
- Detection bias: Were outcomes measured the same way in all groups? If the person assessing results knew which group a participant belonged to, their expectations could color the measurement.
- Attrition bias: Did a significant number of participants drop out, and were dropouts evenly distributed across groups? If more people leave the treatment group because of side effects, the remaining participants may look healthier than they really are.
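To make “proper randomization” concrete, here’s a minimal sketch of permuted-block allocation, a common way to keep group sizes balanced as participants enroll. The group labels, block size, and seed are illustrative assumptions:

```python
import random

def permuted_block_allocation(n_participants, block_size=4, seed=2024):
    """Assign participants to two arms using shuffled blocks.

    Shuffling within fixed-size blocks keeps the arms balanced
    throughout enrollment, unlike a simple coin flip per person.
    """
    rng = random.Random(seed)
    assignments = []
    while len(assignments) < n_participants:
        block = ["treatment", "control"] * (block_size // 2)
        rng.shuffle(block)
        assignments.extend(block)
    return assignments[:n_participants]

print(permuted_block_allocation(10))
```

A paper that merely says “participants were randomized” without naming a method like this, and without describing allocation concealment, leaves you unable to rule out selection bias.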
Researchers and trial sponsors may also be reluctant to publish unfavorable results, a phenomenon known as publication bias. Check the funding source and conflict of interest disclosures, usually found in the footnotes or at the end of the paper. Industry-funded studies aren’t automatically untrustworthy, but knowing who paid for the research helps you evaluate whether the conclusions might be influenced by financial interests.
Scrutinize the Statistical Analysis
You don’t need to be a statistician to evaluate this section meaningfully. Start with the basics: did the researchers use statistical methods that match their study design and data type? A paper should explain why specific tests were chosen, not just name them.
Pay close attention to p-values, but don’t stop there. A p-value below 0.05 means that if there were truly no difference between the groups, a difference at least as large as the one observed would occur less than 5% of the time. That sounds impressive, but p-values depend heavily on sample size. With a large enough sample, even a trivially small difference can reach statistical significance. As statistician Jacob Cohen put it, the primary product of research should be measures of effect size, not p-values.
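You can see the sample-size effect directly with a small simulation. In the sketch below, the two groups differ by a trivial 0.05 standard deviations, an amount nobody would notice in practice; the group sizes are arbitrary illustrations:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)

# Two populations that differ by a trivial 0.05 standard deviations.
for n in (100, 1_000, 100_000):
    a = rng.normal(loc=0.00, scale=1.0, size=n)
    b = rng.normal(loc=0.05, scale=1.0, size=n)
    t_stat, p_value = stats.ttest_ind(a, b)
    print(f"n per group = {n:>7,}: p = {p_value:.4f}")

# Typical output: non-significant at n = 100 and n = 1,000,
# but highly significant at n = 100,000 -- the same tiny difference.
```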
Effect size tells you how large the difference actually is. Cohen’s benchmarks for the standardized mean difference (Cohen’s d) are a useful reference: 0.2 is a small effect, 0.5 is medium, and 0.8 or above is large. A medium effect is one you’d notice through careful observation. A small effect is real but subtle. When a paper reports statistical significance without any measure of effect size, you’re only getting half the picture.
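Papers don’t always report an effect size, but if the results table gives means, standard deviations, and group sizes, you can compute Cohen’s d yourself. A minimal sketch using the standard pooled-SD formula (the numbers below are made up):

```python
import math

def cohens_d(mean_a, sd_a, n_a, mean_b, sd_b, n_b):
    """Cohen's d from summary statistics, using the pooled SD."""
    pooled_sd = math.sqrt(((n_a - 1) * sd_a**2 + (n_b - 1) * sd_b**2)
                          / (n_a + n_b - 2))
    return (mean_a - mean_b) / pooled_sd

# Hypothetical numbers read off a results table:
d = cohens_d(mean_a=5.0, sd_a=2.0, n_a=120, mean_b=4.7, sd_b=2.0, n_b=120)
print(f"Cohen's d = {d:.2f}")  # 0.15 -- below even the 'small' benchmark
```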
Statistical Significance vs. Practical Significance
This distinction trips up even experienced readers. A result can be statistically significant (unlikely to be due to chance) while being clinically or practically meaningless. Imagine a study finds that a new pain medication reduces pain scores by 0.3 points on a 10-point scale compared to a placebo, with a p-value of 0.01. Statistically, that’s a real difference. Practically, a patient would never notice it.
Clinical significance asks a different question: does this difference actually improve someone’s quality of life, function, or outcome in a way that matters? Two studies can have identical statistical significance but wildly different clinical relevance. When you’re reading a paper, always ask yourself whether the size of the reported effect would make a meaningful difference in the real world. Results that are clinically important sometimes fail to reach statistical significance, particularly in underpowered studies, and results that clear the statistical bar can fall flat in practice.
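One way to make this check concrete is to compare the observed effect against a minimal clinically important difference (MCID), the smallest change patients would actually notice. The sketch below reuses the pain-score scenario; the MCID of 1 point is a hypothetical threshold for illustration, not an established value:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)

# Simulated pain scores (0-10 scale): placebo mean 5.0, drug mean 4.7.
placebo = rng.normal(5.0, 2.0, size=2_000)
drug = rng.normal(4.7, 2.0, size=2_000)

t_stat, p_value = stats.ttest_ind(placebo, drug)
reduction = placebo.mean() - drug.mean()

MCID = 1.0  # hypothetical minimal clinically important difference
print(f"p = {p_value:.4f}, reduction = {reduction:.2f} points")
print("Statistically significant?", p_value < 0.05)  # almost certainly True
print("Clinically meaningful?", reduction >= MCID)   # almost certainly False
```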
Assess Whether Conclusions Match the Data
The discussion and conclusion sections are where authors interpret their findings, and where overreach is most common. The hypothesis, methods, data, and conclusions should form a tight chain. The methods test the hypothesis, the data come from the methods, and the conclusions should be directly supported by the data. Nothing more, nothing less.
Watch for these common problems:
- Inflated importance: Authors sometimes frame modest findings as groundbreaking. Compare the actual numbers in the results section to the language used in the discussion.
- Causal claims from correlational data: Observational studies can identify associations but cannot prove that one thing causes another. If a study only observed people (rather than randomly assigning them to conditions), conclusions like “X causes Y” are overstepping.
- Ignoring limitations: Every study has weaknesses. A good paper acknowledges them honestly. Be skeptical of papers that gloss over obvious flaws like small samples, high dropout rates, or short follow-up periods.
- Selective reporting: Check whether the authors address all the outcomes they set out to measure, including the ones that didn’t reach significance. Reporting only the favorable results while burying the rest is a form of bias.
Use Reporting Checklists as a Reference
Standardized reporting guidelines exist for nearly every type of study, and they’re a practical tool for critique. The EQUATOR Network maintains the major ones: CONSORT for randomized trials, STROBE for observational studies, PRISMA for systematic reviews, STARD for diagnostic studies, and COREQ for qualitative research, among others. Each checklist specifies what information a well-reported study of that type should include.
You don’t need to memorize these, but pulling up the relevant checklist while reading a paper gives you a concrete framework. If a randomized trial doesn’t describe its randomization method, allocation concealment, or blinding procedures, those are gaps that the CONSORT checklist would flag immediately. Missing items don’t automatically invalidate a study, but they make it harder to assess its quality.
Read the Results Tables Yourself
Don’t rely solely on the authors’ narrative description of results. Look at the tables and figures directly. Are the results presented clearly? Can you identify the main outcomes, the comparison groups, the sample sizes, and the measures of variability (like confidence intervals or standard deviations)? Sometimes the text emphasizes a favorable secondary outcome while the primary outcome, buried in a table, shows no significant difference.
Confidence intervals are particularly informative. They give you a range within which the true effect likely falls. A narrow confidence interval suggests precision; a wide one suggests uncertainty. If the confidence interval for a treatment effect crosses zero (for a difference) or crosses one (for a ratio), the result is not statistically significant regardless of what the narrative says.
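If a paper reports the raw data or group summaries but no interval, you can reconstruct one yourself. Here’s a minimal sketch using Welch’s method, which doesn’t assume equal variances; the sample data are made up:

```python
import numpy as np
from scipy import stats

def mean_diff_ci(a, b, confidence=0.95):
    """Welch confidence interval for the difference in group means."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    va, vb = a.var(ddof=1) / len(a), b.var(ddof=1) / len(b)
    se = np.sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (va + vb) ** 2 / (va**2 / (len(a) - 1) + vb**2 / (len(b) - 1))
    t_crit = stats.t.ppf((1 + confidence) / 2, df)
    diff = a.mean() - b.mean()
    return diff - t_crit * se, diff + t_crit * se

# Made-up outcome scores for two small groups:
lo, hi = mean_diff_ci([5.1, 4.8, 5.6, 5.0], [4.2, 4.9, 4.4, 4.6])
print(f"95% CI for the difference: ({lo:.2f}, {hi:.2f})")
print("Crosses zero (not significant)?", lo < 0 < hi)
```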
Putting It All Together
A good critique doesn’t just list problems. It weighs the overall quality of the evidence. Ask yourself: given the study’s design, sample, methods, and analysis, how confident am I that the findings reflect reality? A single study with a moderate flaw might still contribute useful evidence. A study with multiple uncorrected biases, a tiny sample, no effect size reporting, and conclusions that outstrip the data deserves much less weight.
Reading research critically is a skill that improves with practice. The more papers you work through systematically, the faster you’ll spot the patterns that distinguish rigorous work from weak evidence dressed up in scientific language.

