Statistical procedures are the standardized methods used to collect, organize, analyze, and draw conclusions from data. They fall into two broad categories: descriptive statistics, which summarize what your data looks like, and inferential statistics, which use that data to test ideas and make predictions about a larger population. Every field that relies on evidence, from medicine to marketing to social science, uses these procedures to move from raw numbers to meaningful answers.
Descriptive Statistics: Summarizing Your Data
Descriptive procedures do exactly what they sound like. They describe a dataset so you can understand its basic shape and features without getting lost in individual data points. These procedures answer simple but essential questions: What’s the typical value? How spread out are the numbers?
Three measures capture the center of your data. The mean is the arithmetic average. The median is the middle value when all observations are lined up in order. The mode is the most frequently occurring value. Each tells you something slightly different. In a dataset where a few extreme values skew the numbers (like household income in a city), the median often gives a more accurate picture of what’s “typical” than the mean.
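The three measures of center can be computed directly with Python's standard-library statistics module. The income figures below are invented purely to illustrate how one extreme value pulls the mean away from the median:

```python
import statistics

# Hypothetical household incomes (in thousands); the 900 is an outlier
incomes = [32, 35, 38, 40, 41, 45, 48, 50, 900]

mean = statistics.mean(incomes)      # pulled sharply upward by the outlier
median = statistics.median(incomes)  # middle value, robust to the outlier
mode = statistics.mode([1, 2, 2, 3, 3, 3])  # most frequent value

print(round(mean, 1))  # far above what most households in the list earn
print(median)          # a much better picture of the "typical" household
```

Here the median (41) describes the typical household far better than the mean (about 136.6), exactly the skew scenario described above.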
Three measures capture how spread out your data is. The range is simply the gap between the highest and lowest values. The standard deviation tells you how far, on average, individual data points sit from the mean. The interquartile range shows the spread of the middle 50% of your data and pairs naturally with the median, just as the standard deviation typically pairs with the mean. Together, these six measures, three for center and three for spread, form the backbone of descriptive statistics.
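The three measures of spread are just as easy to compute; this sketch uses the standard library's statistics module on a small made-up dataset:

```python
import statistics

data = [4, 8, 15, 16, 23, 42]

data_range = max(data) - min(data)            # gap between extremes: 38
stdev = statistics.stdev(data)                # sample standard deviation
q1, q2, q3 = statistics.quantiles(data, n=4)  # the three quartile cut points
iqr = q3 - q1                                 # spread of the middle 50%
```

Note that `statistics.quantiles` defaults to the "exclusive" interpolation method; other software may use a slightly different convention, so quartile values can differ at the margins between tools.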
Inferential Statistics: Drawing Conclusions
Where descriptive statistics tell you what happened in your sample, inferential statistics help you figure out whether those findings apply to a bigger population. If you survey 500 people and find that Group A scored higher than Group B on a test, inferential procedures help you determine whether that difference is real or just a fluke of the particular 500 people you happened to sample.
Inferential procedures generally serve one of four goals:
- Comparison: Assessing whether groups differ from each other
- Correlation: Measuring whether two variables move together
- Regression: Predicting one variable from another
- Agreement: Checking whether different measurements of the same thing line up
The specific test you use depends on the type of data you have and how many groups you’re comparing. For instance, when comparing two groups on a measurement like blood pressure, you’d use a t-test if the data follows a bell-shaped curve, or a Mann-Whitney U test if it doesn’t. When comparing three or more groups, the t-test gives way to analysis of variance (ANOVA) for bell-shaped data, or the Kruskal-Wallis test for data that isn’t normally distributed. For categorical data, like comparing the proportion of men versus women who chose option A, you’d use a chi-square test.
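Every test named above has a direct counterpart in scipy.stats. The sketch below runs each one on simulated data (the blood-pressure numbers and the contingency table are illustrative, not real):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
group_a = rng.normal(120, 10, size=40)  # e.g. systolic blood pressure
group_b = rng.normal(125, 10, size=40)

# Two groups, roughly bell-shaped data: unpaired t-test
t, p_t = stats.ttest_ind(group_a, group_b)

# Same comparison without the normality assumption: Mann-Whitney U
u, p_u = stats.mannwhitneyu(group_a, group_b)

# Three or more groups: ANOVA (parametric) or Kruskal-Wallis (non-parametric)
group_c = rng.normal(130, 10, size=40)
f, p_f = stats.f_oneway(group_a, group_b, group_c)
h, p_h = stats.kruskal(group_a, group_b, group_c)

# Categorical data: chi-square test on a contingency table
table = [[30, 10], [20, 20]]  # e.g. men/women choosing option A vs. B
chi2, p_chi, dof, expected = stats.chi2_contingency(table)
```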
How Hypothesis Testing Works
Most inferential procedures follow a five-step process. Understanding these steps makes the entire framework click into place.
First, you state your hypotheses. The null hypothesis is the default assumption that nothing interesting is going on: there’s no difference between groups, no relationship between variables. The alternative hypothesis is what you’re actually trying to show, that a real difference or relationship exists. Second, you compute a test statistic, a single number that captures how far your observed data falls from what the null hypothesis would predict. Third, you calculate the p-value, which represents the probability of seeing results at least as extreme as yours if the null hypothesis were actually true.
Fourth, you make a decision. The standard threshold, called the alpha level, is 0.05. If your p-value falls at or below 0.05, you reject the null hypothesis. This translates to accepting no more than a 5% risk of a false positive, that is, of rejecting a null hypothesis that is actually true. Stricter thresholds exist: 0.01 (1% risk) and 0.001 (0.1% risk) are common in fields that demand higher certainty. A more lenient threshold of 0.10 is sometimes used in exploratory research. Fifth and finally, you translate the statistical result back into a real-world conclusion that addresses your original question in plain terms.
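The five steps map almost line for line onto code. This sketch uses invented test scores for two groups and scipy's unpaired t-test:

```python
from scipy import stats

# Step 1: state hypotheses. H0: the two groups have equal mean scores.
#         H1: their mean scores differ.
group_a = [88, 92, 85, 91, 97, 90, 84, 89]
group_b = [78, 83, 80, 76, 85, 79, 82, 77]

# Steps 2-3: compute the test statistic and its p-value
t_stat, p_value = stats.ttest_ind(group_a, group_b)

# Step 4: compare the p-value to the alpha level
alpha = 0.05
reject_null = p_value <= alpha

# Step 5: translate back into a plain-language conclusion
if reject_null:
    print("Group A's mean score differs significantly from Group B's.")
```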
Why the P-value Isn’t the Whole Story
A common mistake is treating a statistically significant result as automatically important. A p-value tells you whether a difference likely exists, but it says nothing about how large or meaningful that difference is. With a big enough sample, even a trivially small difference can produce a p-value below 0.05.
This is where effect size comes in. Effect size measures the magnitude of a finding, not just whether it’s statistically detectable. The American Psychological Association emphasizes reporting effect sizes alongside p-values, calling them an essential complement to significance testing. Stating only that “significant differences were found” without quantifying their size is considered poor scientific practice. If a new teaching method improves test scores by half a point on a 100-point scale, that result might be statistically significant with enough students but practically meaningless. The effect size makes that distinction clear.
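One widely used effect-size measure is Cohen's d, the difference between two group means expressed in standard-deviation units; by a common convention, values below about 0.2 are considered small. The sketch below uses invented scores where the new method adds half a point on a 100-point scale:

```python
import statistics

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / pooled_var ** 0.5

# Hypothetical test scores: the intervention adds exactly half a point
control = [65, 72, 78, 68, 75, 70, 74, 66]
treated = [x + 0.5 for x in control]

d = cohens_d(treated, control)  # about 0.11: a "small" effect by convention
```

With a large enough sample, that half-point gain could still clear the p < 0.05 bar, which is precisely why the effect size needs to be reported alongside it.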
Parametric vs. Non-parametric Procedures
Inferential tests split into two families: parametric and non-parametric. The choice between them rests on one key question: does your data follow a roughly bell-shaped (normal) distribution?
Parametric tests assume it does. They work with means and standard deviations and tend to have more statistical power, meaning they’re better at detecting real differences when those differences exist. The t-test, ANOVA, and linear regression are all parametric procedures. Non-parametric tests make far fewer assumptions about the shape of your data’s distribution. They work with medians and ranks instead. The Mann-Whitney U test, Kruskal-Wallis test, and Friedman’s test are non-parametric alternatives to the t-test, ANOVA, and repeated-measures ANOVA, respectively.
The practical rule: use parametric tests when your continuous data is approximately normally distributed, and non-parametric tests when it’s not. Categorical data (like yes/no outcomes or group labels) always calls for non-parametric approaches such as the chi-square test.
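One way to check the normality question in practice is the Shapiro-Wilk test, which takes "the data came from a normal distribution" as its null hypothesis. The sketch below generates deliberately skewed data, so the test should reject normality and point toward a non-parametric procedure:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
skewed = rng.exponential(scale=2.0, size=200)  # strongly right-skewed

# Shapiro-Wilk: H0 = the sample came from a normal distribution
w, p = stats.shapiro(skewed)
use_nonparametric = p <= 0.05  # rejecting H0 suggests a non-parametric test
```

A formal test is only one input to this decision; a histogram or Q-Q plot of the data is often just as informative.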
Choosing the Right Test for Your Data
Picking the correct statistical procedure comes down to three questions: What type of data do you have? How many groups are you comparing? And what’s your research goal?
If you’re comparing two independent groups on a continuous, normally distributed variable, the unpaired t-test is the standard choice. If instead the same subjects were measured at two different time points (making the observations paired), the paired t-test applies. Scaling up to three or more groups, ANOVA handles the comparison for normally distributed data, and repeated-measures ANOVA handles it when the same subjects are measured multiple times.
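The paired and unpaired cases differ by a single function call in scipy. The blood-pressure readings below are invented: the same eight subjects measured before and after a treatment:

```python
from scipy import stats

# Same subjects measured twice: the observations are paired
before = [140, 135, 150, 148, 142, 138, 145, 151]
after  = [132, 130, 144, 139, 137, 131, 138, 142]

# Paired t-test: tests whether the mean within-subject change is zero
t_paired, p_paired = stats.ttest_rel(before, after)

# Two independent groups would use the unpaired version instead
t_ind, p_ind = stats.ttest_ind(before, after)
```

The paired test is usually more powerful here because it removes subject-to-subject variation, comparing each person only against themselves.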
When your goal shifts from comparison to prediction, regression procedures take over. Linear regression predicts a continuous outcome (like blood pressure) from one or more input variables. Logistic regression predicts a binary outcome (like whether a patient develops a complication: yes or no) and produces odds ratios that quantify how much each risk factor changes the likelihood of that outcome. A study identifying risk factors for aspiration pneumonia, for example, would use logistic regression because the outcome is simply yes or no.
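The continuous-outcome case runs directly through scipy, and the odds-ratio idea can be illustrated from a 2x2 table, which is the single-predictor case that logistic regression generalizes. All numbers below are hypothetical:

```python
from scipy import stats

# Linear regression: predict a continuous outcome from one input variable
ages = [35, 42, 50, 58, 63, 70]
systolic = [118, 121, 127, 133, 136, 142]
result = stats.linregress(ages, systolic)
# result.slope estimates the change in blood pressure per year of age

# For a binary outcome, an odds ratio from a 2x2 table quantifies risk;
# logistic regression extends this to several risk factors at once.
#                 complication   no complication
exposed    =     [12,            38]
unexposed  =     [4,             46]
odds_ratio = (exposed[0] * unexposed[1]) / (exposed[1] * unexposed[0])
# odds_ratio > 1 means the exposed group has higher odds of the outcome
```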
Software for Running Statistical Procedures
You rarely perform these procedures by hand anymore. Several tools dominate the landscape. Python, with libraries like pandas and NumPy, is the most versatile option and handles everything from data cleaning to complex modeling. R was built specifically for statistics and data visualization, making it the go-to choice for many statisticians and academic researchers. SQL manages and queries data stored in databases, often serving as the first step before analysis happens in another tool. For large-scale data processing, Apache Spark handles distributed computing and supports Python, R, Java, and Scala. Visual platforms like KNIME offer built-in statistical models and machine learning tools without requiring code, which can make them more accessible for beginners.
The choice of tool rarely changes the statistical procedure itself. A t-test produces the same result whether you run it in R, Python, or a spreadsheet. What differs is the speed, flexibility, and ability to handle large or complex datasets.
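This tool-independence is easy to verify: the t statistic scipy computes is exactly what the textbook pooled-variance formula gives by hand, on any dataset. The values below are arbitrary:

```python
import statistics
from scipy import stats

a = [5.1, 4.9, 5.4, 5.0, 5.3]
b = [4.6, 4.8, 4.5, 4.7, 4.4]

# scipy's built-in unpaired (equal-variance) t-test
t_scipy, _ = stats.ttest_ind(a, b)

# The same statistic from the textbook pooled-variance formula
na, nb = len(a), len(b)
sp2 = ((na - 1) * statistics.variance(a)
       + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
t_manual = (statistics.mean(a) - statistics.mean(b)) / (sp2 * (1/na + 1/nb)) ** 0.5

# The two routes agree to floating-point precision
```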

