Statistics is the science of collecting, organizing, analyzing, and interpreting data to find patterns and draw conclusions. It’s the toolkit behind almost every number you encounter in the news, from election polls to clinical trial results to economic forecasts. Whether you’re looking at a batting average, a COVID infection rate, or your credit score, statistics is the discipline that turns raw numbers into meaning.
The Two Main Branches
Statistics splits into two core branches: descriptive and inferential. Descriptive statistics summarize data you already have. If you calculate the average test score for a class of 30 students, that’s descriptive. You’re reporting what the numbers say, nothing more. Charts, graphs, percentages, and averages all fall here.
Inferential statistics take things further. Instead of describing data you already collected, you use a smaller sample to draw conclusions about a much larger group. When a poll surveys 1,500 people and then makes claims about the opinions of 330 million Americans, that’s inferential statistics at work. The key distinction: descriptive statistics state facts about the data in front of you, while inferential statistics use samples to make predictions about populations you haven’t fully measured.
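The leap from 1,500 respondents to 330 million people rests on the margin of error, which quantifies how far the sample result might plausibly sit from the true population value. A quick sketch using the standard formula for a proportion at 95% confidence (the figures here are illustrative, not from any actual poll):

```python
import math

# Margin of error for a proportion: MOE = z * sqrt(p * (1 - p) / n)
n = 1500   # sample size, as in the poll example above
p = 0.5    # most conservative assumption: maximizes the margin
z = 1.96   # critical value for a 95% confidence level

margin_of_error = z * math.sqrt(p * (1 - p) / n)
print(f"±{margin_of_error:.1%}")  # roughly ±2.5 percentage points
```

This is why national polls of about 1,500 people routinely report a margin of error near plus or minus 2.5 points: quadrupling the sample size only halves the margin, so larger samples quickly stop paying off.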
Core Measures You’ll See Everywhere
A handful of measures show up constantly in statistics, and they’re worth understanding because they shape how data gets reported in everything from medical studies to sports analytics.
- Mean: the average. Add all the values together and divide by how many there are.
- Median: the middle value when you line up all your numbers from smallest to largest. This is often more useful than the mean when a few extreme values (like billionaire incomes in a salary survey) would skew the average.
- Mode: the value that appears most frequently in a data set.
- Standard deviation: a measure of how spread out the values are from the average. A low standard deviation means most values cluster tightly around the mean. A high one means the data is scattered widely.
Mean, median, and mode are called measures of central tendency because they each try to identify the “center” of a data set, just in different ways. Standard deviation is a measure of spread, telling you how much variation exists. Together, these four numbers give you a surprisingly complete snapshot of almost any data set.
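All four measures are available in Python's built-in `statistics` module. The small data sets below are made up, chosen so the results come out to round numbers; the income example illustrates why the median resists outliers while the mean does not:

```python
import statistics

# Hypothetical data chosen so each measure comes out cleanly
data = [2, 4, 4, 4, 5, 5, 7, 9]

mean_val = statistics.mean(data)      # 5
median_val = statistics.median(data)  # 4.5 (average of the two middle values)
mode_val = statistics.mode(data)      # 4 (appears three times)
spread = statistics.pstdev(data)      # 2.0 (population standard deviation)

# Why the median often beats the mean: one extreme income drags the
# average far from the "typical" value but barely moves the median.
incomes = [40, 45, 50, 55, 60]             # salaries in $1,000s
with_outlier = incomes + [10_000]          # add one billionaire-scale income

mean_before, mean_after = statistics.mean(incomes), statistics.mean(with_outlier)
median_before, median_after = statistics.median(incomes), statistics.median(with_outlier)
```

After adding the outlier, the mean jumps from 50 to over 1,700 (thousand dollars), while the median only shifts from 50 to 52.5.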
Types of Data
Not all data works the same way, and statistics classifies data into four levels based on what you can do with it.
Nominal data is purely categorical. Eye color, city of birth, or marital status are nominal. You can sort people into groups, but you can’t rank those groups in any meaningful order. Ordinal data adds ranking. A customer satisfaction survey that runs from “very dissatisfied” to “very satisfied” is ordinal: the categories have a clear order, but the gaps between them aren’t necessarily equal.
Interval data has both order and equal spacing between values, but no true zero point. Temperature in Fahrenheit is the classic example: the difference between 30°F and 40°F is the same as between 80°F and 90°F, but 0°F doesn’t mean “no temperature.” Ratio data has all the properties of interval data plus a meaningful zero. Height, weight, and age are ratio data. Zero means zero, and you can make statements like “twice as heavy” or “three times as old.”
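The practical consequence of the interval/ratio distinction is which arithmetic is legitimate. A short illustration: on an interval scale like Fahrenheit, a ratio depends entirely on where the arbitrary zero sits, while on a ratio scale it survives any change of unit.

```python
def f_to_c(f):
    """Convert Fahrenheit to Celsius."""
    return (f - 32) * 5 / 9

# Interval scale: the ratio changes when you change units,
# because the zero point is arbitrary.
ratio_f = 40 / 20                   # 2.0 -- "twice as hot"?
ratio_c = f_to_c(40) / f_to_c(20)   # a negative number -- clearly not

# Ratio scale: a true zero, so ratios are meaningful in any unit.
kg_ratio = 80 / 40                        # 2.0 -- twice as heavy
lb_ratio = (80 * 2.2046) / (40 * 2.2046)  # still 2.0
```

So "40°F is twice as hot as 20°F" is meaningless, but "80 kg is twice as heavy as 40 kg" is true no matter how you measure it.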
How a Statistical Study Works
A statistical investigation follows a predictable sequence. It starts with a clear question: Does this drug reduce blood pressure? Is crime rising in this city? Which marketing campaign drives more sales? From there, you determine what data you need and where to find it.
Data collection comes next, through surveys, experiments, public records, sensors, or any other source. Once collected, the data gets organized into tables, charts, or databases so it can be analyzed. Analysis involves applying the right statistical methods to detect patterns, relationships, or differences. Finally, interpretation translates those results into conclusions that answer the original question. Skipping or rushing any step, especially the upfront question definition, leads to unreliable results.
Sampling: Why You Don’t Need Everyone
Most statistical studies can’t measure an entire population. Instead, researchers select a sample and use it to represent the whole group. How that sample gets chosen matters enormously.
Simple random sampling gives every individual in the population an equal chance of being selected, like drawing names from a hat. It’s straightforward but requires a complete list of the population. Stratified random sampling divides the population into subgroups (by age, income, region, or any relevant characteristic) and then randomly samples within each subgroup. This guarantees that smaller or underrepresented groups show up in the sample, which simple random sampling might miss.
Cluster sampling is used when the population is too large or spread out to list everyone. Researchers divide the population into geographic or organizational clusters, randomly select some clusters, and then sample individuals within those. It’s common in large-scale public health studies.
Non-probability methods also exist. Snowball sampling, for instance, recruits participants through referrals, which is useful for hard-to-reach populations like homeless individuals or undocumented workers. These methods are practical but carry a higher risk of bias because not everyone has an equal chance of being included.
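The probability-based methods above can be sketched in a few lines of Python. This toy example uses a made-up population of 900 urban and 100 rural residents to show how stratified sampling with proportional allocation guarantees the rural subgroup appears, where simple random sampling merely makes it likely:

```python
import random
from collections import defaultdict

random.seed(42)  # fixed seed so the demonstration is reproducible

# Hypothetical population: 900 urban and 100 rural residents
population = [("urban", i) for i in range(900)] + [("rural", i) for i in range(100)]

# Simple random sampling: every individual has an equal chance
srs = random.sample(population, 50)

# Stratified random sampling: split into subgroups, then sample
# within each, proportionally to the subgroup's share of the population
strata = defaultdict(list)
for region, person in population:
    strata[region].append((region, person))

stratified = []
for region, members in strata.items():
    k = round(50 * len(members) / len(population))  # proportional allocation
    stratified.extend(random.sample(members, k))
```

With proportional allocation, the stratified sample always contains exactly 5 rural residents (10% of 50), while a simple random sample of 50 could by chance contain very few or none.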
P-Values and Statistical Significance
When researchers test whether a finding is real or just due to chance, they calculate something called a p-value. The p-value is the probability of getting a result at least as extreme as the one observed, assuming there is actually no real effect. The most common threshold is 0.05: a p-value below 0.05 means that if nothing real were going on, a result this extreme would occur less than 5% of the time by random variation alone, and such a result is typically called “statistically significant.”
This threshold isn’t sacred. Some fields set it at 0.01 (stricter) or 0.10 (more lenient) depending on the stakes involved. And statistical significance doesn’t automatically mean practical significance. A drug might produce a statistically significant blood pressure reduction of 1 point, which is real but clinically meaningless.
Two types of errors lurk in this process. A Type I error, or false positive, happens when you conclude something is real when it isn’t. Think of it as convicting an innocent person. A Type II error, or false negative, happens when you miss a real effect. That’s like letting a guilty person walk free. Researchers set acceptable thresholds for both types of error before running their analysis, conventionally allowing a 5% chance of a Type I error and a 20% chance of a Type II error (equivalent to 80% statistical power).
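One way to make the p-value concrete is a permutation test, which computes it directly from the definition: if there were no real effect, group labels would be arbitrary, so we reshuffle them many times and count how often chance alone produces a difference as large as the one observed. The data below are invented for illustration:

```python
import random
import statistics

random.seed(0)  # fixed seed for a reproducible demonstration

# Hypothetical blood-pressure reductions (mmHg) in two groups
treatment = [8, 12, 9, 11, 14, 10, 13, 9]
placebo = [5, 7, 6, 8, 4, 9, 6, 7]

observed = statistics.mean(treatment) - statistics.mean(placebo)  # 4.25

# Permutation test: shuffle the combined data, resplit into two groups
# of the original sizes, and record how often the shuffled difference
# is at least as large as the observed one.
combined = treatment + placebo
n_iter = 10_000
extreme = 0
for _ in range(n_iter):
    random.shuffle(combined)
    diff = statistics.mean(combined[:8]) - statistics.mean(combined[8:])
    if diff >= observed:
        extreme += 1

p_value = extreme / n_iter
```

Here the observed difference of 4.25 mmHg almost never arises from shuffled labels, so the p-value lands well below 0.05 and the result would be called statistically significant. Whether a 4-point reduction matters clinically is a separate question, which is exactly the significance-versus-importance distinction above.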
Where Statistics Gets Used
Statistics is foundational to fields that might not immediately come to mind. In public health, researchers use statistical models to track disease outbreaks, estimate the health effects of air pollution, and predict cardiovascular disease risk by combining clinical, genetic, and lifestyle data. During the COVID-19 pandemic, machine learning algorithms built on statistical methods were deployed at Johns Hopkins to predict which emergency patients were likely to deteriorate, helping doctors make faster decisions about care.
In economics and business, statistics drives everything from inflation measurement to A/B testing on websites. Insurance companies use it to set premiums. Governments use it to allocate resources. Sports teams use it to evaluate player performance and make draft picks. Even social media monitoring relies on statistical techniques: during the 2014 Ebola epidemic, researchers applied language-processing methods to Twitter data to build a real-time outbreak surveillance model.
Statistics vs. Data Science
These two fields overlap significantly but aren’t identical. Both work with raw data and aim to convert it into insights. The core difference is scale and tooling. Statisticians typically work with smaller, structured data sets and focus heavily on mathematical modeling. Data scientists work with massive, often messy data sets and need strong programming and computer science skills on top of graduate-level statistics.
Statistical modeling is common to both fields, but data science adds layers of computational infrastructure, like building machine learning algorithms that can process millions of records in real time. Think of statistics as the mathematical foundation and data science as one of its modern, tech-heavy applications.
Career Outlook
The demand for people with statistical skills is growing faster than most professions. The U.S. Bureau of Labor Statistics projects 9% employment growth for statisticians from 2024 to 2034, well above the national average. About 32,200 statisticians were employed in 2024, with roughly 2,200 new openings expected each year over the next decade. That growth is driven by expanding use of statistical analysis in healthcare, business strategy, and public policy. Combined with related roles in data science, analytics, and research, statistics training opens doors across virtually every industry.