Basic statistics is the set of tools you use to collect, organize, summarize, and draw conclusions from data. It splits into two broad branches: descriptive statistics, which summarize what your data actually shows, and inferential statistics, which use a smaller sample to make predictions about a larger group. Whether you’re reading a news poll, comparing prices, or interpreting medical test results, these concepts are working in the background.
Descriptive vs. Inferential Statistics
Descriptive statistics report the facts of a dataset. If you survey every employee in your company about job satisfaction and calculate an average rating, that number describes the entire group with high certainty. You’re not guessing or projecting. You’re just summarizing what’s there.
Inferential statistics do something more ambitious. They take a smaller, randomly chosen sample and use it to draw conclusions about a much larger population. A political poll, for instance, surveys a few thousand people and then estimates how millions of voters feel. Because you’re working from incomplete information, inferential results always carry a margin of error. The calculations are more complex, and the certainty is lower, but inferential methods are often the only practical option when studying an entire population isn’t feasible.
Types of Data
Before you calculate anything, you need to know what kind of data you’re working with. In the 1940s, psychologist Stanley Smith Stevens outlined four levels of measurement that are still used today: nominal, ordinal, interval, and ratio. The level determines which statistical tools make sense.
- Nominal data are categories with no natural order. Blood type, eye color, and political party are all nominal. You can count how often each category appears, but averaging them would be meaningless.
- Ordinal data have a clear ranking but no consistent spacing between ranks. Education level (high school, bachelor’s, master’s, PhD) tells you one is higher than another, but the “distance” between levels isn’t equal or measurable.
- Interval data have consistent spacing but no true zero point. Temperature in Fahrenheit is a classic example: the difference between 30°F and 40°F is the same as between 80°F and 90°F, but 0°F doesn’t mean “no temperature.”
- Ratio data have both consistent spacing and a meaningful zero. Weight, height, concentration, and time all qualify. Zero means the quantity is truly absent, and you can say one value is “twice” another.
Getting the level right matters because it prevents mistakes. You wouldn’t calculate a mean zip code, for example, even though zip codes are numbers. They’re nominal labels, not quantities.
Measures of Central Tendency
Central tendency is a formal way of asking: where does the middle of this data fall? The three main answers are the mean, median, and mode.
The mean is the arithmetic average. Add up all the values and divide by how many there are. It’s the most commonly used measure, but it has a weakness: outliers pull it in their direction. If nine people in a room earn $50,000 and one earns $5 million, the mean income is $545,000, which doesn’t reflect anyone’s actual experience.
The median is the middle value when you line up all data points from smallest to largest. Half the values fall below it, half above. It resists the pull of outliers, which is why income reports and home prices typically use the median instead of the mean. In the example above, the median would still be $50,000.
The mode is the most frequently occurring value. A dataset can have no mode (if every value is unique), one mode, or multiple modes. It’s also the only measure of the three that works for categorical data. If you survey people’s favorite ice cream flavor and “chocolate” appears most often, chocolate is the mode.
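The three measures are easy to compute with Python’s standard-library `statistics` module. A minimal sketch, reusing the income example above plus an invented flavor survey:

```python
from statistics import mean, median, mode

# Nine $50,000 earners and one $5,000,000 outlier, as in the text
incomes = [50_000] * 9 + [5_000_000]

print(mean(incomes))    # 545000 — pulled far upward by the single outlier
print(median(incomes))  # 50000 — unaffected by the outlier

# The mode is the only one of the three that works on categorical data
flavors = ["chocolate", "vanilla", "chocolate", "strawberry"]
print(mode(flavors))    # chocolate
```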
Measures of Spread
Knowing the center of your data is only half the picture. Two datasets can have the same mean but look completely different if one is tightly clustered and the other is spread out. Measures of spread (also called variability or dispersion) capture that difference.
The range is the simplest: subtract the smallest value from the largest. It gives you the full span of your data in a single number, but it’s sensitive to outliers and tells you nothing about what happens between the extremes.
The variance offers a more detailed picture. It calculates how far each data point sits from the mean, squares those differences (so negative and positive gaps don’t cancel out), and averages the result. A high variance means values are widely scattered. The catch is that squaring the differences changes the units. If your original data is in pounds, the variance is in “pounds squared,” which isn’t intuitive.
That’s where standard deviation comes in. It’s simply the square root of the variance, which brings the units back to the original scale. If a group of test scores has a mean of 75 and a standard deviation of 10, a typical score falls within about 10 points of the mean. Standard deviation is the most widely reported measure of spread in both research and everyday data summaries.
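All three measures of spread can be sketched in a few lines of Python. The scores below are invented for illustration (their mean is 75, matching the example); `pvariance` and `pstdev` treat the list as a complete population:

```python
from statistics import pvariance, pstdev

scores = [60, 70, 75, 75, 80, 90]  # hypothetical test scores, mean = 75

data_range = max(scores) - min(scores)  # 30: the full span, set entirely by the two extremes
variance = pvariance(scores)            # ~83.33, in "points squared"
std_dev = pstdev(scores)                # ~9.13, the square root, back in original units

print(data_range, variance, std_dev)
```

Note how the standard deviation (~9.13 points) is directly interpretable on the original scale, while the variance (~83.33 points squared) is not.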
The Normal Distribution and the 68-95-99.7 Rule
Many natural measurements, from human heights to blood pressure readings, follow a pattern called the normal distribution. When graphed, it forms a symmetrical, bell-shaped curve centered on the mean. Most values cluster near the middle, and extreme values become increasingly rare as you move outward.
The normal distribution follows a predictable pattern known as the empirical rule. About 68% of all data points fall within one standard deviation of the mean. About 95% fall within two standard deviations. And about 99.7% fall within three. So if the average adult body temperature is 98.6°F with a standard deviation of 0.7°F, roughly 95% of people would have a temperature between 97.2°F and 100.0°F. A reading outside that range would be statistically unusual and worth paying attention to.
This rule gives you a quick way to judge whether a specific data point is ordinary or extreme, which is foundational to how statistical tests work.
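The empirical rule can be verified numerically with `statistics.NormalDist`, using the body-temperature figures from the example (mean 98.6°F, standard deviation 0.7°F):

```python
from statistics import NormalDist

temps = NormalDist(mu=98.6, sigma=0.7)  # body-temperature model from the text

# Fraction of the population within k standard deviations of the mean
within = {k: temps.cdf(98.6 + k * 0.7) - temps.cdf(98.6 - k * 0.7)
          for k in (1, 2, 3)}

print({k: round(v, 4) for k, v in within.items()})
# {1: 0.6827, 2: 0.9545, 3: 0.9973} — the 68-95-99.7 rule
```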
Populations, Samples, and Symbols
Statistics distinguishes between a population (the entire group you’re interested in) and a sample (the subset you actually measure). If you want to know the average weight of all middle-aged women in the United States, that entire group is the population. The 500 women you actually weigh are your sample.
Numbers that describe a population are called parameters. Numbers that describe a sample are called statistics. They even get different symbols: the population mean is written with the Greek letter mu (μ), while the sample mean is written as x̄ (“x-bar”). This distinction matters because a sample statistic is always an estimate of the true population parameter, never a perfect match.
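The population/sample distinction shows up directly in Python’s `statistics` module, which provides separate functions for each. A sketch with invented measurements:

```python
from statistics import pstdev, stdev

sample = [52, 61, 48, 57, 55]  # hypothetical sample of 5 measurements

# pstdev treats the data as the entire population (divides by n);
# stdev treats it as a sample estimating a population (divides by n - 1),
# correcting for the fact that a sample tends to understate the true spread.
print(pstdev(sample))  # ~4.41
print(stdev(sample))   # ~4.93 — always a bit larger than pstdev
```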
Basics of Inferential Statistics
Inferential statistics help you decide whether a pattern you observe in your sample is real or just random noise. The process typically starts with a null hypothesis, which states that nothing interesting is happening. For example, “this new drug has no effect on blood pressure.” Your goal is usually to disprove that null hypothesis.
The key metric is the p-value, which measures the probability of seeing your results (or something more extreme) if the null hypothesis were actually true. A small p-value means your data would be very unlikely under the “nothing is happening” scenario, so you reject the null hypothesis. Most research uses a threshold (called alpha) of 0.05, meaning the researcher accepts a 5% chance of rejecting a null hypothesis that is actually true (a false positive). If your p-value is below 0.05, the result is considered statistically significant.
That 0.05 threshold is a convention, not a magic number. It caps the false-positive rate, not the overall chance of being right: when the null hypothesis is true, a 0.05 threshold will still produce a “significant” result about 5% of the time. In fields where the stakes are higher, like drug safety testing, stricter thresholds are common.
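One way to make the p-value concrete without any formulas is a permutation test, which simulates the null hypothesis directly. The blood-pressure numbers below are invented for illustration; the technique itself is standard:

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the simulation is reproducible

# Hypothetical blood-pressure drops (mmHg): treatment vs. control group
treatment = [8, 12, 9, 11, 10, 13, 7, 12]
control = [5, 6, 4, 7, 5, 8, 6, 5]
observed = mean(treatment) - mean(control)  # 4.5

# Under the null hypothesis ("the drug has no effect"), group labels are
# arbitrary, so shuffle them and count how often a gap at least as large
# as the observed one appears by pure chance.
pooled = treatment + control
trials = 10_000
count = 0
for _ in range(trials):
    random.shuffle(pooled)
    if mean(pooled[:8]) - mean(pooled[8:]) >= observed:
        count += 1

p_value = count / trials
print(p_value)  # well below 0.05, so we would reject the null hypothesis
```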
Common Ways to Visualize Data
Charts and graphs turn raw numbers into patterns your brain can process quickly. Three of the most useful visualizations in basic statistics each serve a distinct purpose.
Histograms group continuous data into intervals (called bins) and show how many data points fall in each one. They’re ideal for revealing the shape of your data: whether it’s symmetrical, skewed to one side, or has multiple peaks. You can spot outliers, see where values concentrate, and get a visual sense of central tendency and spread all at once.
Box plots (also called box-and-whisker plots) compress five key descriptive statistics into one compact graphic: the minimum, first quartile, median, third quartile, and maximum. They’re especially useful for comparing distributions across groups and for flagging outliers, which appear as individual dots beyond the “whiskers.”
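The five numbers a box plot displays, plus the common 1.5 × IQR outlier rule, can be computed with the standard library. The data below are invented, and quartile conventions vary slightly between tools (`statistics.quantiles` defaults to the “exclusive” method):

```python
from statistics import quantiles

data = [12, 15, 15, 17, 18, 19, 21, 22, 24, 48]  # 48 is a deliberate outlier

q1, med, q3 = quantiles(data, n=4)  # the three quartile cut points
five_number = (min(data), q1, med, q3, max(data))

# A common box-plot rule flags points beyond 1.5 × IQR outside the box
iqr = q3 - q1
outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(five_number)  # the summary the box and whiskers would draw
print(outliers)     # [48] — would appear as an individual dot
```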
Scatter plots display two variables at once, with each data point plotted as a dot on an x-y grid. They’re the go-to tool for exploring whether two variables are related. If the dots trend upward from left to right, you’re looking at a positive relationship. If they slope downward, it’s negative. If they look like a shapeless cloud, there’s likely no relationship.
Where Basic Statistics Show Up in Real Life
You encounter basic statistics constantly, even when they’re not labeled as such. A batting average is a mean. A weather forecast saying there’s a 30% chance of rain is a probability. The “average home price” in your city is almost certainly a median, chosen because a few mansions would skew the mean.
In healthcare, statistical methods drive diagnostics, drug development, and decisions about which treatments work. Clinical trials compare outcomes between treatment and control groups using inferential tests, and public health officials track disease patterns with descriptive statistics on large populations. In business, companies use A/B testing (a form of hypothesis testing) to decide everything from website layouts to pricing strategies. Customer satisfaction surveys rely on ordinal scales, and quality control in manufacturing monitors whether products stay within an acceptable number of standard deviations from a target specification.
Understanding even the basics gives you a sharper eye for claims made in news articles, advertisements, and workplace reports. When someone cites an “average,” you’ll know to ask whether it’s a mean or median. When a headline declares a result “significant,” you’ll understand that it’s a statistical term with a specific meaning, not just a synonym for “important.”

