What Is a Scatterplot and How Does It Help Us?

A scatterplot is a graph that displays two variables as dots on a grid, letting you see at a glance whether those variables are related. One variable runs along the horizontal axis, the other along the vertical axis, and each dot represents a single observation where those two measurements meet. It’s one of the simplest and most powerful tools in data visualization because it turns raw numbers into a visual pattern you can immediately interpret.

How a Scatterplot Is Built

Every scatterplot starts with paired data: two measurements taken from the same person, place, or event. If you’re looking at height and weight for 50 people, each person contributes one dot. Their height determines where the dot sits along the horizontal axis, and their weight determines how high it sits on the vertical axis. Where those two values intersect, you place the dot.

The horizontal axis typically holds the independent variable (the thing you think might be doing the influencing), while the vertical axis holds the dependent variable (the thing you think might be affected). If you’re exploring whether study hours affect test scores, study hours go on the horizontal axis and test scores go on the vertical. When two data points land in the same spot, one dot would hide the other, so overlapping points are usually placed side by side, nudged slightly apart, or drawn semi-transparent so both remain visible.
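To make the placement rule concrete, here is a minimal sketch that draws a text-only scatterplot using nothing beyond the Python standard library. The function name `ascii_scatter`, the grid size, and the study-hours data are all invented for illustration; a real project would more likely use a plotting library.

```python
def ascii_scatter(points, width=40, height=12):
    """Render (x, y) pairs as a text grid: x runs left to right, y bottom to top."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    # Fall back to a span of 1 if every value on an axis is identical,
    # which avoids dividing by zero.
    x_span = (x_max - x_min) or 1
    y_span = (y_max - y_min) or 1

    grid = [[" "] * width for _ in range(height)]
    for x, y in points:
        # Scale each measurement to a column (horizontal) and row (vertical).
        col = round((x - x_min) / x_span * (width - 1))
        row = round((y - y_min) / y_span * (height - 1))
        # The first grid line is printed at the top, so flip the row index
        # to put large y values high on the plot.
        grid[height - 1 - row][col] = "*"
    return "\n".join("".join(line) for line in grid)


# Hypothetical paired data: (study hours, test score) for eight students.
study_data = [(1, 52), (2, 58), (3, 61), (4, 70), (5, 68), (6, 79), (7, 85), (8, 88)]
plot = ascii_scatter(study_data)
print(plot)
```

Each student contributes exactly one `*`, positioned by the two measurements together. Note that this simple sketch lets two points that fall in the same cell share one character, which is the overlap problem described above.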

Spotting Relationships in the Pattern

The real value of a scatterplot is in the shape the dots form. Three broad patterns emerge:

Positive association: The dots trend upward from left to right. As one variable increases, the other tends to increase too. Think height and weight, or hours studied and exam scores.
Negative association: The dots trend downward from left to right. As one variable goes up, the other tends to go down. The CDC offers an example where higher household income correlates with lower scores on a particular measure, producing that downward slope.
No association: The dots look scattered randomly with no discernible direction. The two variables don’t appear linked to one another.

Some patterns aren’t simple straight lines. Dots can form curves, U-shapes, or clusters. A CDC example describes dots that “hug closely together, but the trend is non-linear,” meaning the relationship between the variables exists but doesn’t follow a straight line; it may level off, bend, or even change direction at certain points. A scatterplot makes these complex relationships visible in a way that a table of numbers never could.
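One common way to put a number on the direction of a pattern is the Pearson correlation coefficient, which ranges from -1 (perfect downward line) to +1 (perfect upward line). The article itself does not mention it, so treat the sketch below as a supplementary illustration; the function name `pearson_r` and all three small datasets are made up.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical data: study hours vs. test scores (dots trend upward).
hours = [1, 2, 3, 4, 5, 6, 7, 8]
scores = [52, 58, 61, 70, 68, 79, 85, 88]
r_up = pearson_r(hours, scores)

# Hypothetical data: evening screen hours vs. hours slept (dots trend downward).
screen = [1, 2, 3, 4, 5, 6, 7, 8]
sleep = [9.0, 8.5, 8.2, 7.5, 7.6, 6.8, 6.1, 6.0]
r_down = pearson_r(screen, sleep)

# A perfect U-shape: y depends entirely on x, yet the linear correlation is zero.
u_x = [-3, -2, -1, 0, 1, 2, 3]
u_y = [x * x for x in u_x]
r_u = pearson_r(u_x, u_y)

print(f"upward: {r_up:.2f}, downward: {r_down:.2f}, U-shape: {r_u:.2f}")
```

The U-shaped case is the reason to look at the plot rather than rely on a single number: the coefficient only measures straight-line association, so it reports no relationship even though the dots form an obvious curve.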

Finding Outliers

Once a scatterplot reveals a pattern, individual dots that break from that pattern become immediately obvious. These are outliers, and they often represent the most interesting observations in a dataset. In a study of backpack weight and student body weight, one student carrying a far heavier pack than others of similar size would appear as a dot sitting well above the trend. Another carrying an unusually light pack would fall below it.

Outliers aren’t mistakes by default. Sometimes they point to data entry errors that need fixing. Other times they reveal genuinely unusual cases worth investigating. In one educational dataset, several states showed average math scores far below what their participation rates would predict. Those outliers prompted questions about what was different in those states. Without the scatterplot, those unusual cases would have been buried in a spreadsheet.
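Spotting outliers by eye can be backed up with a simple check: fit a straight line through the dots, measure how far each dot sits above or below it, and flag the dots whose gap is unusually large. The sketch below assumes a roughly linear trend and uses a cutoff of two standard deviations, which is a common rule of thumb rather than anything the backpack study prescribes. The names `flag_outliers`, `body`, and `pack`, and every number in the data, are hypothetical.

```python
from math import sqrt


def flag_outliers(xs, ys, threshold=2.0):
    """Return the (x, y) pairs whose vertical distance from a least-squares
    line exceeds `threshold` standard deviations of those distances."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x

    # Residual = observed value minus the value the line predicts.
    resid = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
    # Residuals from a least-squares line average to zero, so this is
    # their standard deviation.
    spread = sqrt(sum(r * r for r in resid) / n)
    return [(x, y) for x, y, r in zip(xs, ys, resid) if abs(r) > threshold * spread]


# Made-up data: student body weight and backpack weight, both in pounds.
body = [90, 100, 110, 120, 130, 140, 150, 160, 170, 180]
pack = [9, 10, 11, 12, 13, 28, 15, 16, 17, 18]

flagged = flag_outliers(body, pack)
print(flagged)
```

Here the only student flagged is the one carrying a 28-pound pack at a body weight of 140 pounds. A flag is a prompt to look closer, not a verdict: as noted above, the point may be a typo or a genuinely unusual case.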

Adding a Trend Line for Predictions

A scatterplot becomes even more useful when you add a trend line (sometimes called a line of best fit). This is a single straight line drawn through the cloud of dots that summarizes the overall direction and steepness of the relationship. The line is positioned to keep the overall distance between itself and the data points as small as possible; the most common method, least squares, minimizes the sum of the squared vertical distances from each dot to the line.

Once you have that line, you can use it to make predictions. Penn State’s statistics program offers a clear example: using height to predict weight. With a trend line calculated from student data, you can plug in a height of 66 inches and get a predicted weight of about 138 pounds. The prediction won’t be perfect for any individual (people of the same height vary in weight), but it gives a reasonable estimate based on the overall trend. This ability to forecast from patterns is one of the most common practical uses of scatterplots in fields from medicine to economics.
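As a rough sketch of how such a prediction is produced, the code below fits a least-squares line to a handful of height and weight pairs and then evaluates the line at 66 inches. The six data points are invented for illustration and are not the Penn State student data, so the predicted value will not match the figure quoted above. The function name `fit_line` is likewise an assumption.

```python
def fit_line(xs, ys):
    """Least-squares straight line through the points; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = how much y changes, on average, per one-unit change in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    # The least-squares line always passes through the point (mean_x, mean_y).
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up sample: heights in inches, weights in pounds.
heights = [62, 64, 66, 68, 70, 72]
weights = [120, 130, 136, 150, 158, 170]

slope, intercept = fit_line(heights, weights)
predicted = slope * 66 + intercept
print(f"weight = {slope:.2f} * height + {intercept:.2f}")
print(f"predicted weight at 66 in: {predicted:.1f} lb")
```

The prediction is only trustworthy inside the range of heights the line was fitted on. Plugging in a height far outside 62 to 72 inches would extend the line into territory the data say nothing about.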

When to Choose a Scatterplot Over Other Charts

Scatterplots work best when you have two continuous, numerical variables and you want to explore whether they’re related. If you’re comparing categories (like sales by region), a bar chart is the better choice. If you need to track how a single variable changes over time and want to see the exact rate of change between consecutive points, a line graph connects the dots directly and makes those shifts clearer.

The key difference between a scatterplot and a line graph is intent. A line graph connects each point to the next, emphasizing local changes from one measurement to the next. A scatterplot leaves the dots unconnected, letting you focus on the broader trend and the overall distribution of data rather than the path between individual points. Choose a scatterplot when you care about the big-picture relationship, not the step-by-step journey.

Correlation Does Not Mean Causation

This is the single most important thing to remember when reading a scatterplot. A strong visual pattern between two variables does not prove that one causes the other. Countries with higher fertility rates tend to have lower life expectancies, and a scatterplot would show that negative trend clearly. But having fewer children doesn’t directly cause a person to live longer. Other factors, like access to medical care and education levels, likely drive both variables simultaneously.

The same logic applies everywhere. If a scatterplot shows that people who eat less fatty food have lower rates of heart disease, it’s tempting to conclude that cutting fat prevents heart disease. But a person’s genetic makeup could independently reduce their appetite for fatty foods and protect them from heart disease, with no direct link between diet and outcomes. Two variables can rise and fall together simply because a third, unmeasured factor is pulling both strings. Even ice cream sales and drowning rates correlate strongly, not because ice cream is dangerous, but because hot weather drives both.
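A small simulation can show how a third factor manufactures a correlation. In the sketch below, temperature drives both ice cream sales and drownings, and the two outcomes never influence each other. Every number is invented, and the adjustment step (correlating what is left after removing the temperature trend from each variable) is one standard way to check for a shared driver, not a method taken from this article.

```python
import random
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def residuals(xs, ys):
    """What remains of ys after subtracting a least-squares line fitted on xs."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return [y - (slope * x + intercept) for x, y in zip(xs, ys)]


rng = random.Random(0)  # fixed seed so the run is repeatable

# Simulated days: temperature is the hidden driver of both outcomes.
temps = [rng.uniform(10, 35) for _ in range(200)]
ice_cream = [5 * t + rng.gauss(0, 10) for t in temps]   # sales depend only on temperature
drownings = [0.2 * t + rng.gauss(0, 1) for t in temps]  # drownings depend only on temperature

r_raw = pearson_r(ice_cream, drownings)
# Remove the temperature trend from each variable, then correlate the leftovers.
r_adjusted = pearson_r(residuals(temps, ice_cream), residuals(temps, drownings))

print(f"raw correlation: {r_raw:.2f}")
print(f"after accounting for temperature: {r_adjusted:.2f}")
```

The raw correlation comes out strongly positive even though neither variable affects the other, and it largely disappears once temperature is accounted for. This check only works when the third factor has been measured; an unmeasured one stays invisible.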

No matter how tight the dot pattern looks, you cannot confirm a cause-and-effect relationship from a scatterplot alone. That requires controlled experiments where you change one variable and hold everything else constant. The scatterplot’s job is to reveal that a relationship exists and how strong it appears. Figuring out why it exists is a separate step entirely.

Making a Scatterplot Easy to Read

A few simple practices keep scatterplots clear and honest. Label both axes with the variable name and its units so readers don’t have to guess what they’re looking at. Choose the scale of each axis deliberately, and keep it consistent across charts that readers will compare; stretching or compressing one axis can make a weak relationship look strong or a strong one look flat. Keep gridlines light and limit tick marks to reduce visual clutter. The data points themselves should be the most prominent element on the graph, not the decorations around them.