What Does x_i Mean in Statistics? Notation Explained

In statistics, x_i (pronounced “x sub i”) represents a single data point in a dataset, where the subscript “i” tells you which data point you’re looking at. If you have five test scores, x₁ is the first score, x₂ is the second, and so on up to x₅. It’s a shorthand that lets you refer to any individual value without writing out every number.

How the Subscript Works

The “x” part represents what you’re measuring, like height, weight, or income. The subscript “i” is just a placeholder for position. When i = 1, you’re talking about the first observation. When i = 3, the third. The letter “i” itself doesn’t have a fixed value. It’s a stand-in that can represent any position in your dataset, which is why it’s called an index.

If you collected the ages of four people and got 25, 30, 22, and 41, then:

x₁ = 25
x₂ = 30
x₃ = 22
x₄ = 41

The full dataset is written as x₁, x₂, x₃, …, x_n, where n is the total number of observations. This notation scales to any dataset size. Whether you have 10 values or 10,000, x_i can refer to any one of them.
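If you work with data in code, the same indexing idea carries over with one catch: Python lists count from 0, while the notation here counts from 1. The short sketch below uses the four ages from the example; the names `ages` and `x` are illustrative choices, not standard ones.

```python
# Ages from the example above; the list name `ages` is an illustrative choice.
ages = [25, 30, 22, 41]

n = len(ages)  # n is the total number of observations (4 here)


def x(i):
    """Return x_i using the 1-based index from the notation."""
    if not 1 <= i <= n:
        raise IndexError(f"i must be between 1 and {n}")
    return ages[i - 1]  # Python lists start at 0, so x_1 is ages[0]


print(x(1))  # 25
print(x(3))  # 22
print(x(n))  # 41
```

The `i - 1` shift is the only translation needed between the written notation and the list position.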

Uppercase X vs. Lowercase x

You’ll often see both X and x in statistics textbooks, and they mean different things. Uppercase X represents a random variable: a quantity whose value is uncertain until it is observed. Lowercase x (and by extension x_i) represents an actual observed value that you’ve already recorded.

Think of it this way: before you roll a die, the outcome is X, a random variable that could land on any face. After you roll and see a 4, that specific result is x. The same logic applies to datasets. X is the theoretical measurement, and x₁, x₂, x₃ are the real values sitting in your spreadsheet.
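The die example can be mimicked in a few lines of Python. This is only a rough sketch: the list `faces` stands in for the values X could take, and the seed is an arbitrary choice that makes the run repeatable.

```python
import random

faces = [1, 2, 3, 4, 5, 6]  # the values the random variable X could take

rng = random.Random(42)  # arbitrary seed, used only so the run is repeatable
x = rng.choice(faces)    # one observed outcome: a realization of X

# Rolling several more times produces observed values x_1, x_2, ..., x_n.
rolls = [rng.choice(faces) for _ in range(5)]

print(x)
print(rolls)
```

Before `rng.choice` runs, all you can describe is the set of possible faces. Afterward, `x` and `rolls` hold fixed numbers, just like values in a spreadsheet.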

Why It Matters: Summation Notation

The main reason x_i exists is to make formulas compact. Instead of writing “add up every single data point,” statistics uses the Greek letter sigma (Σ) combined with x_i. The expression Σx_i from i = 1 to n means “start at the first value and add every value through the last one.” The index i moves from its starting point (usually 1) to its endpoint (n), picking up each data point along the way.

This shows up in nearly every core formula. The sample mean (x̄) is:

x̄ = Σx_i / n

In plain terms: add up all the values, then divide by how many you have. Using the ages from earlier: (25 + 30 + 22 + 41) / 4 = 29.5.
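To see the index doing its job, here is a small sketch that walks i from 1 to n and accumulates the sum, then divides by n. The variable names are illustrative, and the last line compares the result with the `mean` function from Python’s standard `statistics` module.

```python
from statistics import mean

ages = [25, 30, 22, 41]
n = len(ages)

# Sigma x_i from i = 1 to n, written as an explicit loop.
total = 0
for i in range(1, n + 1):
    total += ages[i - 1]  # x_i sits at list position i - 1

x_bar = total / n

print(total)                # 118
print(x_bar)                # 29.5
print(x_bar == mean(ages))  # True
```

In everyday code you would simply write `sum(ages) / len(ages)`; the loop is spelled out here only to mirror the notation.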

Sample variance builds on the same idea:

s² = Σ(x_i − x̄)² / (n − 1)

Here, x_i − x̄ is the distance between each data point and the average. You square those distances, add them up, and divide by n − 1. Dividing by n − 1 instead of n compensates for the fact that x̄ was itself calculated from the same sample. Every time you see x_i in a formula, it’s telling you to do something with each individual value in your dataset, one at a time.
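The same steps can be followed in code. The sketch below reuses the four ages and checks the result against `statistics.variance` from Python’s standard library, which also divides by n − 1; the variable names are illustrative.

```python
import math
import statistics

ages = [25, 30, 22, 41]
n = len(ages)
x_bar = sum(ages) / n  # 29.5

# Squared distance of each x_i from the mean, then the sum of those squares.
squared_deviations = [(x_i - x_bar) ** 2 for x_i in ages]
ss = sum(squared_deviations)

s2 = ss / (n - 1)  # sample variance

print(squared_deviations)  # [20.25, 0.25, 56.25, 132.25]
print(ss)                  # 209.0
print(s2)                  # roughly 69.67
print(math.isclose(s2, statistics.variance(ages)))  # True
```

Each entry in `squared_deviations` corresponds to one value of i, which is the "one at a time" step that the formula describes.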

Double Subscripts: x_ij

When data is organized into groups, you’ll sometimes see x_ij with two subscripts. The first index (i) identifies the observation within a group, and the second (j) identifies the group. If you’re comparing test scores across three classrooms, x₂,₃ (i = 2, j = 3) would be the second student’s score in the third classroom.

This notation is common in analysis of variance (ANOVA) and any situation where data naturally falls into categories. It works like coordinates in a table: one subscript picks the row, the other picks the column.
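A nested list is one simple way to hold grouped data like this. In the sketch below, the scores are made-up numbers for illustration, and each inner list is one classroom, so x_ij is found by picking group j first and then observation i.

```python
# Hypothetical test scores; each inner list is one classroom (one group j).
classrooms = [
    [78, 85, 92],  # classroom 1
    [88, 79, 95],  # classroom 2
    [70, 91, 84],  # classroom 3
]


def x(i, j):
    """Return x_ij: the i-th observation in the j-th group (both 1-based)."""
    return classrooms[j - 1][i - 1]


print(x(2, 3))  # 91, the second student's score in the third classroom
print(x(1, 1))  # 78, the first student's score in the first classroom

# Mean of group j = 3, summing x_i3 over i.
group = classrooms[3 - 1]
print(sum(group) / len(group))
```

Other sources sometimes put the group index first, so it is worth checking which subscript means what before reading a formula.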

Quick Reference for Common Notation

x_i: the i-th data point in your sample
n: total number of observations
x̄: the sample mean (average of all x_i values)
Σx_i: the sum of all data points from x₁ through x_n
x_ij: the i-th observation in the j-th group

The letter “i” is the most common index variable, but you’ll also see j, k, or other letters when a formula needs to track multiple positions at once. The logic is always the same: the subscript points to a specific value in a collection of data.
