What Does xi Mean in Statistics? Notation Explained

In statistics, xi (pronounced “x sub i”) represents a single data point in a dataset, where the subscript “i” tells you which data point you’re looking at. If you have five test scores, x1 is the first score, x2 is the second, and so on up to x5. It’s a shorthand that lets you refer to any individual value without writing out every number.

How the Subscript Works

The “x” part represents what you’re measuring, like height, weight, or income. The subscript “i” is just a placeholder for position. When i = 1, you’re talking about the first observation. When i = 3, the third. The letter “i” itself doesn’t have a fixed value. It’s a stand-in that can represent any position in your dataset, which is why it’s called an index.

If you collected the ages of four people and got 25, 30, 22, and 41, then:

  • x1 = 25
  • x2 = 30
  • x3 = 22
  • x4 = 41

The full dataset is written as x1, x2, x3, …, xn, where n is the total number of observations. This notation scales to any dataset size. Whether you have 10 values or 10,000, xi can refer to any one of them.
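
If your data lives in code rather than on paper, the subscript maps onto a list position. The sketch below uses the four ages from above; the helper name x is made up for illustration. Note that Python counts positions from 0, so xi in statistical notation sits at position i - 1 in the list.

```python
# Ages from the example above. Python lists are 0-indexed,
# so the statistical x_i is stored at ages[i - 1].
ages = [25, 30, 22, 41]
n = len(ages)  # n = total number of observations


def x(i):
    """Return the i-th observation, using the 1-based index from the text."""
    if not 1 <= i <= n:
        raise IndexError(f"i must be between 1 and {n}")
    return ages[i - 1]


# Walk the index i from 1 to n and print each observation.
for i in range(1, n + 1):
    print(f"x{i} = {x(i)}")
```

The off-by-one shift is only a programming convention; the statistical notation itself almost always starts counting at 1.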

Uppercase X vs. Lowercase x

You’ll often see both X and x in statistics textbooks, and they mean different things. Uppercase X represents a random variable: a quantity whose value is uncertain until you observe it. Lowercase x (and by extension xi) represents the actual observed numbers you’ve already recorded.

Think of it this way: before you roll a die, the outcome is X, a random variable that could land on any face. After you roll and see a 4, that specific result is x. The same logic applies to datasets. X is the theoretical measurement, and x1, x2, x3 are the real values sitting in your spreadsheet.
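
One way to picture the difference is a small simulation. In this sketch (the function name roll_die and the seed value are arbitrary choices for illustration), the function plays the role of X, and the numbers it returns are the observed values x1, x2, and so on.

```python
import random


def roll_die(rng):
    """One draw of the random variable X: a fair six-sided die."""
    return rng.randint(1, 6)  # randint includes both endpoints


# A fixed seed only makes reruns repeatable; the value 0 is arbitrary.
rng = random.Random(0)

# Before roll_die is called, the outcome is uncertain: that is X.
# Each number it returns is an observed value: x1, x2, x3, ...
observations = [roll_die(rng) for _ in range(5)]
print(observations)
```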

Why It Matters: Summation Notation

The main reason xi exists is to make formulas compact. Instead of writing “add up every single data point,” statistics uses the Greek letter sigma (Σ) combined with xi. The expression Σxi from i = 1 to n means “start at the first value and add every value through the last one.” The index i moves from its starting point (usually 1) to its endpoint (n), picking up each data point along the way.
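
Read as instructions, Σ is essentially a loop. Here is a minimal sketch using the four ages from earlier, with the index i running from 1 to n just as in the notation.

```python
ages = [25, 30, 22, 41]
n = len(ages)

# Sigma from i = 1 to n: start at zero and add each x_i in turn.
total = 0
for i in range(1, n + 1):
    total += ages[i - 1]  # ages[i - 1] is x_i (lists are 0-indexed)

print(total)  # 118

# Python's built-in sum() performs the same addition in one call.
assert total == sum(ages)
```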

This shows up in nearly every core formula. The sample mean (x̄) is:

x̄ = Σxi / n

In plain terms: add up all the values, then divide by how many you have. Using the ages from earlier: (25 + 30 + 22 + 41) / 4 = 29.5.
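
The same calculation in code, as a quick check. This sketch compares the formula written out by hand with the mean function from Python's standard statistics module.

```python
import statistics

ages = [25, 30, 22, 41]
n = len(ages)

# x-bar = (sum of all x_i) / n
x_bar = sum(ages) / n
print(x_bar)  # 29.5

# The standard library gives the same answer.
assert x_bar == statistics.mean(ages)
```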

Sample variance builds on the same idea:

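s² = Σ(xi − x̄)² / (n − 1)

Here, xi − x̄ is the deviation of each data point from the mean, which is negative when the value falls below the average and positive when it falls above. You square those deviations, add them up, and divide by n − 1 rather than n, which corrects for the fact that the deviations are measured from the sample mean instead of the true population mean. Every time you see xi in a formula, it’s telling you to do something with each individual value in your dataset, one at a time.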
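
To see each step of the variance formula, here is a sketch that works through the four ages from earlier and checks the result against the variance function in Python's standard statistics module (which also divides by n - 1).

```python
import math
import statistics

ages = [25, 30, 22, 41]
n = len(ages)
x_bar = sum(ages) / n  # 29.5

# Step 1: deviation of each x_i from the mean.
deviations = [x_i - x_bar for x_i in ages]   # [-4.5, 0.5, -7.5, 11.5]

# Step 2: square each deviation.
squared = [d ** 2 for d in deviations]       # [20.25, 0.25, 56.25, 132.25]

# Step 3: add them up and divide by n - 1.
s_squared = sum(squared) / (n - 1)
print(round(s_squared, 2))  # 69.67

# statistics.variance computes the sample variance the same way.
assert math.isclose(s_squared, statistics.variance(ages))
```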

Double Subscripts: xij

When data is organized into groups, you’ll sometimes see xij with two subscripts. Here, the first index (i) identifies which observation within a group, and the second (j) identifies which group. If you’re comparing test scores across three classrooms, x2,3 would be the second student’s score in the third classroom. Some textbooks reverse the order and put the group index first, so check how your source defines it.

This notation is common in analysis of variance (ANOVA) and any situation where data naturally falls into categories. It works like coordinates in a table: one subscript picks the row, the other picks the column.
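
In code, two subscripts usually become a nested list or a table. The scores below are invented purely for illustration, and the helper name x is hypothetical. The sketch follows the convention used here: i picks the student within a classroom, j picks the classroom.

```python
# Hypothetical test scores: one inner list per classroom (group j),
# one entry per student (observation i) within that classroom.
scores = [
    [72, 85, 90],  # classroom 1
    [64, 78, 88],  # classroom 2
    [91, 69, 77],  # classroom 3
]


def x(i, j):
    """Return x_ij: the i-th student's score in the j-th classroom (1-based)."""
    return scores[j - 1][i - 1]


print(x(2, 3))  # 69: second student, third classroom
```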

Quick Reference for Common Notation

  • xi: the i-th data point in your sample
  • n: total number of observations
  • x̄: the sample mean (average of all xi values)
  • Σxi: the sum of all data points from x1 through xn
  • xij: the i-th observation in the j-th group

The letter “i” is the most common index variable, but you’ll also see j, k, or other letters when a formula needs to track multiple positions at once. The logic is always the same: the subscript points to a specific value in a collection of data.