What Is Normalized Power and How Is It Calculated?

Normalized Power (NP) is a cycling metric that estimates the true physiological cost of a ride by accounting for changes in intensity. Developed by exercise physiologist Dr. Andrew Coggan in the early 2000s and described in the 2006 book Training and Racing with a Power Meter, which he co-wrote with Hunter Allen, it addresses a fundamental problem: average power can badly underestimate how hard a variable ride actually is.

Why Average Power Falls Short

Average power is simply the arithmetic mean of every watt you produce over a ride. If you hold a steady 200 watts for an hour, your average is 200W. If you alternate between coasting at 0 watts and hammering at 400 watts, your average is also 200W. But anyone who has ridden both ways knows the second scenario is far more exhausting. Your body doesn’t experience effort as a neat average. Hard surges burn through glycogen, spike heart rate, and generate fatigue that easy spinning doesn’t undo. Average power treats these two rides as identical when they clearly aren’t.

Normalized Power corrects for this by weighting harder efforts more heavily. It answers the question: “What steady power output would have produced the same physiological stress as this variable ride?” For virtually any real ride, the result is equal to or higher than your average power. How much higher depends on how variable the ride was.

How Normalized Power Is Calculated

Your cycling computer handles this automatically, but understanding the math helps explain why NP behaves the way it does. The calculation follows four steps:

Step 1: Calculate a 30-second rolling average of your power data. This smooths out very brief spikes and gaps (like a two-second sprint or a momentary coast) so the algorithm focuses on sustained efforts rather than noise.
Step 2: Raise each of those 30-second averages to the fourth power. This is the key step. Raising to the fourth power makes high values dominate the result while low values contribute very little. A sustained surge to 400W doesn’t just count as twice a 200W effort; it carries 16 times the weight (2 to the fourth power).
Step 3: Average all those raised values together.
Step 4: Take the fourth root of that average to bring the number back into watts.
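
For readers who want to see the four steps in code, here is a minimal Python sketch. It assumes the power data is a plain list with one reading per second and no gaps. The function name normalized_power and the window parameter are illustrative and are not taken from any particular head unit or platform, which may handle dropouts, pauses, and zeros differently.

```python
from collections import deque


def normalized_power(watts, window=30):
    """Estimate Normalized Power from 1 Hz power samples.

    Assumption: `watts` holds one reading per second with no gaps.
    """
    if len(watts) < window:
        raise ValueError("need at least one full rolling window of data")

    rolling = deque(maxlen=window)   # the most recent `window` samples
    running_sum = 0.0
    fourth_powers = []

    for w in watts:
        if len(rolling) == window:
            running_sum -= rolling[0]        # oldest sample is about to drop out
        rolling.append(w)
        running_sum += w
        if len(rolling) == window:
            rolling_avg = running_sum / window       # Step 1: 30-second average
            fourth_powers.append(rolling_avg ** 4)   # Step 2: raise to the fourth power

    mean_fourth = sum(fourth_powers) / len(fourth_powers)  # Step 3: average
    return mean_fourth ** 0.25                              # Step 4: fourth root


# Two synthetic one-hour rides, both averaging 200 W.
steady = [200.0] * 3600
surgy = ([0.0] * 300 + [400.0] * 300) * 6   # five-minute blocks of coasting and hammering

print(round(normalized_power(steady)))   # matches the 200 W average
print(round(normalized_power(surgy)))    # far above the 200 W average
```

On the steady ride the function returns the same 200W as the average. On the synthetic ride made of five-minute blocks it returns roughly 330W, even though the average is still 200W.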

The fourth-power weighting is what makes NP useful. It mirrors the way your body responds to intensity: physiological strain increases disproportionately as wattage climbs. Doubling your power output for a surge costs only double the mechanical energy, but it costs far more than double in terms of fatigue. The math captures that biological reality.
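
To see the effect of the fourth-power weighting in isolation, the short Python sketch below skips the 30-second smoothing and simply takes the fourth-power mean of a list of values. That shortcut is an assumption for illustration only: it treats each value as a long block of steady riding, so it is not a full NP calculation. The helper name fourth_power_mean is hypothetical.

```python
def fourth_power_mean(watts):
    """Fourth root of the mean of the fourth powers (no smoothing applied)."""
    return (sum(w ** 4 for w in watts) / len(watts)) ** 0.25


steady = [200.0] * 10        # ten equal blocks at 200 W
surgy = [0.0, 400.0] * 5     # alternating blocks of coasting and 400 W

# Relative weight of a 400 W block compared with a 200 W block.
print((400 / 200) ** 4)                      # 16.0

print(round(fourth_power_mean(steady), 1))   # 200.0
print(round(fourth_power_mean(surgy), 1))    # 336.4
```

Both lists average 200W, but the alternating one comes out at about 336W because the 400W blocks dominate the sum while the 0W blocks add nothing.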

Variability Index: Measuring Ride Smoothness

Dividing your Normalized Power by your average power gives you a number called the Variability Index (VI). It tells you how evenly paced your ride was.

A VI of 1.0 means perfectly steady output, where NP equals average power exactly. In practice, even disciplined time trials and steady endurance rides produce a VI between 1.02 and 1.06 because small fluctuations are inevitable (hills, wind, intersections). Interval sessions and races typically land between 1.1 and 1.3. A criterium, with its constant accelerations out of corners, coasting in the pack, and surges to cover attacks, can push VI well above 1.3.

Tracking VI is useful for pacing. If you’re riding a long event and your VI is creeping above 1.05, you’re likely surging and recovering more than necessary, which burns energy faster than holding a steadier effort would.
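
As a quick illustration, VI is a one-line calculation once you have NP and average power. The Python sketch below also attaches a rough label based on the ranges quoted above. The function names and the exact cut-offs (1.06 and 1.3) are assumptions made for this example, not an official classification, and the text leaves the band between 1.06 and 1.1 undefined.

```python
def variability_index(np_watts, avg_watts):
    """Variability Index: Normalized Power divided by average power."""
    if avg_watts <= 0:
        raise ValueError("average power must be positive")
    return np_watts / avg_watts


def describe_pacing(vi):
    """Rough label for a VI value (cut-offs are illustrative assumptions)."""
    if vi <= 1.06:
        return "steady"
    if vi <= 1.3:
        return "variable"
    return "highly variable"


# Hypothetical rides: (NP, average power) in watts.
for np_watts, avg_watts in [(215, 205), (250, 200), (280, 200)]:
    vi = variability_index(np_watts, avg_watts)
    print(f"NP {np_watts} W / avg {avg_watts} W -> VI {vi:.2f} ({describe_pacing(vi)})")
```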

Intensity Factor and Training Stress Score

Normalized Power is also the foundation for two other widely used training metrics. Intensity Factor (IF) is your NP divided by your Functional Threshold Power (FTP), expressed as a ratio. If your FTP is 250 watts and your NP for a ride was 225 watts, your IF is 0.90, meaning you rode at 90% of your threshold. This makes it easy to compare the intensity of rides across different durations and conditions without guessing based on feel.

Training Stress Score (TSS) then combines NP, IF, FTP, and ride duration into a single number representing the total training load. The formula is: TSS = (duration in seconds × NP × IF) ÷ (FTP × 3600) × 100. Because IF is NP divided by FTP, this is the same as IF squared × duration in hours × 100. A one-hour ride at exactly your FTP produces a TSS of 100, which serves as a useful reference point. TSS lets you compare the cumulative stress of very different sessions, like a short, intense interval workout versus a long, moderate endurance ride, and track weekly training load over time.
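
The two formulas translate directly into code. The Python sketch below reuses the numbers from the example above (FTP of 250 watts, NP of 225 watts). The function names are illustrative, and the one-hour and two-hour durations are chosen only to show how the score scales.

```python
def intensity_factor(np_watts, ftp_watts):
    """Intensity Factor: Normalized Power as a fraction of FTP."""
    return np_watts / ftp_watts


def training_stress_score(duration_s, np_watts, ftp_watts):
    """TSS = (duration in seconds x NP x IF) / (FTP x 3600) x 100."""
    if_value = intensity_factor(np_watts, ftp_watts)
    return (duration_s * np_watts * if_value) / (ftp_watts * 3600) * 100


ftp = 250  # watts

print(round(intensity_factor(225, ftp), 2))              # 0.9
print(round(training_stress_score(3600, 250, ftp), 1))   # 100.0: one hour at FTP
print(round(training_stress_score(3600, 225, ftp), 1))   # 81.0: one hour at IF 0.90
print(round(training_stress_score(7200, 225, ftp), 1))   # 162.0: two hours at IF 0.90
```

Note how intensity counts twice, once through NP and once through IF: dropping from 100% to 90% of FTP cuts the hourly score from 100 to 81, not to 90.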

Without NP feeding into these calculations, both IF and TSS would rely on average power and systematically undercount the cost of hard, variable efforts.

When NP Matters Most

NP is most valuable when your riding is variable. For a steady indoor trainer session with minimal power fluctuation, NP and average power will be nearly identical, and either metric tells the same story. But for outdoor rides with hills, group dynamics, wind, traffic, or race tactics, NP gives a far more accurate picture of what your body actually went through.

Some practical situations where NP is especially useful:

Comparing indoor and outdoor rides: A 200W average on a flat indoor ride is not the same as a 200W average on a hilly outdoor loop. NP on the hilly ride will be higher, reflecting the reality that it was harder.
Pacing long events: In an Ironman bike leg or a century ride, keeping your NP at or below a target percentage of FTP helps prevent early burnout. Watching NP in real time lets you adjust before you dig too deep.
Evaluating race performance: After a criterium or road race, average power might look low because of time spent coasting in the draft. NP reveals the true cost of the surges and attacks that defined the race.

Trademark and Platform Differences

Normalized Power is a registered trademark of TrainingPeaks, which means not every platform can use the name. You’ll see it labeled as NP on TrainingPeaks, Garmin Connect, and compatible head units, but other platforms sometimes use alternative names for the same or similar calculations. Strava, for example, displays “Weighted Average Power.” The underlying idea is the same: weighting higher intensities more heavily to reflect true physiological cost. If you use multiple platforms and see slightly different numbers, it’s usually due to minor differences in how each platform handles data smoothing or zero values rather than a fundamentally different approach.