Plotting PCA comes down to three steps: standardize your data, fit the PCA model, and scatter plot the first two (or three) principal components. The details of each step vary slightly between Python and R, but the logic is identical. Below is everything you need to go from raw data to a clean, properly labeled PCA plot.
Why You Need to Scale Your Data First
PCA finds the directions in your data that capture the most variance. If one feature is measured in thousands while another is measured in decimals, PCA will treat the large-scale feature as the most important simply because its numbers are bigger. In the classic wine dataset, for example, the “proline” feature is about two orders of magnitude larger than the others. Without scaling, proline single-handedly dominates the first principal component, and the plot tells you almost nothing useful.
The fix is standardization (also called Z-score normalization): subtract the mean and divide by the standard deviation for each feature. After scaling, every feature contributes on roughly the same footing, and the principal components reflect genuine patterns rather than measurement units. In Python, you do this with StandardScaler from scikit-learn. In R, the prcomp() function has a built-in scale. = TRUE argument.
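The z-score transform is easy to verify with plain NumPy; the sketch below mirrors what StandardScaler does by default (subtract each column's mean, divide by its population standard deviation). The toy array is made up for illustration:

```python
import numpy as np

# Toy data (made up for illustration): one feature in the thousands,
# one in decimals
X = np.array([[1000.0, 0.1],
              [1200.0, 0.3],
              [ 800.0, 0.2]])

# Z-score each column: subtract the mean, divide by the standard deviation
X_scaled = (X - X.mean(axis=0)) / X.std(axis=0)

# Every column now has mean 0 and standard deviation 1, so no single
# feature dominates the variance calculation
print(X_scaled.mean(axis=0))  # ~0 for each column
print(X_scaled.std(axis=0))   # ~1 for each column
```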
Plotting PCA in Python
The standard workflow uses scikit-learn for the PCA computation and matplotlib (or plotly) for the visualization. Here’s a minimal working example using the Iris dataset:
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
import pandas as pd
# Load data
iris = load_iris()
X = iris.data
y = iris.target
# Standardize
X_scaled = StandardScaler().fit_transform(X)
# Fit PCA
pca = PCA(n_components=2)
components = pca.fit_transform(X_scaled)
# Plot
plt.figure(figsize=(8, 6))
for label in set(y):
    mask = y == label
    plt.scatter(components[mask, 0], components[mask, 1],
                label=iris.target_names[label])
plt.xlabel(f"PC1 ({pca.explained_variance_ratio_[0]*100:.1f}%)")
plt.ylabel(f"PC2 ({pca.explained_variance_ratio_[1]*100:.1f}%)")
plt.legend()
plt.title("PCA of Iris Dataset")
plt.show()
A few things to note: fit_transform() fits the model and projects the data in a single call. The result is an array of shape (n_samples, n_components), so column 0 is PC1 and column 1 is PC2. The for loop that colors points by class label is what makes the plot interpretable; without color, you're just looking at an unlabeled cloud of dots.
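A quick check (reusing the same Iris pipeline) confirms the shape of the output and the column ordering; the variable names here are arbitrary:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_iris().data)
components = PCA(n_components=2).fit_transform(X_scaled)

# One row per sample, one column per component
print(components.shape)  # (150, 2)

# Components come sorted by variance, so column 0 (PC1) always spreads
# wider than column 1 (PC2)
print(components[:, 0].var() > components[:, 1].var())  # True
```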
Plotting PCA in R
R has several PCA functions; the most common is prcomp() from the base stats package. The projected coordinates live in the x element of the returned object (pca$x).
library(ggplot2)
# Run PCA with scaling
pca <- prcomp(iris[, 1:4], scale. = TRUE)
# Build a data frame for ggplot
pcaData <- data.frame(
  PC1 = pca$x[, 1],
  PC2 = pca$x[, 2],
  Species = iris$Species
)
# Calculate % variance explained
var_pct <- summary(pca)$importance[2, ] * 100
# Plot
ggplot(pcaData, aes(PC1, PC2, color = Species)) +
  geom_point(size = 2) +
  coord_fixed() +
  xlab(paste0("PC1: ", round(var_pct[1], 1), "%")) +
  ylab(paste0("PC2: ", round(var_pct[2], 1), "%"))
coord_fixed() keeps the aspect ratio honest so distances on the x-axis and y-axis are visually comparable. If you want a quick one-liner, the ggfortify package lets you skip the manual data frame entirely:
library(ggfortify)
autoplot(pca, data = iris, colour = "Species", size = 2) + coord_fixed()
For adding 95% confidence ellipses around groups, which is common in genomics and ecology papers, add stat_ellipse() to your ggplot call:
ggplot(pcaData, aes(PC1, PC2, color = Species)) +
  geom_point(size = 2) +
  coord_fixed() +
  stat_ellipse(aes(group = Species), level = 0.95, alpha = 0.2)
Always Label Axes With Variance Explained
This is the single most important labeling convention in PCA plots. Each axis should show what percentage of the total variance that component captures. A label like “PC1 (72.8%)” immediately tells the reader that the horizontal spread accounts for nearly three-quarters of the variation in the data, while “PC2 (22.9%)” captures about a quarter.
The calculation is straightforward: the variance explained by a component equals its eigenvalue (the variance along that component, which is the square of its standard deviation) divided by the sum of all the eigenvalues. In scikit-learn, pca.explained_variance_ratio_ gives you this directly as a decimal; in R, summary(pca)$importance[2, ] returns the same values. Multiply by 100 and paste the percentages into your axis labels. Skipping this step is the most common mistake in PCA plots, and reviewers will flag it.
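If you want to verify the arithmetic yourself, the ratio can be rebuilt from the eigenvalues and compared against scikit-learn's precomputed attribute. This sketch assumes the same scaled Iris data as above:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X_scaled)  # keep all components, so eigenvalues sum to the total variance

# Eigenvalue of each component = variance captured along that direction
eigenvalues = pca.explained_variance_
manual_ratio = eigenvalues / eigenvalues.sum()

# Matches the ready-made attribute
print(np.allclose(manual_ratio, pca.explained_variance_ratio_))  # True
```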
Choosing How Many Components to Plot
Two components are the default because they fit on a flat scatter plot, but they don’t always capture enough variance to be meaningful. A scree plot helps you decide. It shows the variance explained by each component in descending order, and you look for the “elbow” where additional components stop adding much.
Two common quantitative rules for finding the elbow: pick the point where individual components each explain less than 5% of the variance and the cumulative total exceeds 90%, or pick where the change in variance between consecutive components drops below 0.1%. In practice, these often land in the same neighborhood.
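The first rule can be sketched in a few lines of Python; the function name pick_k, its thresholds, and the example ratios (roughly those of scaled Iris) are illustrative, not part of any library:

```python
import numpy as np

def pick_k(ratios, individual_cut=0.05, cumulative_cut=0.90):
    """Return the smallest number of components k such that the first k
    explain at least cumulative_cut of the variance and every remaining
    component individually explains less than individual_cut."""
    ratios = np.asarray(ratios)
    cumulative = np.cumsum(ratios)
    for k in range(1, len(ratios) + 1):
        if cumulative[k - 1] >= cumulative_cut and np.all(ratios[k:] < individual_cut):
            return k
    return len(ratios)

# Ratios roughly matching scaled Iris: PC1 ~73%, PC2 ~23%, PC3 ~3.7%, PC4 ~0.5%
print(pick_k([0.73, 0.23, 0.037, 0.005]))  # 2
```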
To make a scree plot in Python:
pca_full = PCA().fit(X_scaled)
plt.plot(range(1, len(pca_full.explained_variance_ratio_) + 1),
         pca_full.explained_variance_ratio_, marker='o')
plt.xlabel("Component")
plt.ylabel("Variance Explained")
plt.title("Scree Plot")
plt.show()
If your first two components only capture a small fraction of the total variance (say, under 50% combined), a 2D plot may be misleading. In that case, consider a 3D plot.
3D PCA Plots for More Variance
When two components aren’t enough, adding a third can make a big difference. Plotly makes this easy with interactive 3D scatter plots that you can rotate and zoom. On the Iris dataset, three components capture 99.48% of the total variance compared to roughly 96% with two.
import plotly.express as px
from sklearn.decomposition import PCA
pca = PCA(n_components=3)
components = pca.fit_transform(X_scaled)
total_var = pca.explained_variance_ratio_.sum() * 100
fig = px.scatter_3d(
    x=components[:, 0], y=components[:, 1], z=components[:, 2],
    color=iris.target_names[y],
    labels={'x': 'PC1', 'y': 'PC2', 'z': 'PC3'},
    title=f'Total Explained Variance: {total_var:.2f}%'
)
fig.show()
The resulting plot is interactive in a Jupyter notebook or browser. For static reports, a 2D plot of PC1 vs. PC2 (and optionally PC1 vs. PC3) is usually clearer.
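The percentages quoted above are easy to reproduce from the full set of variance ratios:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_iris().data)
ratios = PCA().fit(X_scaled).explained_variance_ratio_

# Cumulative variance captured by the first two vs. first three components
print(f"2 components: {ratios[:2].sum() * 100:.2f}%")
print(f"3 components: {ratios[:3].sum() * 100:.2f}%")
```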
Score Plots vs. Biplots
The scatter plots shown above are score plots: each point represents one observation, positioned by its values on the principal components. This is the most common type of PCA plot and what most people mean when they say “PCA plot.”
A biplot overlays the variable loadings on top of the scores. Each original feature appears as an arrow radiating from the origin. The arrow’s direction shows how that feature correlates with the components, and the arrow’s length indicates how well-represented that feature is in the reduced space. The angle between two arrows approximates the correlation between those two variables: arrows pointing the same direction are positively correlated, arrows at 90 degrees are uncorrelated, and arrows pointing opposite directions are negatively correlated.
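The angle reading is cleanest when each eigenvector column is scaled by the square root of its eigenvalue, which is the convention behind correlation biplots. With all components kept, those scaled loadings exactly reconstruct the sample covariance of the standardized data; a 2D biplot simply shows the first two columns of that decomposition. A sketch, with variable names of my own choosing:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

X_scaled = StandardScaler().fit_transform(load_iris().data)
pca = PCA().fit(X_scaled)  # keep all components

# Arrow coordinates for a correlation biplot: eigenvectors scaled by
# the square roots of their eigenvalues (one row per original feature)
L = pca.components_.T * np.sqrt(pca.explained_variance_)

# With every component kept, the scaled loadings reconstruct the
# sample covariance matrix of the standardized features
print(np.allclose(L @ L.T, np.cov(X_scaled, rowvar=False)))  # True
```

The arrows drawn in a 2D biplot are only the first two columns, L[:, :2], so the angles between them approximate the full correlations; the approximation is good here because the first two components carry most of the variance.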
Biplots pack a lot of information into one figure, which is both their strength and their weakness. With more than about 15 features, the arrows overlap and the plot becomes unreadable. For high-dimensional data, it’s better to plot scores and loadings separately. In R, the factoextra and PCAtools packages handle loading plots well. In Python, you can manually plot the loading vectors from pca.components_ as arrows on a second figure.
# Python loading plot
loadings = pca.components_.T # shape: (n_features, n_components)
feature_names = iris.feature_names
plt.figure(figsize=(8, 6))
for i, name in enumerate(feature_names):
    plt.arrow(0, 0, loadings[i, 0], loadings[i, 1],
              head_width=0.02, color='red')
    plt.text(loadings[i, 0]*1.1, loadings[i, 1]*1.1, name)
plt.xlabel("PC1")
plt.ylabel("PC2")
plt.title("Loading Plot")
plt.axhline(0, color='grey', linewidth=0.5)
plt.axvline(0, color='grey', linewidth=0.5)
plt.show()
Use a score plot when you want to see how samples cluster. Use a biplot or loading plot when you want to understand which original features are driving those clusters.