An ROC curve is a plot with the false positive rate on the x-axis and the true positive rate (sensitivity) on the y-axis, with one point for every classification threshold your model can use. Plotting one requires three things: true labels, predicted probabilities, and a way to sweep through thresholds to calculate the rates at each one. You can do this manually to understand the logic, or use a library function in Python or R that handles it in a few lines.
What an ROC Curve Actually Shows
Every binary classifier produces a score or probability for each prediction. To turn that score into a yes/no decision, you pick a threshold: anything above it is “positive,” anything below is “negative.” Change the threshold and you change how many true positives and false positives you get. An ROC curve captures that tradeoff by plotting the true positive rate against the false positive rate at every possible threshold, then connecting those points into a curve.
A 45-degree diagonal line from the bottom-left corner to the top-right corner represents a model with no skill, equivalent to flipping a coin. A useful model produces a curve that bows toward the upper-left corner, where true positives are high and false positives are low. The more the curve hugs that top-left corner, the better the model discriminates between classes.
The Logic Behind Each Point on the Curve
To build the curve from scratch, you need to understand two rates. The true positive rate (TPR) is the proportion of actual positives your model correctly identifies. The false positive rate (FPR) is the proportion of actual negatives your model incorrectly flags as positive. At each threshold, you can organize predictions into a simple table of true positives, false positives, true negatives, and false negatives, then calculate TPR and FPR from it.
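To make those two rates concrete, here is a small worked example at a single threshold of 0.5. The labels and scores are made up for illustration:

```python
import numpy as np

# Hypothetical labels and scores, purely for illustration
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.7, 0.3])

threshold = 0.5
y_pred = (y_score >= threshold).astype(int)

# Tally the four cells of the confusion table
tp = np.sum((y_pred == 1) & (y_true == 1))  # true positives
fp = np.sum((y_pred == 1) & (y_true == 0))  # false positives
fn = np.sum((y_pred == 0) & (y_true == 1))  # false negatives
tn = np.sum((y_pred == 0) & (y_true == 0))  # true negatives

tpr = tp / (tp + fn)  # fraction of actual positives correctly caught
fpr = fp / (fp + tn)  # fraction of actual negatives falsely flagged
print(tpr, fpr)  # 0.75 0.25
```

At this one threshold, the model catches 3 of the 4 positives (TPR = 0.75) while flagging 1 of the 4 negatives (FPR = 0.25), so this threshold contributes the point (0.25, 0.75) to the curve.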
Start with the highest possible threshold. At that extreme, the model predicts everything as negative, so both TPR and FPR are 0. That gives you the point (0, 0). Now gradually lower the threshold. As you do, more samples get classified as positive. Some of those are true positives (TPR rises), and some are false positives (FPR rises). At the lowest threshold, everything is predicted positive, and you land at (1, 1). Each threshold produces one (FPR, TPR) coordinate pair. Plot them all and connect the dots.
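The sweep described above can be written out by hand. This sketch, using made-up data, computes one (FPR, TPR) pair per candidate threshold, which is essentially what library functions do internally:

```python
import numpy as np

# Hypothetical labels and scores, purely for illustration
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
y_score = np.array([0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.7, 0.3])

# Sweep from above the highest score down to the lowest score,
# so the curve runs from (0, 0) to (1, 1)
thresholds = np.concatenate(([np.inf], np.sort(y_score)[::-1]))
points = []
for t in thresholds:
    y_pred = y_score >= t
    tpr = np.sum(y_pred & (y_true == 1)) / np.sum(y_true == 1)
    fpr = np.sum(y_pred & (y_true == 0)) / np.sum(y_true == 0)
    points.append((fpr, tpr))

print(points[0])   # (0.0, 0.0) at the highest threshold
print(points[-1])  # (1.0, 1.0) at the lowest threshold
```

Connecting the points in order traces the full curve from the bottom-left corner to the top-right corner.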
Plotting in Python With Scikit-Learn
The most common approach uses scikit-learn’s roc_curve function and matplotlib. The function takes two required arguments: y_true (an array of true binary labels, typically 0s and 1s) and y_score (an array of predicted probabilities for the positive class). It returns three arrays: false positive rates, true positive rates, and the corresponding thresholds.
from sklearn.metrics import roc_curve, roc_auc_score
import matplotlib.pyplot as plt
# y_test: true labels (0 or 1)
# y_probs: predicted probabilities for the positive class
fpr, tpr, thresholds = roc_curve(y_test, y_probs)
auc = roc_auc_score(y_test, y_probs)
plt.plot(fpr, tpr, label=f"Model (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="gray", label="Random chance")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.title("ROC Curve")
plt.legend()
plt.show()
The y_probs array comes from your model’s predict_proba method. For a binary classifier in scikit-learn, that typically looks like model.predict_proba(X_test)[:, 1] to grab the probability of the positive class. The dashed diagonal line represents random chance, giving you a visual baseline to judge your model against.
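The snippet above assumes y_test and y_probs already exist. As a minimal end-to-end sketch of how you might produce them, here is one possibility using scikit-learn's synthetic data generator and a logistic regression model (the data and model choice are illustrative, not prescriptive):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, purely for illustration
X, y = make_classification(n_samples=1000, n_classes=2, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Column 1 of predict_proba holds the positive-class probability
y_probs = model.predict_proba(X_test)[:, 1]
print(roc_auc_score(y_test, y_probs))
```

From here, y_test and y_probs plug directly into the plotting code shown earlier.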
If your labels aren’t 0 and 1 (for instance, they’re -1 and 1, or string labels), pass the pos_label parameter to tell the function which class counts as positive.
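For example, with string labels it might look like this (the labels and scores here are made up):

```python
from sklearn.metrics import roc_curve

# Hypothetical string labels and scores
y_true = ["spam", "ham", "spam", "ham", "spam", "ham"]
y_score = [0.9, 0.2, 0.8, 0.6, 0.7, 0.1]

# Tell roc_curve that "spam" is the positive class
fpr, tpr, thresholds = roc_curve(y_true, y_score, pos_label="spam")
print(fpr, tpr)
```

Without pos_label, the function has no way to know which of the two string values should count as a positive prediction and will raise an error.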
Plotting in R With pROC
The pROC package is the standard choice in R. Its roc function takes a response vector (true labels) and a predictor vector (predicted scores), then computes and optionally plots the curve in one step.
library(pROC)
roc_obj <- roc(response = test_labels, predictor = predicted_probs)
plot(roc_obj, main = "ROC Curve", print.auc = TRUE)
Setting print.auc = TRUE overlays the AUC value directly on the plot. The pROC package also supports comparing multiple ROC curves on the same plot and computing partial AUC for specific regions of the curve, which is useful when you only care about performance in a certain false positive rate range.
Interpreting the AUC Score
The area under the ROC curve (AUC) collapses the entire curve into a single number between 0 and 1. An AUC of 0.5 means the model performs no better than random guessing, while values below 0.5 indicate scores that are inverted relative to the labels. An AUC of 1.0 means perfect separation between classes. In practice, here’s a general grading scale:
- 0.9 and above: Excellent discrimination
- 0.8 to 0.89: Strong discrimination
- 0.7 to 0.79: Fair discrimination
- 0.6 to 0.69: Poor discrimination
- Below 0.6: Effectively no useful discrimination
These ranges are guidelines, not hard rules. What counts as “good enough” depends on the problem. A fraud detection model with an AUC of 0.85 might be perfectly useful, while a medical diagnostic test at 0.85 might not meet clinical standards. Always consider the cost of false positives versus false negatives in your specific context.
Finding the Best Threshold
The ROC curve shows you every possible tradeoff, but it doesn’t tell you which threshold to actually use. One common method is Youden’s J statistic, defined as sensitivity + specificity – 1 for each threshold. The threshold that maximizes J is considered optimal when you want to weight sensitivity and specificity equally. Geometrically, this is the point on the curve farthest from the diagonal line.
In Python, you can find it like this:
import numpy as np
# Youden's J = sensitivity + specificity - 1, which simplifies to TPR - FPR
j_scores = tpr - fpr
best_index = np.argmax(j_scores)
best_threshold = thresholds[best_index]
print(f"Optimal threshold: {best_threshold:.3f}")
This approach assumes that catching positives and avoiding false alarms matter equally to you. If one type of error is more costly (missing a disease versus an unnecessary follow-up test, for example), you’d shift the threshold accordingly rather than relying on Youden’s index alone.
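One way to fold unequal costs into the choice is to assign a cost to each error type and pick the threshold that minimizes total expected cost. This sketch uses made-up data and made-up cost figures, purely to illustrate the mechanics:

```python
import numpy as np
from sklearn.metrics import roc_curve

# Hypothetical labels and scores, purely for illustration
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
y_score = np.clip(y_true * 0.4 + rng.normal(0.3, 0.25, size=200), 0, 1)

fpr, tpr, thresholds = roc_curve(y_true, y_score)

# Made-up costs: a missed positive hurts 5x more than a false alarm
cost_fn, cost_fp = 5.0, 1.0
n_pos, n_neg = np.sum(y_true == 1), np.sum(y_true == 0)

# Expected cost at each threshold: missed positives plus false alarms
costs = cost_fn * (1 - tpr) * n_pos + cost_fp * fpr * n_neg
best_threshold = thresholds[np.argmin(costs)]
print(best_threshold)
```

Raising cost_fn relative to cost_fp pushes the chosen threshold lower, trading more false alarms for fewer missed positives.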
When ROC Curves Can Be Misleading
ROC curves work well when your dataset has a roughly balanced number of positive and negative cases. When classes are heavily imbalanced, such as detecting a rare event that occurs in less than 1% of samples, the ROC curve can paint an overly optimistic picture. The false positive rate stays low even when the model generates many false alarms, because the denominator (total actual negatives) is so large.
In these situations, a precision-recall curve is often more informative. Precision measures what fraction of your positive predictions were actually correct, which is more intuitive when positives are rare. Many practitioners in fields like antibody prediction and fraud detection favor precision-recall curves over ROC curves for exactly this reason. If your dataset is reasonably balanced, the ROC curve remains a reliable and widely understood evaluation tool.
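Plotting a precision-recall curve in scikit-learn follows the same pattern as the ROC example earlier. This sketch uses made-up heavily imbalanced data (roughly 2% positives) to illustrate:

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.metrics import precision_recall_curve, average_precision_score

# Hypothetical imbalanced data: roughly 2% positives, for illustration
rng = np.random.default_rng(1)
y_true = (rng.random(5000) < 0.02).astype(int)
y_score = np.clip(y_true * 0.5 + rng.normal(0.25, 0.15, size=5000), 0, 1)

precision, recall, thresholds = precision_recall_curve(y_true, y_score)
ap = average_precision_score(y_true, y_score)  # summarizes the curve

plt.plot(recall, precision, label=f"Model (AP = {ap:.2f})")
plt.xlabel("Recall")
plt.ylabel("Precision")
plt.title("Precision-Recall Curve")
plt.legend()
plt.show()
```

Average precision (AP) plays a role analogous to AUC here: it summarizes the precision-recall curve in a single number, with the baseline set by the positive-class prevalence rather than 0.5.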
Comparing Multiple Models
One of the most practical uses of ROC curves is comparing classifiers side by side. Plot multiple curves on the same axes, each with its AUC in the legend, and you can immediately see which model performs better across all thresholds.
# model_predictions: dict mapping each model's name to its predicted probabilities
for name, y_probs in model_predictions.items():
    fpr, tpr, _ = roc_curve(y_test, y_probs)
    auc = roc_auc_score(y_test, y_probs)
    plt.plot(fpr, tpr, label=f"{name} (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", color="gray")
plt.xlabel("False Positive Rate")
plt.ylabel("True Positive Rate")
plt.legend()
plt.show()
A higher AUC generally indicates a better model, but look at the curves themselves too. Two models can have similar AUC values yet perform very differently in the region that matters to you. One might excel at low false positive rates while the other does better at high sensitivity. The curve gives you that detail; the single AUC number does not.

