Accuracy is calculated by dividing the number of correct predictions by the total number of predictions, then multiplying by 100 to get a percentage. The formula is: Accuracy = (True Positives + True Negatives) / (True Positives + True Negatives + False Positives + False Negatives). Here’s how to apply it step by step, with worked examples you can follow.
The Core Formula
Every accuracy calculation comes down to four counts from a confusion matrix:
- True Positives (TP): You predicted “yes” and the answer was actually yes.
- True Negatives (TN): You predicted “no” and the answer was actually no.
- False Positives (FP): You predicted “yes” but the answer was actually no.
- False Negatives (FN): You predicted “no” but the answer was actually yes.
The formula combines these into a single number:
Accuracy = (TP + TN) / (TP + TN + FP + FN)
The numerator is everything you got right. The denominator is everything, right and wrong combined. The result is a decimal between 0 and 1. Multiply by 100 to express it as a percentage.
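In code, the formula is simple arithmetic. Here is a minimal Python sketch (the function name is ours, chosen for illustration):

```python
def accuracy(tp, tn, fp, fn):
    """Fraction of all predictions that were correct."""
    return (tp + tn) / (tp + tn + fp + fn)

# 50 correct predictions out of 100 total
print(accuracy(tp=30, tn=20, fp=25, fn=25))        # 0.5
print(accuracy(tp=30, tn=20, fp=25, fn=25) * 100)  # 50.0 (as a percentage)
```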
Worked Example: Email Spam Filter
Suppose you build a spam filter and test it on 200 emails. You know which ones are actually spam and which are legitimate, so you can compare the filter’s predictions against the truth. Here are the results:
- TP: 50 emails correctly flagged as spam
- TN: 120 emails correctly identified as not spam
- FP: 10 legitimate emails incorrectly flagged as spam
- FN: 20 spam emails that slipped through to the inbox
Plug those numbers in:
Accuracy = (50 + 120) / (50 + 120 + 10 + 20) = 170 / 200 = 0.85
That’s 85% accuracy. Out of 200 emails, the filter classified 170 of them correctly.
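You can verify the arithmetic directly in Python, using the counts from the example above:

```python
# Confusion-matrix counts from the spam-filter test (200 emails)
tp, tn, fp, fn = 50, 120, 10, 20

accuracy = (tp + tn) / (tp + tn + fp + fn)
print(accuracy)        # 0.85
print(accuracy * 100)  # 85.0
```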
Multiclass Example: Three or More Categories
The same logic works when there are more than two possible outcomes. Instead of tracking four confusion matrix cells, you simply count total correct predictions and divide by total predictions.
Say you’re classifying customer support tickets into three categories: billing, technical, and general. Your model processes 45 tickets and gets 37 of them right. The accuracy is 37 / 45 = 0.822, or about 82%. You don’t need separate TP/TN/FP/FN counts for each category to get overall accuracy. Just correct divided by total.
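The count-and-divide approach looks like this in Python. The ticket labels here are made up for illustration:

```python
# Hypothetical ticket categories: billing, technical, general
actual    = ["billing", "technical", "general", "billing", "technical"]
predicted = ["billing", "technical", "billing", "billing", "general"]

correct = sum(a == p for a, p in zip(actual, predicted))
accuracy = correct / len(actual)
print(f"{correct}/{len(actual)} correct = {accuracy:.2f}")  # 3/5 correct = 0.60
```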
Calculating Accuracy in Python
The scikit-learn library has a built-in function that handles this in one line. You pass in two lists: the true labels and the predicted labels.
```python
from sklearn.metrics import accuracy_score

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print(accuracy_score(y_true, y_pred))                   # 0.8
print(accuracy_score(y_true, y_pred, normalize=False))  # 8 correct predictions
```
This returns 0.8, meaning 8 out of 10 predictions matched. By default, the function returns a fraction. If you set normalize=False, it returns the raw count of correct predictions (8 in this case) instead of the fraction.
Calculating Accuracy in Excel
If your actual values are in column A and your predicted values are in column B (with data in rows 2 through 201 for 200 items), you can use SUMPRODUCT to count how many rows match and divide by the total:

=SUMPRODUCT(--(A2:A201=B2:B201))/COUNTA(A2:A201)
A simpler approach: add a helper column C where each row checks if the prediction matches. In cell C2, enter =A2=B2, then drag that down. Each cell returns TRUE or FALSE. Then use =COUNTIF(C2:C201,TRUE)/200 to get your accuracy as a decimal.
When Accuracy Is Misleading
Accuracy has a well-known blind spot: imbalanced datasets. If one class appears far more often than the other, a model can score high accuracy by doing almost nothing useful.
Here’s the classic example. Imagine a dataset where only 1% of cases are positive (say, a rare disease). A model that predicts “negative” for every single patient would be correct 99% of the time, giving it 99% accuracy. But it would catch zero actual cases of the disease. That 99% number is technically correct and completely useless.
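A few lines of Python make the trap concrete, using synthetic data with 10 positives out of 1,000 cases:

```python
# Synthetic data: only 1% of 1,000 cases are positive
y_true = [1] * 10 + [0] * 990
y_pred = [0] * 1000  # a "model" that always predicts negative

accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
recall = sum(t == p == 1 for t, p in zip(y_true, y_pred)) / sum(y_true)

print(accuracy)  # 0.99 -- looks impressive
print(recall)    # 0.0  -- catches zero actual cases
```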
This is why two other metrics exist alongside accuracy:
- Sensitivity (recall): Of all the actual positive cases, how many did you catch? Formula: TP / (TP + FN). This tells you how good the model is at finding the thing you’re looking for.
- Specificity: Of all the actual negative cases, how many did you correctly rule out? Formula: TN / (TN + FP). This tells you how well the model avoids false alarms.
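Both formulas translate directly to code. Here they are applied to the spam-filter counts from earlier (the function names are ours):

```python
def sensitivity(tp, fn):
    """Of all actual positives, the fraction caught: TP / (TP + FN)."""
    return tp / (tp + fn)

def specificity(tn, fp):
    """Of all actual negatives, the fraction ruled out: TN / (TN + FP)."""
    return tn / (tn + fp)

# Spam-filter counts from the earlier example
print(sensitivity(tp=50, fn=20))   # ~0.714: catches about 71% of spam
print(specificity(tn=120, fp=10))  # ~0.923: clears about 92% of legit mail
```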
Two tests can have identical accuracy but very different practical value. Suppose the population is balanced: half the people have the condition and half don't. A diagnostic test with 50% sensitivity and 100% specificity (75% accuracy) would miss half of all patients who actually have the condition, but it would never incorrectly diagnose a healthy person. A test with 100% sensitivity and 50% specificity (also 75% accuracy) would catch every patient but would falsely alarm half the healthy population. Same accuracy score, completely different use cases.
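To see how both tests land at 75%, here is the arithmetic on a hypothetical balanced population of 200 people:

```python
# Balanced population: 100 people with the condition, 100 without
pos, neg = 100, 100

# Test A: 50% sensitivity, 100% specificity
tp_a, tn_a = 50, 100   # catches half the sick, never flags the healthy
acc_a = (tp_a + tn_a) / (pos + neg)

# Test B: 100% sensitivity, 50% specificity
tp_b, tn_b = 100, 50   # catches every sick patient, falsely alarms half the healthy
acc_b = (tp_b + tn_b) / (pos + neg)

print(acc_a, acc_b)  # 0.75 0.75 -- same accuracy, very different behavior
```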
Choosing Between Accuracy and Other Metrics
Accuracy works well when your classes are roughly balanced (similar numbers of positive and negative cases) and when false positives and false negatives carry similar costs. A model classifying photos of cats versus dogs on a balanced dataset? Accuracy is a perfectly fine metric.
Switch to other metrics when the stakes are uneven. In fraud detection, missing actual fraud (false negative) is far more costly than flagging a legitimate transaction for review (false positive). In medical screening, failing to detect a disease is worse than ordering an extra follow-up test. In these cases, sensitivity or precision gives you more actionable information than accuracy alone.
The best practice for any classification task is to report accuracy alongside precision, recall, and a confusion matrix. Accuracy tells you the big picture, while the other metrics reveal where your model is strong and where it’s failing.