Is KNN a Supervised or Unsupervised Algorithm?

K-nearest neighbors (KNN) is a supervised learning algorithm. It requires labeled training data to work, using those known labels to classify new data points or predict values based on what their closest neighbors look like. The confusion is understandable because KNN shares a name structure with K-means, which is unsupervised, but the two algorithms do fundamentally different things.

Why KNN Is Supervised

Supervised learning means an algorithm learns from data that already has correct answers attached. Every data point in the training set comes with a label: “spam” or “not spam,” “cat” or “dog,” or a numerical value like a house price. KNN uses those existing labels to decide what label a new, unseen data point should get.

When you feed KNN a new data point, it finds the “k” closest data points in the training set (where k is a number you choose, like 3 or 5), looks at their labels, and assigns the most common label to the new point. For classification, this works like a majority vote among neighbors. If you set k to 5 and three of the five nearest neighbors are labeled “cat,” the new point gets labeled “cat.” For regression problems, KNN averages the values of the nearest neighbors instead of voting.
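Both behaviors can be sketched in a few lines of plain Python. This is a toy illustration, not a production implementation: the function names (`knn_predict`, `knn_regress`) and the tiny 2-D dataset are made up for demonstration.

```python
from collections import Counter
import math

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest labeled points.
    `train` is a list of ((x, y), label) pairs; distance is Euclidean."""
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    votes = Counter(label for _, label in by_dist[:k])
    return votes.most_common(1)[0][0]

def knn_regress(train, query, k=3):
    """Regression variant: average the k nearest neighbors' numeric values."""
    by_dist = sorted(train, key=lambda p: math.dist(p[0], query))
    return sum(value for _, value in by_dist[:k]) / k

# Toy 2-D training set with known labels.
labeled = [((1, 1), "cat"), ((1, 2), "cat"), ((2, 1), "cat"),
           ((8, 8), "dog"), ((9, 8), "dog")]
print(knn_predict(labeled, (1.5, 1.5), k=3))  # three nearest are all "cat"

# Same idea with numeric targets instead of class labels.
priced = [((1, 1), 100.0), ((1, 2), 110.0), ((2, 1), 120.0), ((8, 8), 500.0)]
print(knn_regress(priced, (1.5, 1.5), k=3))   # mean of the three nearest values
```

Note that every prediction depends directly on the labels attached to the training points, which is exactly what makes the algorithm supervised.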

Without labels, this process breaks down entirely. KNN has no mechanism for discovering groups on its own. It needs to be told what the groups are before it can assign anything to them.

The “Lazy Learning” Distinction

KNN is unusual among supervised algorithms because it has no real training phase. Most supervised models, like logistic regression or neural networks, spend time learning patterns and adjusting internal parameters during training. KNN skips this step. It simply memorizes the entire training dataset and does all its work at prediction time.

This is why KNN is called a “lazy learner.” It doesn’t build an internal model of the data. Every time you ask it to classify something new, it recalculates distances to every stored data point from scratch. The upside is instant “training.” The downside is that predictions get slower as your dataset grows, because each new prediction requires comparing against every point in memory.
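The fit/predict asymmetry can be made concrete with a minimal class. This `LazyKNN` is a hypothetical sketch, not scikit-learn's implementation: notice that `fit` does nothing but store the data, while `predict` does a full scan of every stored point on each call.

```python
import math
from collections import Counter

class LazyKNN:
    """Lazy learner: 'fit' only memorizes the data; all work happens at predict time."""
    def fit(self, X, y):
        self.X, self.y = list(X), list(y)  # instant "training": just store everything
        return self

    def predict(self, query, k=3):
        # Recompute distances to every stored point from scratch, every call:
        # this is the O(n)-per-prediction cost the text describes.
        order = sorted(range(len(self.X)), key=lambda i: math.dist(self.X[i], query))
        return Counter(self.y[i] for i in order[:k]).most_common(1)[0][0]

model = LazyKNN().fit([(0, 0), (0, 1), (5, 5), (6, 5)],
                      ["near", "near", "far", "far"])
print(model.predict((0.2, 0.3), k=3))
```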

KNN vs. K-Means: The Common Mix-Up

The most frequent source of confusion is the similarity between “KNN” and “K-means.” They sound related but operate in opposite ways. KNN is supervised and used for classification or regression. K-means is unsupervised and used for clustering.

K-means takes unlabeled data and groups it into k clusters based purely on similarity, with no prior knowledge of what those groups should be. It discovers structure in the data. KNN, by contrast, already knows the structure (the labels) and uses it to classify new points. In K-means, the “k” refers to the number of clusters you want. In KNN, the “k” refers to the number of neighbors consulted for each prediction.
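To make the contrast concrete, here is a deliberately tiny k-means sketch on 1-D data (naive initialization, toy numbers, all hypothetical). Unlike the KNN examples above, no labels appear anywhere: the algorithm invents the groups itself.

```python
def kmeans_1d(points, k=2, iters=10):
    """Tiny k-means on 1-D data: discovers k groups with no labels at all."""
    centroids = points[:k]  # naive init: first k points as starting centroids
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid's cluster.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to its cluster's mean.
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return sorted(centroids)

# Two obvious groups emerge near 1 and near 9, with no labels supplied.
print(kmeans_1d([1.0, 1.2, 0.8, 9.0, 9.5, 8.8], k=2))
```

Here the two `k`s diverge exactly as described: `k=2` above means "find two clusters," while `k=3` in a KNN call means "consult three labeled neighbors."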

How KNN Measures “Nearest”

KNN relies on distance to define which neighbors are closest. The most common approach is Euclidean distance, which is the straight-line distance between two points, the same formula you’d use to measure distance on a map. Other options include Manhattan distance, which measures distance along grid lines (like walking city blocks), and Minkowski distance, which generalizes both of these.
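The relationship among the three metrics is easy to see in code, since Minkowski distance with parameter p reduces to Manhattan at p=1 and Euclidean at p=2. A small sketch (the function name is made up for illustration):

```python
def minkowski(a, b, p):
    """Minkowski distance between points a and b.
    p=1 gives Manhattan (city-block) distance; p=2 gives Euclidean."""
    return sum(abs(x - y) ** p for x, y in zip(a, b)) ** (1 / p)

a, b = (0, 0), (3, 4)
print(minkowski(a, b, 1))  # Manhattan: |3| + |4| = 7
print(minkowski(a, b, 2))  # Euclidean: sqrt(9 + 16) = 5.0
```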

Because KNN depends entirely on distance, the scale of your features matters enormously. If one feature ranges from 0 to 1,000 and another ranges from 1 to 10, the first feature will dominate every distance calculation. The smaller feature gets effectively ignored. Scikit-learn’s documentation demonstrates this clearly: when classifying wine samples using two features (“proline,” which ranges up to 1,000, and “hue,” which ranges up to 10), the unscaled model produces a completely different decision boundary than the scaled one. Scaling both features to the same range ensures each one contributes proportionally to the distance calculation.
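One common fix is min-max scaling, which maps each feature to the [0, 1] range before computing distances. The sketch below uses invented toy numbers (not scikit-learn's actual wine data) to show the idea:

```python
def scale_minmax(values):
    """Rescale one feature to [0, 1] so it contributes proportionally to distances."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical two-feature data: a feature spanning hundreds would otherwise
# swamp one spanning single digits in every distance calculation.
proline = [300, 700, 1000]
hue = [1.0, 5.0, 10.0]
print(scale_minmax(proline))  # both features now live on the same 0-to-1 scale
print(scale_minmax(hue))
```

In practice you would use a scaler fitted on the training set only (such as scikit-learn's MinMaxScaler or StandardScaler) and apply the same transformation to new points before querying their neighbors.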

Choosing the Right K Value

The value of k has a direct impact on how well KNN performs. A very small k (like 1 or 2) makes the model sensitive to noise. A single mislabeled point in your training data can throw off predictions for anything nearby. A very large k smooths out noise but can blur the boundaries between classes, especially when classes overlap or one class is much larger than another.

The most common approach for finding a good k is the elbow method: you test multiple k values, plot the error rate for each, and look for the point where increasing k stops meaningfully improving performance. That bend in the curve is your target. Odd values of k are generally preferred for binary classification to avoid ties in the voting process.
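The search loop behind the elbow method can be sketched as follows, using a held-out test set and a made-up toy dataset (the helper names `knn_label` and `error_rate` are illustrative):

```python
from collections import Counter
import math

def knn_label(train, query, k):
    """Majority-vote label among the k nearest training points."""
    nearest = sorted(train, key=lambda p: math.dist(p[0], query))[:k]
    return Counter(lbl for _, lbl in nearest).most_common(1)[0][0]

def error_rate(train, test, k):
    """Fraction of held-out points misclassified at a given k."""
    wrong = sum(knn_label(train, x, k) != y for x, y in test)
    return wrong / len(test)

# Toy train/test split. In practice you would plot error_rate over a range
# of k values and pick the k where the curve bends (the "elbow").
train_set = [((1, 1), "a"), ((1, 2), "a"), ((2, 2), "a"),
             ((8, 8), "b"), ((9, 9), "b"), ((8, 9), "b")]
test_set = [((1.5, 1.5), "a"), ((8.5, 8.5), "b")]
for k in (1, 3, 5):
    print(k, error_rate(train_set, test_set, k))
```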

When KNN Touches Unsupervised Territory

While KNN itself is supervised, the concept of “k-nearest neighbors” as a distance-based framework shows up in some unsupervised applications. Anomaly detection is one example. Researchers have adapted the k-nearest neighbor distance concept to identify outliers: data points whose nearest neighbors are unusually far away. These adaptations calculate an anomaly score based on how isolated a point is relative to its neighbors, without needing labeled examples of “normal” versus “abnormal.”
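A minimal version of that idea scores each point by its average distance to its k nearest neighbors; a large score means an isolated point. This is a hedged sketch of the general concept, not any particular published detector, on invented data:

```python
import math

def knn_anomaly_score(points, query, k=2):
    """Anomaly score = mean distance to the k nearest other points.
    A large score means an isolated point. No labels are needed."""
    dists = sorted(math.dist(p, query) for p in points if p != query)
    return sum(dists[:k]) / k

data = [(1, 1), (1, 2), (2, 1), (2, 2), (10, 10)]
print(knn_anomaly_score(data, (1.5, 1.5)))  # low: sits in a dense neighborhood
print(knn_anomaly_score(data, (10, 10)))    # high: its nearest neighbors are far away
```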

These are extensions of the core idea, not the KNN algorithm itself. When someone refers to “KNN” in a machine learning context, they mean the supervised classifier that uses labeled data and neighbor voting to make predictions.