How Does Hashing Work and Why Can’t It Be Reversed?

Hashing is a process that takes any input, whether it’s a single word or an entire book, and converts it into a fixed-size string of characters. Think of it as a fingerprint for data: no matter how large or small the original input, the output (called a “hash” or “digest”) is always the same length. A single letter and a 500-page novel both produce, say, a 64-character hash when run through the same algorithm. This makes hashing essential for everything from storing passwords to verifying downloaded files to powering the data structures that keep software running fast.

The Basic Mechanism

A hash function takes input of any length and maps it to an output of a specific, fixed length. If you hash the word “hello,” you get a string of characters. If you hash the complete works of Shakespeare, you get a string of the exact same length. The process is deterministic: the same input always produces the same output, every single time.
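Both properties are easy to see with Python's standard-library hashlib (a quick illustration using SHA-256):

```python
import hashlib

# SHA-256 produces a 256-bit digest, rendered here as 64 hex characters,
# no matter how large the input is.
short = hashlib.sha256(b"hello").hexdigest()
long = hashlib.sha256(b"hello " * 100_000).hexdigest()

print(len(short), len(long))  # 64 64

# Deterministic: hashing the same input again gives the same digest.
assert hashlib.sha256(b"hello").hexdigest() == short
```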

Under the hood, many hash functions work by breaking a long input into smaller blocks, processing each block through a series of mathematical operations, then combining the results. One approach, called tree-hashing, splits a message into fixed-size chunks, hashes each chunk individually, concatenates those hashes into a new message, and repeats until the result fits the desired output size. Other designs process blocks sequentially, feeding each block’s result into the next round of computation. Either way, the goal is the same: compress arbitrary-length data into a short, consistent output.
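The tree-hashing idea can be sketched in a few lines (a toy illustration built on SHA-256, not a real tree-mode construction like BLAKE3's; the chunk size is an arbitrary choice):

```python
import hashlib

CHUNK = 1024  # chunk size in bytes, chosen arbitrarily for this sketch

def tree_hash(data: bytes) -> str:
    """Hash each fixed-size chunk, concatenate the resulting digests into
    a new (much shorter) message, and repeat until one hash remains."""
    while len(data) > CHUNK:
        chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
        data = b"".join(hashlib.sha256(c).digest() for c in chunks)
    return hashlib.sha256(data).hexdigest()

# Any input size collapses to the same fixed-length output.
print(len(tree_hash(b"a")), len(tree_hash(b"a" * 1_000_000)))  # 64 64
```

Each pass shrinks the data by a factor of 32 (1024-byte chunks become 32-byte digests), so even enormous inputs converge in a handful of rounds.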

Hashing is a one-way operation. You can easily compute the hash of any input, but you can’t reverse-engineer the original input from the hash. This asymmetry is what makes hashing useful for security.

What Makes a Hash Function Secure

Not all hash functions need to be secure, but when they’re used for cryptography, three properties matter. NIST, the U.S. agency that sets cryptographic standards, defines them this way:

  • Pre-image resistance: Given a hash output, it should be computationally infeasible to figure out what input produced it. This is the “one-way” property.
  • Second pre-image resistance: Given one specific input, it should be infeasible to find a different input that produces the same hash.
  • Collision resistance: It should be infeasible to find any two distinct inputs that produce the same output.

These three properties build on each other. Collision resistance is the hardest to achieve because the attacker gets to choose both inputs freely. If an attacker can generate two different documents with the same hash, they could potentially swap one for the other without detection.
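The importance of a large output space is easy to demonstrate: truncate a real hash to a few bits and collisions fall out almost immediately (a toy experiment using the first 16 bits of SHA-256):

```python
import hashlib

def tiny_hash(data: bytes) -> str:
    # SHA-256 truncated to 16 bits (4 hex characters): deliberately weak.
    return hashlib.sha256(data).hexdigest()[:4]

def find_collision():
    """Hash successive inputs until two of them share a tiny-hash value."""
    seen = {}
    for i in range(100_000):
        h = tiny_hash(str(i).encode())
        if h in seen:
            return seen[h], i, h  # two distinct inputs, same output
        seen[h] = i

a, b, h = find_collision()
print(f"inputs {a} and {b} both hash to {h}")
```

With only 2¹⁶ possible outputs, the birthday bound (about 2⁸ = 256 trials) guarantees a collision turns up quickly; a full 256-bit output pushes the same search to the order of 2¹²⁸ attempts.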

Hashing in Data Structures

Outside of security, hashing powers one of the most common structures in programming: the hash table. A hash table stores data in an array, using a hash function to decide where each item goes. You feed in a key (like someone’s name), the hash function converts it to a number, and that number tells you which slot in the array holds the associated value (like their phone number). This lets you look up data almost instantly instead of searching through every entry one by one.

The catch is collisions. Since a hash function maps a huge range of possible inputs to a limited number of array slots, two different keys will sometimes land on the same slot. Hash tables handle this in a few ways:

  • Separate chaining: Each slot in the array stores a linked list instead of a single value. When two keys hash to the same slot, both get added to that slot’s list. Insertion stays fast because new entries go to the front of the list, but lookups slow down if many items pile up in one slot.
  • Linear probing: When a collision occurs, the system checks the next slot in the array, then the next, until it finds an empty one. All data stays inside the array itself. The downside is that clusters of filled slots tend to grow, making future collisions more likely in that area.
  • Quadratic probing: Similar to linear probing, but instead of checking one slot ahead at a time, the system probes slots at quadratically growing offsets from the original position (1, then 4, then 9, and so on). This spreads entries out more evenly and reduces the clustering problem.
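Separate chaining, the first strategy above, takes only a few lines to sketch (a minimal illustration; real implementations also resize the array as it fills):

```python
class ChainedHashTable:
    """Minimal hash table using separate chaining: each slot holds a
    list of (key, value) pairs that hashed to that slot."""

    def __init__(self, size=8):
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return hash(key) % len(self.slots)  # map key to a slot number

    def put(self, key, value):
        bucket = self.slots[self._index(key)]
        for i, (k, _) in enumerate(bucket):
            if k == key:                  # key already present: overwrite
                bucket[i] = (key, value)
                return
        bucket.append((key, value))       # empty slot or collision: extend chain

    def get(self, key):
        for k, v in self.slots[self._index(key)]:
            if k == key:
                return v
        raise KeyError(key)

table = ChainedHashTable()
table.put("alice", "555-0100")
table.put("bob", "555-0199")
print(table.get("alice"))  # 555-0100
```

Lookups scan only the one chain the key hashes to, which is why average-case access stays near constant time as long as chains stay short.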

How Password Hashing Works

When you create an account on a well-designed website, your password is never stored directly. Instead, the system hashes your password and stores only the hash. When you log in later, it hashes whatever you type and compares that hash to the stored one. If they match, you’re in. The actual password never sits in a database.

This means that even if attackers steal the entire database, they get hashes, not passwords. But basic hashing alone has a weakness. Since the same password always produces the same hash, attackers can pre-compute massive lookup tables (called rainbow tables) mapping common passwords to their hashes, then simply look up any stolen hash to find the original password.
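The weakness is easy to see in miniature (a toy "rainbow table" of four common passwords; real tables hold billions of entries and use compression tricks):

```python
import hashlib

# Unsalted hashing: identical passwords always produce identical hashes,
# so an attacker can precompute hash -> password for common choices.
rainbow = {hashlib.sha256(p.encode()).hexdigest(): p
           for p in ["123456", "password", "qwerty", "letmein"]}

# A "stolen" unsalted hash is then just a dictionary lookup away.
stolen_hash = hashlib.sha256(b"letmein").hexdigest()
print(rainbow.get(stolen_hash))  # letmein
```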

Salts and Peppers

A salt is a random string generated uniquely for each user and combined with their password before hashing. Because every user gets a different salt, two people with the same password end up with completely different hashes. A 64-bit salt has roughly 1.8 × 10¹⁹ (2⁶⁴) possible values, so a rainbow table would need that many entries for every single password it covers, making pre-computed tables essentially impossible to build.

Salts are stored in the database alongside the hashes, though, so an attacker who compromises the database has both. They can still try brute-force attacks, guessing passwords one at a time and hashing each guess with the known salt. This is where a pepper helps. A pepper is a secret value stored separately from the database, often in application configuration or a hardware security module. It gets combined with the password and salt before hashing. Even with the hash and salt in hand, an attacker who doesn’t have the pepper lacks enough information to brute-force effectively. A 64-bit pepper introduces so much additional complexity that brute-forcing becomes unreasonable.
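A sketch of salted, peppered password hashing using PBKDF2 from Python's standard library (the pepper value, salt length, and iteration count here are illustrative choices, not recommendations):

```python
import hashlib
import hmac
import os

PEPPER = b"example-secret-pepper"  # in practice, stored outside the database

def hash_password(password, salt=None):
    """Return (salt, digest) using PBKDF2-HMAC-SHA256."""
    if salt is None:
        salt = os.urandom(16)  # unique random salt per user
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode() + PEPPER, salt, 200_000
    )
    return salt, digest

def verify_password(password, salt, stored):
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, stored)  # constant-time comparison

salt, stored = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, stored))  # True
print(verify_password("wrong guess", salt, stored))                   # False
```

Only the salt and digest go in the database; the pepper lives in configuration, so a database dump alone is not enough to mount an effective brute-force attack.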

Common Algorithms and Their Status

MD5 was one of the most widely used hash algorithms for years, producing a 128-bit output. Theoretical weaknesses appeared as early as 1996, and by 2004 researchers demonstrated practical collision attacks. By 2005, attackers could generate colliding security certificates. Carnegie Mellon’s CERT Coordination Center now states plainly that MD5 “should be considered cryptographically broken and unsuitable for further use.”

SHA-1, which produces a 160-bit hash, followed a similar trajectory. Google demonstrated a practical collision in 2017, and major browsers stopped accepting SHA-1 certificates around the same time.

The current standard for most applications is SHA-256, part of the SHA-2 family, which produces a 256-bit output. SHA-3, a completely different design adopted by NIST as a backup standard, offers similar security with a different internal structure. Both are approved for digital signatures, message authentication, and key derivation.

Speed Differences Between Algorithms

For cryptographic security, slower is sometimes better (it makes brute-force attacks harder). But for non-security uses like checksums and data structures, speed matters. Performance benchmarks on commodity hardware show dramatic differences. In single-threaded tests, BLAKE3 generated about 17.5 million hashes per second compared to roughly 1.7 million for SHA-256 and 1.1 million for SHA-512. The gap widens with parallelism: at 64 threads, BLAKE3 reached over 370 million hashes per second while SHA-256 stayed under 600,000. BLAKE3 was designed from the ground up to take advantage of modern multi-core processors, which accounts for most of that difference.

For password hashing specifically, general-purpose speed is actually a liability. Algorithms designed for passwords, like bcrypt and Argon2, are intentionally slow and memory-intensive. They’re built so that hashing a single password takes a fraction of a second (unnoticeable to a legitimate user) but makes trying billions of guesses prohibitively expensive for an attacker.
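The difference is easy to measure with Python's standard library, which exposes scrypt, a memory-hard key-derivation function in the same family of deliberately expensive algorithms (the parameters below are illustrative):

```python
import hashlib
import os
import time

salt = os.urandom(16)

t0 = time.perf_counter()
hashlib.sha256(b"hunter2" + salt).digest()                 # general-purpose: microseconds
t1 = time.perf_counter()
hashlib.scrypt(b"hunter2", salt=salt, n=2**14, r=8, p=1)   # memory-hard, deliberately slow
t2 = time.perf_counter()

fast, slow = t1 - t0, t2 - t1
print(f"sha256 took {fast:.6f}s, scrypt took {slow:.6f}s")
```

One slow hash per login is invisible to a user, but multiplied across billions of brute-force guesses it becomes the attacker's dominant cost.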

Everyday Uses You Encounter

Hashing is quietly everywhere. When you download software and see a “SHA-256 checksum” listed on the download page, that’s a hash of the original file. You can hash your downloaded copy and compare. If the hashes match, the file wasn’t corrupted or tampered with during transfer.
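Doing that comparison yourself takes only a few lines (a sketch; the filename and published checksum are placeholders):

```python
import hashlib

def sha256_file(path, chunk_size=65536):
    """Compute a file's SHA-256 digest, reading in chunks so large
    files never need to fit in memory at once."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()

# Compare against the checksum listed on the download page:
published = "..."  # value copied from the vendor's site
# if sha256_file("installer.iso") == published: the file arrived intact
```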

Version control systems like Git hash every change you commit, creating a unique identifier for each snapshot of your code. Blockchain systems chain hashes together so that altering any past record would change every subsequent hash, making tampering obvious. Databases use hash indexes to speed up lookups. De-duplication systems hash file contents to identify duplicates without comparing files byte by byte.

In all of these cases, the core principle is the same: take data of any size, produce a compact fingerprint, and use that fingerprint to verify, identify, or organize information efficiently.