How Does Lossless Compression Work? Algorithms Explained

Lossless compression shrinks files by finding and eliminating redundancy in data, then perfectly reconstructing the original when you decompress. Unlike lossy compression (which throws away information you hopefully won’t miss), lossless compression guarantees that every single bit comes back identical. It does this through a combination of clever pattern-matching and efficient encoding, and the specific techniques vary depending on the type of data involved.

The Core Idea: Redundancy Is Waste

Most real-world data is full of patterns. In English text, the letter “e” appears far more often than “z.” In a spreadsheet, the same value might repeat across thousands of cells. In a photo of a blue sky, long runs of nearly identical pixel values stretch across the image. All of this repetition means the data contains more bits than it strictly needs to represent the information inside it.

Lossless compression algorithms exploit this redundancy in two broad ways: they find repeated sequences and replace them with short references, and they assign shorter codes to more common elements. Most modern compression tools combine both strategies in sequence.

Finding Repeated Patterns With Sliding Windows

One of the most widely used techniques is called LZ77, published by Abraham Lempel and Jacob Ziv in 1977 and still at the heart of ZIP files, gzip, and PNG images. It works by scanning through data with a “sliding window,” essentially a buffer that remembers the data you’ve already processed.

As the algorithm reads forward, it looks back through its window for any sequence that matches what’s coming next. When it finds a match, instead of writing out the full sequence again, it writes a compact reference: how far back the match starts, how long it is, and what character comes after it. So if the word “compression” appeared 200 characters ago and it appears again now, the algorithm replaces those 11 characters with something like “go back 200, copy 11 characters, then write the next new character.” That reference takes far fewer bits than spelling the word out again.

This approach works beautifully on any data with recurring patterns, whether that’s repeated words in a document, recurring function calls in source code, or identical pixel sequences in an image. The size of the sliding window determines how far back the algorithm can look for matches. Larger windows find more redundancy but require more memory and processing time.
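The back-reference mechanism can be sketched in a few lines of Python. This is a deliberately naive toy (real implementations such as DEFLATE use hash tables to find matches quickly and pack the output at the bit level), but it shows the core loop: search the window, emit a (distance, length, next character) triple, advance.

```python
def lz77_compress(data, window=4096):
    """Toy LZ77: emit (distance, length, next_char) triples.

    Illustrative only -- O(n * window) brute-force search, and the
    triples are not bit-packed the way a real codec would pack them.
    """
    out = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Look back through the window for the longest match.
        for j in range(max(0, i - window), i):
            length = 0
            while (i + length + 1 < len(data)
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        out.append((best_dist, best_len, data[i + best_len]))
        i += best_len + 1
    return out


def lz77_decompress(triples):
    """Rebuild the original by copying from already-decoded output."""
    data = []
    for dist, length, nxt in triples:
        for _ in range(length):
            data.append(data[-dist])  # copy byte-by-byte: overlaps allowed
        data.append(nxt)
    return "".join(data)
```

Note that the decompressor copies one character at a time, which is what lets a reference reach back into data it is still in the middle of producing (how LZ77 encodes long runs of a single character).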

Shorter Codes for Common Symbols

The second major strategy is called entropy coding, and the most intuitive version is Huffman coding. The idea is simple: instead of using a fixed number of bits for every character (like the 8 bits per character used to store ASCII text), you assign shorter codes to characters that appear more often and longer codes to rare ones.

The algorithm starts by counting how frequently each symbol appears in the data. It then builds a binary tree from the bottom up. The two least frequent symbols get merged into a single node, and their combined frequency becomes that node’s weight. This merging repeats until every symbol is part of one tree. Characters that appear often end up near the top of the tree with short paths (and therefore short codes), while rare characters sit deeper with longer codes.

To encode the letter “e” in English text, you might need only 2 or 3 bits. The letter “z” might take 10 or more. On average, across the entire file, this variable-length coding uses significantly fewer bits than a fixed-length scheme. To decompress, the receiver just needs the same tree, which is typically stored in the compressed file’s header.
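The bottom-up merging described above maps naturally onto a priority queue. Here is a minimal sketch of the table-building step (production encoders go further: they emit canonical codes and serialize the table into the file header):

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Build a Huffman code table from symbol frequencies.

    Heap entries are (frequency, tiebreak, tree), where a tree is
    either a bare symbol (leaf) or a (left, right) pair.
    """
    freq = Counter(text)
    heap = [(f, i, sym) for i, (sym, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    # Repeatedly merge the two least frequent nodes into one.
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, count, (left, right)))
        count += 1
    codes = {}

    def walk(node, prefix):
        if isinstance(node, tuple):
            walk(node[0], prefix + "0")  # left branch appends a 0
            walk(node[1], prefix + "1")  # right branch appends a 1
        else:
            codes[node] = prefix or "0"  # single-symbol edge case

    walk(heap[0][2], "")
    return codes
```

Running this on a skewed sample like `"aaaabbc"` gives the frequent symbol `a` a one-bit code while `b` and `c` get two bits each, so the whole string costs 10 bits instead of 56.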

Arithmetic Coding: Getting Even Closer to Perfect

Huffman coding has one limitation: it can only assign whole numbers of bits to each symbol. If the mathematically ideal code length for a symbol is 1.3 bits, Huffman has to round up to 2. Arithmetic coding solves this by encoding an entire message as a single fractional number between 0 and 1, where the range narrows with each successive symbol based on its probability.

The practical difference can be significant. Because Huffman must round every code length up to a whole bit, it wastes a fraction of a bit on each symbol; arithmetic coding recovers that waste and can get arbitrarily close to the entropy limit. It requires more computing power, but for data where symbol frequencies are highly skewed (one symbol appears far more often than others), it squeezes out meaningfully more savings.
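The interval-narrowing step is easiest to see with exact fractions. The sketch below is didactic only: real arithmetic coders use fixed-precision integer arithmetic and emit bits incrementally rather than holding one giant fraction in memory.

```python
from fractions import Fraction


def arithmetic_interval(message, probs):
    """Narrow [0, 1) symbol by symbol, as arithmetic coding does.

    `probs` maps each symbol to its probability. Returns the final
    (low, high) interval; any number inside it identifies the message.
    """
    # Assign each symbol a sub-range of [0, 1), e.g. a:[0, 0.75), b:[0.75, 1).
    cum, total = {}, Fraction(0)
    for sym, p in probs.items():
        cum[sym] = (total, total + Fraction(p))
        total += Fraction(p)

    low, high = Fraction(0), Fraction(1)
    for sym in message:
        span = high - low
        lo_p, hi_p = cum[sym]
        # Zoom into the sub-range belonging to this symbol.
        low, high = low + span * lo_p, low + span * hi_p
    return low, high
```

After encoding, the interval's width equals the product of the symbol probabilities, so a more probable message gets a wider interval and needs fewer bits to pin down a number inside it.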

How ZIP and PNG Combine Both Strategies

The DEFLATE algorithm, used inside ZIP files, gzip, and PNG images, is a two-stage pipeline. First, it applies LZ77 to find and replace repeated sequences with back-references. Then it takes the output of that step and applies Huffman coding to compress it further. The back-references themselves contain patterns (certain match lengths and distances are more common than others), so Huffman coding can shrink them even more.

DEFLATE can operate in different modes depending on the data. Some blocks use fixed Huffman tables built into the format for speed. Others use custom Huffman tables optimized for that specific block’s statistics. And if a section of data is truly random with no exploitable patterns, DEFLATE can store it uncompressed rather than wasting effort on something that won’t shrink.
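Python's standard library exposes this exact pipeline through the zlib module, so the round trip is easy to verify on repetitive input:

```python
import zlib

# DEFLATE = LZ77 + Huffman, the same pipeline used by ZIP, gzip, and PNG.
text = b"the quick brown fox jumps over the lazy dog. " * 50

compressed = zlib.compress(text, level=9)  # level 9: best ratio, slowest
restored = zlib.decompress(compressed)

assert restored == text  # lossless: every bit comes back identical
print(f"{len(text)} bytes -> {len(compressed)} bytes")
```

Because the input is fifty copies of one sentence, LZ77 collapses the repeats into back-references and the compressed output is a small fraction of the original size.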

Reorganizing Data Before Compressing It

Some algorithms improve compression not by encoding more cleverly, but by rearranging data into a form that’s easier to compress. The Burrows-Wheeler Transform, used in bzip2, is a striking example. It takes a block of text, forms every rotation of that block, sorts the rotations lexicographically, and takes the last character of each sorted rotation (the last column of the sorted matrix) as the transformed output.

The key property of this transformation is that characters sharing the same context (the same surrounding characters in the original text) get grouped together. If “th” appears frequently in English, then every “t” that precedes an “h” in the original will cluster near other such “t” characters in the transformed output. These long runs of repeated characters are then trivially compressed by simpler algorithms. The transform itself doesn’t reduce the size at all. It just reorganizes the data so that the next compression stage works much more effectively. And because it’s a reversible rearrangement, the original can be perfectly reconstructed.
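Both directions of the transform fit in a short sketch. This version appends a sentinel character (assumed absent from the input) to mark the original rotation; bzip2 instead records the index of that rotation, and uses a much faster inversion than the naive column-sorting shown here.

```python
def bwt(text, sentinel="\x00"):
    """Burrows-Wheeler Transform: sort all rotations, keep the last column."""
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def inverse_bwt(transformed, sentinel="\x00"):
    """Invert by repeatedly prepending the transform and re-sorting.

    Each pass reconstructs one more column of the sorted rotation
    matrix; after n passes the table holds every rotation in order.
    """
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The original string is the rotation that ends with the sentinel.
    row = next(r for r in table if r.endswith(sentinel))
    return row.rstrip(sentinel)
```

The output of `bwt` is just a permutation of the input characters, which is the point: nothing is removed, the symbols are only regrouped so that runs of identical characters line up for the next stage.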

Why No Algorithm Can Compress Everything

There’s a hard mathematical limit on lossless compression, and it’s worth understanding because it explains why compressed files don’t get smaller when you compress them again.

Claude Shannon established in 1948 that every data source has an entropy rate: the minimum average number of bits needed per symbol to represent its information content. For English text, Shannon estimated this at about 2.3 bits per character, compared to the 8 bits per character used in standard encoding. That gap between 8 and 2.3 is the theoretical room for compression. No lossless algorithm can beat the entropy rate, and real algorithms can only approach it.
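The entropy of a sample can be estimated directly from its symbol frequencies. The sketch below computes the zeroth-order figure, which treats symbols as independent; because it ignores context (the reason “q” is almost always followed by “u”), it overestimates the true entropy of English.

```python
import math
from collections import Counter


def entropy_bits_per_symbol(data):
    """Zeroth-order Shannon entropy: the lower bound, in bits per
    symbol, for any code that treats symbols independently."""
    counts = Counter(data)
    n = len(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

A string of one repeated character has zero entropy (it carries no information beyond its length), while a string where every symbol is equally likely needs the full log2 of the alphabet size.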

There’s also a more fundamental constraint. Consider all possible files of exactly 1,000 bits. There are 2^1,000 of them. The number of possible files shorter than 1,000 bits is only 2^1,000 minus 1 (counting every string of length 0 through 999: 2^0 + 2^1 + … + 2^999). Since a lossless compression function must map each unique input to a unique output (otherwise you couldn’t decompress reliably), it’s mathematically impossible to map every 1,000-bit string to a shorter one. There simply aren’t enough shorter strings to go around. This is called the pigeonhole principle, and it proves that any lossless compression algorithm that shrinks some files must necessarily leave others the same size or make them larger.

In practice, this means compression works well on structured, real-world data (text, code, images, databases) and fails on data that’s already compressed or truly random.
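This limit is easy to observe with the standard library's zlib: structured data collapses, while random bytes come back slightly larger than they went in (the container's header and checksum add a few bytes, and there is nothing to offset them):

```python
import os
import zlib

structured = b"timestamp,value\n2024-01-01,42\n" * 500  # repetitive, low entropy
random_bytes = os.urandom(15_000)                       # no patterns to exploit

print(len(zlib.compress(structured)), "vs", len(structured))      # shrinks a lot
print(len(zlib.compress(random_bytes)), "vs", len(random_bytes))  # grows slightly
```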

Lossless Audio: FLAC and ALAC

Lossless compression for audio works on the same principles but exploits patterns specific to sound. Audio samples change gradually from one to the next, so rather than storing each sample’s absolute value, formats like FLAC and ALAC predict each sample based on previous ones and store only the small prediction errors. These residuals cluster tightly around zero, making them highly compressible with entropy coding.

Both formats typically reduce file sizes by 30 to 60 percent compared to uncompressed WAV, with the exact ratio depending on the complexity of the music. A solo piano recording with lots of silence compresses more than dense electronic music. The decompressed audio is bit-for-bit identical to the original.
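The predict-and-store-residuals idea can be sketched with the simplest possible predictor: guess that each sample equals the previous one. (This first-order predictor is illustrative; FLAC's actual predictors are configurable and typically higher order.)

```python
def predict_residuals(samples):
    """Store the first sample verbatim, then only the prediction errors."""
    residuals = [samples[0]]
    for i in range(1, len(samples)):
        residuals.append(samples[i] - samples[i - 1])
    return residuals


def reconstruct(residuals):
    """Undo the prediction: each sample is the previous one plus its error."""
    samples = [residuals[0]]
    for r in residuals[1:]:
        samples.append(samples[-1] + r)
    return samples
```

On a smooth waveform like `[100, 102, 103, 103, 101, 98]`, the residuals are `[100, 2, 1, 0, -2, -3]`: small numbers clustered near zero, which is exactly what entropy coding compresses best.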

Medical Imaging and Lossless Requirements

Lossless compression isn’t just a preference in some fields. It’s a regulatory requirement. The FDA requires that full-field digital mammography data be stored in either uncompressed or losslessly compressed format for long-term archival. Lossy compression is not permitted, because any degradation in image quality could theoretically lead to a missed diagnosis, and the manufacturer could be held liable for patient harm.

Medical images also present unique technical challenges. Most medical imaging uses bit depths between 10 and 16 bits per pixel (compared to 8 for a typical photo), which rules out standard baseline JPEG. The DICOM medical imaging standard supports specific lossless codecs including a lossless JPEG mode that uses predictive coding followed by Huffman encoding, and JPEG-LS, which offers stronger lossless compression performance for grayscale and color medical images.

Modern Algorithms: Speed and Ratio Tradeoffs

Newer algorithms like Zstandard (developed at Meta) have pushed the boundaries of what’s practical. Zstandard is designed to be configurable across a wide range of compression levels, from extremely fast modes that sacrifice some compression ratio, to slower modes that squeeze out every possible byte. Its decompression speed stays roughly the same regardless of what compression level was used, which is a useful property shared by most LZ-based algorithms: you pay the cost once when compressing, and everyone who reads the data later benefits from fast decompression.

Zstandard also supports dictionary-based compression, where you train the algorithm on sample data that resembles what you’ll be compressing. This is particularly useful for small files (like individual JSON records in an API), where there isn’t enough data within a single file for the algorithm to learn patterns. By preloading a dictionary of common patterns, compression ratios on small data improve dramatically. This training-based approach represents a broader trend in modern compression: adapting the algorithm to the specific structure of your data rather than relying on general-purpose defaults.
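Zstandard's Python bindings live in the third-party `zstandard` package, but the standard library's zlib supports the same preset-dictionary idea and is enough to show the effect. The record and dictionary bytes below are made-up examples; Zstandard goes further by training a dictionary from many sample records rather than using one literal record.

```python
import zlib

# A small record on its own gives the compressor little to work with.
record = b'{"user_id": 1234, "status": "active", "region": "eu-west-1"}'

# Preset dictionary: bytes that resemble what we expect to compress.
dictionary = b'{"user_id": 9999, "status": "active", "region": "eu-west-1"}'

plain = zlib.compress(record, 9)

comp = zlib.compressobj(level=9, zdict=dictionary)
primed = comp.compress(record) + comp.flush()

# The decompressor must be handed the same dictionary.
decomp = zlib.decompressobj(zdict=dictionary)
assert decomp.decompress(primed) == record

print(f"without dict: {len(plain)} bytes, with dict: {len(primed)} bytes")
```

With the dictionary preloaded, the compressor can emit back-references into it from the very first byte, so the record compresses well even though it is far too short to contain internal repetition.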