Audio distortion is any unwanted change to a sound signal between its source and its destination. When a signal passes through a microphone, amplifier, speaker, or digital processor, the output should ideally be a perfect copy of the input. Distortion is what happens when it isn’t. The signal’s waveform gets altered, adding frequencies that weren’t in the original sound, flattening peaks, or shifting the timing of certain frequencies relative to others.
Some distortion is subtle enough that you’d never notice it. Other types are immediately obvious: a crackling voice memo, a blown-out guitar amp, or that harsh buzzing when you turn a Bluetooth speaker up too loud. Understanding the different forms of distortion helps explain why some audio sounds clean and some sounds broken.
How Distortion Happens
Every audio device in a signal chain, whether it’s a cable, a preamp, or a streaming codec, is supposed to be “linear.” A perfectly linear device keeps the output exactly proportional to the input: double the input, and the output doubles too. On a graph, this relationship is a straight line. Real-world devices are never perfectly linear. Components have physical limits, circuits have tolerances, and digital systems have fixed numerical ceilings.
When a signal passes through a nonlinear system, the output no longer matches the input’s shape. The waveform gets corrupted. The less linear the system, the more severe the distortion. This nonlinearity is the root cause of nearly every type of audio distortion, from the warm overdrive of a tube amplifier to the harsh crackle of a clipped digital recording.
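To make that concrete, here is a minimal sketch in Python (using NumPy; the function names are made up for illustration) that passes the same tone through a perfectly linear gain stage and through a soft-saturating nonlinear one, then measures how much of each output cannot be explained as a scaled copy of the input.

```python
import numpy as np

fs = 48_000                              # sample rate in Hz
t = np.arange(fs) / fs                   # one second of time
x = 0.8 * np.sin(2 * np.pi * 440 * t)    # a clean 440 Hz tone

def linear_stage(signal, gain=2.0):
    """A perfectly linear device: the output is an exact scaled copy."""
    return gain * signal

def nonlinear_stage(signal, gain=2.0):
    """A soft-saturating device: tanh bends the transfer curve near the extremes."""
    return np.tanh(gain * signal)

for name, stage in [("linear", linear_stage), ("nonlinear", nonlinear_stage)]:
    y = stage(x)
    # Find the best-fit scale factor, then measure how much of the output
    # cannot be explained as "input times a constant".
    scale = np.dot(y, x) / np.dot(x, x)
    residual = y - scale * x
    deviation = np.sqrt(np.mean(residual**2)) / np.sqrt(np.mean(y**2))
    print(f"{name:9s} stage: {deviation * 100:.3f}% of the output is not a scaled copy of the input")
```

The linear stage reports essentially zero; the saturating stage does not, and that leftover part is the distortion.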
Harmonic Distortion
The most common type of distortion adds new frequencies called harmonics to the original signal. If you play a pure tone at 440 Hz (the note A above middle C), a nonlinear system might add energy at 880 Hz, 1,320 Hz, 1,760 Hz, and so on. These are integer multiples of the original frequency: 2x, 3x, 4x. They weren’t in the original signal, but the system’s nonlinearity creates them.
Engineers measure this with a value called Total Harmonic Distortion, or THD. It’s expressed as a percentage representing how much energy exists in those extra harmonics compared to the original frequency. A THD of 0.01% means the harmonics are vanishingly small. A THD of 10% means you’d clearly hear the added overtones. Calculations typically account for harmonics up to the 10th order.
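As a rough sketch of where a THD figure comes from (not a standards-grade measurement), the snippet below distorts a 440 Hz tone, reads the level at the fundamental and at each harmonic off an FFT, and reports the ratio of harmonic energy to the fundamental, summing harmonics up to the 10th as described above. Python with NumPy; the helper name is illustrative.

```python
import numpy as np

fs = 48_000
f0 = 440.0                              # fundamental frequency in Hz
t = np.arange(fs) / fs                  # one second of audio
clean = np.sin(2 * np.pi * f0 * t)
distorted = np.tanh(2.0 * clean)        # a mild nonlinearity adds harmonics

def tone_level(signal, freq, fs):
    """Magnitude of the spectrum at (approximately) one frequency."""
    spectrum = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    return spectrum[np.argmin(np.abs(freqs - freq))]

fundamental = tone_level(distorted, f0, fs)
harmonics = [tone_level(distorted, k * f0, fs) for k in range(2, 11)]  # 2nd through 10th

thd = np.sqrt(sum(h**2 for h in harmonics)) / fundamental
print(f"THD up to the 10th harmonic: {thd * 100:.2f}%")
```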
Not all harmonics sound equally unpleasant. Even-order harmonics (the 2nd, 4th, 6th multiples) tend to sound rich and warm, which is why tube amplifiers and analog tape, which naturally produce even-order harmonics, are prized in music production. Odd-order harmonics (3rd, 5th, 7th) have an edgier, more aggressive quality. Heavy odd-order distortion is what makes a sound feel harsh or grating. This distinction is why some distortion is deliberately added to recordings for texture while other distortion is carefully avoided.
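One way to see the even/odd split is to compare a symmetric nonlinearity, which treats positive and negative swings identically and so generates only odd harmonics, with an asymmetric one, which also generates even harmonics. The sketch below is illustrative Python, not a model of any particular amplifier or tape machine.

```python
import numpy as np

fs = 48_000
f0 = 440.0
t = np.arange(fs) / fs
x = np.sin(2 * np.pi * f0 * t)

def spectrum_at(signal, freq):
    """Magnitude of the spectrum at (approximately) one frequency."""
    mags = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), 1 / fs)
    return mags[np.argmin(np.abs(freqs - freq))]

symmetric = np.tanh(2.0 * x)            # clips both half-cycles the same way
asymmetric = np.tanh(2.0 * x + 0.5)     # biased: positive and negative swings differ

for name, y in [("symmetric", symmetric), ("asymmetric", asymmetric)]:
    fund = spectrum_at(y, f0)
    evens = sum(spectrum_at(y, k * f0) for k in (2, 4, 6))
    odds = sum(spectrum_at(y, k * f0) for k in (3, 5, 7))
    print(f"{name:10s}: even-order {evens / fund:.4f}, odd-order {odds / fund:.4f} (relative to fundamental)")
```

The symmetric curve shows almost no even-order energy; the biased, asymmetric one shows both.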
Intermodulation Distortion
Harmonic distortion involves one frequency generating multiples of itself. Intermodulation distortion (IMD) is more complex and typically more unpleasant. It occurs when two or more frequencies pass through a nonlinear system simultaneously, which is essentially every real-world audio situation since music and speech contain many frequencies at once.
The system creates new frequencies at the sums and differences of the originals and their multiples. If two tones at 1,000 Hz and 1,500 Hz enter a nonlinear device, second-order intermodulation products appear at 2,500 Hz (the sum), 500 Hz (the difference), and at double each original frequency. Third-order products get even more complicated, appearing at frequencies like 3,500 Hz (2 × 1,000 + 1,500) and 500 Hz (2 × 1,000 − 1,500), plus the matching combinations built around the 1,500 Hz tone at 4,000 Hz and 2,000 Hz.
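The arithmetic behind those product frequencies is simple enough to spell out. The short Python sketch below just lists the second- and third-order intermodulation products for the two tones in the example; the labels are illustrative.

```python
# Second- and third-order intermodulation products for two input tones.
f1, f2 = 1_000.0, 1_500.0   # Hz

second_order = {
    "f1 + f2": f1 + f2,        # 2,500 Hz
    "f2 - f1": f2 - f1,        #   500 Hz
    "2*f1":    2 * f1,         # 2,000 Hz
    "2*f2":    2 * f2,         # 3,000 Hz
}

third_order = {
    "2*f1 + f2": 2 * f1 + f2,  # 3,500 Hz
    "2*f1 - f2": 2 * f1 - f2,  #   500 Hz
    "2*f2 + f1": 2 * f2 + f1,  # 4,000 Hz
    "2*f2 - f1": 2 * f2 - f1,  # 2,000 Hz
}

for label, products in [("second-order", second_order), ("third-order", third_order)]:
    print(label)
    for name, freq in products.items():
        print(f"  {name:9s} -> {abs(freq):7.1f} Hz")
```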
What makes IMD particularly problematic is that these new frequencies are not musically related to the originals. Harmonics at least share a musical relationship with the fundamental tone. Intermodulation products are often dissonant, clashing with the original signal in ways your ear perceives as muddy or harsh. This is why a speaker or amplifier might sound clean on a single sustained note but fall apart on a complex orchestral passage or a loud chord.
Clipping: Analog vs. Digital
Clipping is distortion that occurs when a signal exceeds the maximum level a system can handle. It’s called clipping because the peaks of the waveform are literally clipped off, turning smooth curves into flat plateaus. This is one of the most audible and recognizable forms of distortion.
Analog and digital systems clip in very different ways. In analog electronics like tube amplifiers or tape machines, clipping is relatively forgiving. The signal can technically exceed the system’s rated maximum, but the output waveform gets progressively “rounded” rather than abruptly cut off. This soft transition from clean to distorted is part of what gives analog gear its character, and it’s why guitarists deliberately push tube amps into clipping for that classic overdrive sound.
Digital clipping is a completely different experience. Digital audio represents sound as discrete numerical values, and there is a hard ceiling: the maximum number the system can store. Unlike voltages in a wire, there is no going past this ceiling. When a digital signal exceeds the maximum, the waveform hits a perfectly flat plateau. The transition from clean to clipped is abrupt and harsh, producing an immediately unpleasant crackling or buzzing. This is the sound you hear when someone records a voice memo too close to a microphone or when a podcast host’s levels are set too high.
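A compact way to put numbers on the difference is to compare a hard digital ceiling with a gradual rounding that stands in, loosely, for analog saturation. The sketch below is illustrative Python with NumPy; soft_clip is a made-up tanh-based helper, not a model of any particular circuit.

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
hot_signal = 1.5 * np.sin(2 * np.pi * 440 * t)   # peaks 50% above full scale

def hard_clip(x, ceiling=1.0):
    """Digital-style clipping: anything past the ceiling becomes a flat plateau."""
    return np.clip(x, -ceiling, ceiling)

def soft_clip(x):
    """A gentle stand-in for analog saturation: peaks are rounded, not chopped off."""
    return np.tanh(x)

for name, func in [("hard (digital)", hard_clip), ("soft (analog-ish)", soft_clip)]:
    y = func(hot_signal)
    flat = np.mean(np.abs(np.diff(y)) < 1e-6)    # fraction of samples sitting on a plateau
    print(f"{name:18s}: {flat * 100:5.1f}% of samples on a flat plateau, peak = {np.max(np.abs(y)):.2f}")
```

The hard-clipped version spends roughly half of each cycle stuck on the plateau; the soft-clipped one never truly flattens.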
Aliasing in Digital Audio
Digital audio works by taking thousands of snapshots (samples) of a sound wave every second. To accurately capture a frequency, the system needs to sample at least twice as fast as that frequency. This requirement is called the Nyquist criterion. Standard CD-quality audio samples at 44,100 times per second, which can accurately capture frequencies up to about 22,050 Hz, just above the upper limit of human hearing.
When the sampling rate is too low for the frequencies present, something called aliasing occurs. High-frequency components that can’t be properly captured “fold” down into lower frequencies, appearing as phantom tones that weren’t in the original signal. The aliased frequency equals the difference between the original frequency and the nearest multiple of the sampling rate. These artifacts sound metallic or robotic and are especially noticeable on transients like cymbals or consonant sounds in speech. Modern audio equipment uses filters to remove frequencies above the Nyquist limit before sampling, which is why aliasing is rare in well-designed systems but still crops up in cheap converters or improperly configured recording setups.
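The fold-down arithmetic can be sketched directly. The snippet below (illustrative Python) samples a tone above the Nyquist limit with no anti-aliasing filter, then compares the alias predicted by the rule above with the frequency that actually shows up in the sampled data.

```python
import numpy as np

fs = 44_100                         # sampling rate in Hz
nyquist = fs / 2                    # 22,050 Hz
f_in = 30_000.0                     # a tone well above the Nyquist limit

# Predicted alias: distance to the nearest multiple of the sampling rate.
nearest_multiple = round(f_in / fs) * fs
predicted_alias = abs(f_in - nearest_multiple)

# Sample the tone (with no anti-aliasing filter) and find the strongest FFT bin.
n = fs                              # one second of samples -> 1 Hz bin spacing
t = np.arange(n) / fs
samples = np.sin(2 * np.pi * f_in * t)
spectrum = np.abs(np.fft.rfft(samples))
freqs = np.fft.rfftfreq(n, 1 / fs)
observed = freqs[np.argmax(spectrum)]

print(f"Input tone:      {f_in:8.1f} Hz (Nyquist limit is {nyquist:.0f} Hz)")
print(f"Predicted alias: {predicted_alias:8.1f} Hz")
print(f"Observed peak:   {observed:8.1f} Hz")
```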
Phase Distortion
Phase distortion is subtler than the types above, and many listeners can’t consciously identify it. It occurs when different frequencies in a signal are delayed by different amounts as they pass through a system. A uniform delay, where every frequency is shifted by the same amount of time, isn’t distortion at all. Your ear can’t distinguish a signal that arrived 5 milliseconds late if every part of it arrived equally late. But when low frequencies are delayed differently than high frequencies, the shape of the waveform changes even though the volume at each frequency stays the same.
Phase distortion is most commonly introduced by filters, crossovers in speaker systems, and certain types of equalization. In practice, it can make stereo imaging feel smeared or reduce the sense of punch and clarity in transient-heavy material like drums. Whether phase distortion is audible depends heavily on how severe it is and the complexity of the audio material.
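One way to see that phase alone can reshape a waveform is to build a signal from a few harmonics, delay each component by a different amount, and confirm that the per-frequency levels stay identical while the waveform’s peak changes. This is a bare-bones Python illustration, not a model of a real filter or crossover.

```python
import numpy as np

fs = 48_000
t = np.arange(fs) / fs
components = [(100, 1.0), (300, 1/3), (500, 1/5)]   # first few terms of a square-ish wave

def build(phase_shifts):
    """Sum the components, shifting each one by its own phase (in radians)."""
    return sum(a * np.sin(2 * np.pi * f * t + p)
               for (f, a), p in zip(components, phase_shifts))

original = build([0.0, 0.0, 0.0])
phase_warped = build([0.0, 1.2, 2.5])   # each frequency delayed by a different amount

for name, sig in [("original", original), ("phase-warped", phase_warped)]:
    spectrum = np.abs(np.fft.rfft(sig))
    levels = [spectrum[f] for f, _ in components]   # 1 Hz bins, so bin index == frequency
    print(f"{name:13s}: component levels {np.round(levels, 1)}, waveform peak {np.max(np.abs(sig)):.2f}")
```

The frequency content of both versions is identical; only the shape, and with it the sense of punch on a transient, differs.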
Preventing Unwanted Distortion
The most effective way to prevent distortion in any audio setup is proper gain staging: controlling the signal level at every point in the chain so that no single device is pushed past its comfortable operating range. Each piece of equipment, whether hardware or software, has an optimal input range. Too hot, and you get clipping. Too quiet, and you raise the noise floor when you amplify later.
For digital recording, a good target level is around -12 dBFS on your meter. This leaves enough headroom above the signal to accommodate unexpected peaks or later processing like equalization that might boost certain frequencies. If you compress a signal and reduce its overall level, you should boost the output of the compressor to compensate, keeping the signal in that healthy middle range as it moves to the next stage.
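For the digital case, here is a small sketch of what “aim for about -12 dBFS” means numerically: convert the target into a linear amplitude, measure a take’s peak, and work out the gain change needed to land that peak on the target. Python with NumPy; the function name is made up for illustration.

```python
import numpy as np

def gain_to_reach_peak_target(samples, target_dbfs=-12.0):
    """Return the gain (in dB) that would move the signal's peak to the target level.

    0 dBFS corresponds to a linear amplitude of 1.0 (digital full scale).
    """
    peak = np.max(np.abs(samples))
    peak_dbfs = 20 * np.log10(peak)             # current peak, relative to full scale
    return target_dbfs - peak_dbfs

# Example: a take whose loudest peak sits at about -3 dBFS (hotter than the target).
fs = 48_000
t = np.arange(fs) / fs
take = 0.707 * np.sin(2 * np.pi * 220 * t)      # peak of roughly -3 dBFS

needed = gain_to_reach_peak_target(take)
print(f"Current peak: {20 * np.log10(np.max(np.abs(take))):.1f} dBFS")
print(f"Apply {needed:+.1f} dB of gain to land the peak near -12 dBFS")
```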
These same principles apply whether you’re recording a podcast with a USB microphone, mixing music in software, or setting up a PA system for a live event. At every stage, the question is the same: is the signal loud enough to stay well above the noise, but quiet enough that it never bumps into the ceiling? Getting that balance right at each link in the chain is what separates clean, professional-sounding audio from audio that crackles, buzzes, or sounds strangely harsh.

