How Is Sound Conducted in the Inner Ear?

Sound conduction in the inner ear is a fluid-based process. After sound waves vibrate the eardrum and travel through three tiny bones in the middle ear, the last bone (the stapes) pushes against a membrane-covered opening called the oval window. This creates pressure waves in the fluid-filled chambers of the cochlea, a snail-shaped structure about the size of a pea. From there, the cochlea converts those pressure waves into electrical signals that travel to the brain.

How Sound Enters the Cochlea

The stapes footplate rocks against the oval window like a piston, displacing fluid inside the cochlea’s upper chamber (the scala vestibuli). For sounds up to about 130 decibels, the pressure inside this chamber increases in direct proportion to how far the stapes moves. Move the stapes twice as far, and the pressure doubles. This linear relationship is what allows the inner ear to faithfully represent a wide range of sound levels.
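To put numbers on that proportionality, it helps to recall how decibels relate to pressure. The Python sketch below uses the standard definition of sound pressure level, where 0 dB SPL corresponds to 20 micropascals in air. It is only an illustration: the function names are made up for this example, and the pressures are those of sound in air at the ear, not the fluid pressure inside the scala vestibuli.

```python
import math

P_REF = 20e-6  # reference pressure for dB SPL in air, in pascals

def spl_to_pressure(db_spl):
    """Convert a sound pressure level in dB SPL to RMS pressure in pascals."""
    return P_REF * 10 ** (db_spl / 20)

def pressure_to_spl(pressure_pa):
    """Convert an RMS pressure in pascals back to dB SPL."""
    return 20 * math.log10(pressure_pa / P_REF)

# In a linear system, doubling the input doubles the pressure.
# On the decibel scale that is the same step (about 6 dB) at every level.
for level in (40, 80, 120):
    p = spl_to_pressure(level)
    step = pressure_to_spl(2 * p) - level
    print(f"{level} dB SPL = {p:.4g} Pa; doubling adds {step:.2f} dB")
```

Because the scale is logarithmic, the span from a quiet room to 130 decibels covers a very large range of pressures, which is why a response that stays linear over that span is notable.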

The cochlea is encased in dense bone, so the fluid inside it has almost nowhere to go. That’s where the round window comes in. This second membrane-covered opening sits at the base of the cochlea’s lower chamber and acts as a pressure vent. When the stapes pushes inward at the oval window, the round window bulges outward to compensate. Without this release valve, the fluid couldn’t move and the structures inside the cochlea couldn’t vibrate. Experiments that block the round window confirm this: cochlear responses to sound drop significantly.

The Cochlea’s Three Fluid-Filled Chambers

The cochlea contains three parallel channels that spiral from base to tip. The upper channel (scala vestibuli) and lower channel (scala tympani) are filled with perilymph, a fluid similar in composition to most body fluids, with high sodium and low potassium. Sandwiched between them is the scala media, filled with endolymph. Endolymph is chemically unusual: it contains about 140 mEq/L of potassium and only 15 mEq/L of sodium, essentially the reverse of what you’d find in most fluids outside of cells. This potassium-rich environment is critical for generating the electrical signals that encode sound.
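One way to see how different the two fluids are is to run their concentrations through the Nernst equation, which gives the voltage at which a single ion species would be in balance across an idealized barrier permeable only to that ion. The sketch below is a rough illustration, not a model of the real cochlear potentials. The endolymph values are the ones quoted above; the perilymph values (5 mEq/L potassium, 140 mEq/L sodium) are assumed typical extracellular figures and do not come from this article.

```python
import math

R = 8.314462618   # gas constant, J/(mol*K)
F = 96485.33212   # Faraday constant, C/mol
T_BODY = 310.15   # 37 degrees Celsius, in kelvin

def nernst_mv(conc_side_a, conc_side_b, charge=1):
    """Equilibrium potential of side B relative to side A, in millivolts,
    for an ion present at the given concentrations on each side."""
    return 1000 * (R * T_BODY) / (charge * F) * math.log(conc_side_a / conc_side_b)

endolymph = {"K": 140.0, "Na": 15.0}  # mEq/L, from the text
perilymph = {"K": 5.0, "Na": 140.0}   # mEq/L, ASSUMED typical extracellular values

# Side A is endolymph, side B is perilymph.
for ion in ("K", "Na"):
    e_mv = nernst_mv(endolymph[ion], perilymph[ion])
    print(f"{ion}: {e_mv:+.1f} mV (perilymph relative to endolymph)")
```

The two ions come out with opposite signs, which is another way of saying that the gradients for potassium and sodium point in opposite directions across the boundary between the chambers.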

When the stapes creates a pressure wave in the upper chamber, it travels through the fluid, crosses into the lower chamber, and is absorbed by the round window. Along the way, it passes through and deflects the structures sitting between these chambers, including the basilar membrane and the organ of Corti, where the actual detection of sound takes place.

How the Basilar Membrane Sorts Frequencies

The basilar membrane runs the full length of the cochlea, roughly 35 millimeters in humans. Its physical properties change gradually from one end to the other. At the base (near the oval window), the membrane is narrow and stiff. At the apex (the tip of the spiral), it’s wider and more flexible. This gradient means different locations vibrate most strongly in response to different sound frequencies. High-pitched sounds cause maximum vibration near the stiff base. Low-pitched sounds travel further and peak near the floppy apex.

This arrangement, called tonotopy, is what allows you to distinguish a whistle from a bass drum. Each frequency essentially has its own address along the cochlea. The stiffness gradient develops during early life, and its maturation is what gives the cochlea its ability to finely tune which frequencies activate which regions.
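The idea of a frequency "address" can be sketched with the Greenwood function, a widely used empirical formula that maps position along the basilar membrane to the frequency that excites it most. This formula is brought in here for illustration and is not part of the description above. The constants are the commonly quoted human values, treated as assumptions, and the 35 mm length is the figure given earlier. The function names are hypothetical.

```python
import math

LENGTH_MM = 35.0                 # basilar membrane length quoted in the text
A, ALPHA, K = 165.4, 2.1, 0.88   # commonly quoted human Greenwood constants (assumed)

def frequency_at(distance_from_base_mm):
    """Approximate best frequency (Hz) at a given distance from the base."""
    x = 1.0 - distance_from_base_mm / LENGTH_MM  # fraction of length, measured from the apex
    return A * (10 ** (ALPHA * x) - K)

def place_of(frequency_hz):
    """Approximate distance from the base (mm) where a frequency peaks."""
    x = math.log10(frequency_hz / A + K) / ALPHA
    return LENGTH_MM * (1.0 - x)

for f in (100, 1000, 10000):
    print(f"{f} Hz peaks about {place_of(f):.1f} mm from the base")
```

Under these assumptions, equal steps along the membrane correspond roughly to equal ratios of frequency rather than equal differences, so each octave gets a comparable stretch of membrane over most of the cochlea.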

The Organ of Corti: Where Vibration Becomes Signal

Sitting on top of the basilar membrane is the organ of Corti, a ribbon of specialized cells that runs the length of the cochlea. It contains two types of sensory cells (hair cells) along with supporting cells that hold everything in place. Draped over the top of the hair cells is the tectorial membrane, a gel-like flap. The tallest hair-like projections (stereocilia) of the outer hair cells are physically embedded in this tectorial membrane.

When the basilar membrane vibrates, it shifts relative to the tectorial membrane above it. This shearing motion bends the stereocilia on the hair cells. The bending is the pivotal moment: it’s the point where mechanical vibration starts to become an electrical signal.

How Hair Cells Convert Motion Into Electricity

Each hair cell has a bundle of stereocilia arranged in a staircase pattern, with rows of increasing height. Tiny filaments called tip links connect each shorter stereocilium to its taller neighbor, like rungs on a ladder laid at an angle. When the bundle bends toward the tallest row, these tip links pull taut and physically yank open ion channels at their lower ends, near the tips of the shorter stereocilia. Potassium from the potassium-rich endolymph floods into the cell, generating an electrical voltage change.

Bending the bundle in the opposite direction slackens the tip links and closes the channels. If the tip links are broken or missing, the hair cell loses its ability to detect sound entirely. Direct mechanical pulling on a single tip link is enough to pop open a channel, confirming that this is a purely mechanical gating system with no chemical middleman.
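The direction-dependent gating described above is often summarized with a two-state Boltzmann curve, in which the chance that a channel is open rises smoothly as the bundle is pushed toward the tallest row. The sketch below assumes that form. The midpoint and slope values are hypothetical numbers chosen only to make the shape visible; they are not measurements.

```python
import math

def open_probability(displacement_nm, x_half_nm=20.0, slope_nm=10.0):
    """Two-state Boltzmann estimate of the fraction of open channels.

    Positive displacement means bending toward the tallest row, which
    tightens the tip links. x_half_nm and slope_nm are hypothetical.
    """
    return 1.0 / (1.0 + math.exp(-(displacement_nm - x_half_nm) / slope_nm))

for x in (-50, 0, 20, 50, 100):
    print(f"{x:+4d} nm -> {open_probability(x):.2f} of channels open")
```

With these illustrative numbers a small fraction of channels is open at rest, so bending in either direction changes the current: toward the tallest row opens more channels, away from it closes the ones already open.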

Outer Hair Cells Amplify the Signal

The cochlea has two populations of hair cells with very different jobs. The three rows of outer hair cells act as biological amplifiers. They contain a motor protein called prestin that changes shape in response to voltage. When these cells are stimulated, they physically lengthen and shorten at the frequency of the incoming sound, sometimes thousands of times per second. This rapid shape-shifting boosts the vibration of the basilar membrane locally, sharpening the cochlea’s ability to pick out specific frequencies and detect very quiet sounds.

Losing this amplification system has dramatic consequences. When prestin is knocked out or damaged, hearing sensitivity drops severely. This is one reason noise damage and aging affect hearing so profoundly: outer hair cells are among the first casualties, and humans cannot regenerate them.
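A loose physical analogy shows why an active boost matters most near one frequency. Treat a patch of basilar membrane as a mass on a spring with friction, driven by a steady tone, and treat the outer hair cells as something that cancels part of the friction. This is a toy model under those assumptions, not a description of prestin or real cochlear mechanics, and every number in it is hypothetical and in arbitrary units.

```python
import math

def amplitude(drive_freq, damping, mass=1.0, stiffness=1.0, force=1.0):
    """Steady-state amplitude of a driven, damped mass-spring system.

    drive_freq is an angular frequency; with mass = stiffness = 1 the
    resonance sits at drive_freq = 1.
    """
    w = drive_freq
    return force / math.sqrt((stiffness - mass * w**2) ** 2 + (damping * w) ** 2)

PASSIVE = 1.0  # hypothetical damping with the amplifier off
ACTIVE = 0.1   # hypothetical damping with part of the friction cancelled

for w in (0.5, 1.0, 2.0):
    gain = amplitude(w, ACTIVE) / amplitude(w, PASSIVE)
    print(f"drive frequency {w}: active/passive amplitude ratio {gain:.2f}")
```

In this toy version the boost is large at the resonant frequency and small away from it, which mirrors the two effects described above: greater sensitivity and sharper tuning. Setting the damping back to the passive value removes both at once.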

Inner Hair Cells Send the Message to the Brain

The single row of inner hair cells does the actual job of reporting sound to the brain. When their stereocilia bend and ion channels open, the resulting voltage change triggers the release of glutamate, a neurotransmitter, at specialized junctions called ribbon synapses. These synapses are built for speed and precision. Each one has a large cluster of glutamate receptors on the receiving end, so the release of even a single tiny packet of neurotransmitter can be enough to fire an electrical impulse in the connected nerve fiber.

The rate of glutamate release tracks the inner hair cell’s voltage, which in turn tracks how much the basilar membrane is vibrating at that location. Louder sounds cause more vibration, more ion channel opening, more glutamate release, and faster firing of the auditory nerve. This is how sound intensity gets encoded: as the firing rate of nerve fibers in the spiral ganglion, the cluster of neurons whose axons bundle together to form the auditory nerve heading to the brain.
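Rate coding of this kind can be sketched as a simple curve that turns sound level into a firing rate. The version below assumes a sigmoid shape that starts at a low resting rate and levels off at a maximum. The shape and every parameter value are hypothetical choices for illustration and are not fitted to any recording.

```python
import math

def firing_rate(level_db, spontaneous=5.0, maximum=250.0,
                midpoint_db=40.0, slope_db=6.0):
    """Illustrative firing rate (spikes per second) for one nerve fiber.

    All parameters are hypothetical: a resting rate, a ceiling, the level
    at which the fiber is halfway between them, and how steeply it rises.
    """
    drive = 1.0 / (1.0 + math.exp(-(level_db - midpoint_db) / slope_db))
    return spontaneous + (maximum - spontaneous) * drive

for level in (0, 20, 40, 60, 80):
    print(f"{level:3d} dB -> {firing_rate(level):6.1f} spikes/s")
```

A single curve like this only covers a limited span of levels before it flattens out, which is one reason a simple sketch should not be read as the whole story of how loudness is encoded.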

The Full Sequence in Brief

  • The stapes pushes on the oval window, creating a pressure wave in the cochlear fluid.
  • The round window bulges outward, allowing the fluid to actually move.
  • The basilar membrane vibrates at a location determined by the sound’s frequency.
  • Stereocilia on hair cells bend as the basilar membrane and tectorial membrane slide past each other.
  • Tip links pull open ion channels, letting potassium rush into the hair cell.
  • Outer hair cells amplify the vibration by changing shape in response to voltage.
  • Inner hair cells release glutamate onto auditory nerve fibers, encoding sound as electrical impulses sent to the brain.

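The transduction step itself is remarkably fast: hair cell ion channels open within tens of microseconds of the bundle bending, and the full chain from stapes movement to nerve impulse is complete within a few milliseconds. The system is precise enough to distinguish thousands of frequencies simultaneously and sensitive enough to detect air vibrations smaller than the diameter of an atom.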
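The comparison with an atom can be checked with a back-of-the-envelope calculation. For a plane sound wave, the distance air particles move back and forth equals the pressure divided by the product of air density, sound speed, and angular frequency. The sketch below applies that to a 1 kHz tone at 20 micropascals, the nominal reference for the threshold of hearing. The air properties are approximate room-temperature values, and the atom size used for comparison is twice the Bohr radius of hydrogen; both are assumptions added for this example.

```python
import math

RHO_AIR = 1.2                 # kg/m^3, approximate density of air
C_AIR = 343.0                 # m/s, approximate speed of sound in air
HYDROGEN_DIAMETER = 1.06e-10  # m, twice the Bohr radius

def particle_displacement(pressure_pa, frequency_hz):
    """Displacement amplitude (m) of air particles in a plane sound wave."""
    return pressure_pa / (2 * math.pi * frequency_hz * RHO_AIR * C_AIR)

d = particle_displacement(20e-6, 1000.0)
print(f"displacement: {d:.2e} m")
print(f"fraction of a hydrogen atom's diameter: {d / HYDROGEN_DIAMETER:.3f}")
```

Under these assumptions the air moves by only a few trillionths of a meter, well under the width of a single hydrogen atom.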