
How the cochlea computes (2024)

(www.dissonances.blog)
475 points izhak | 6 comments
edbaskerville ◴[] No.45762928[source]
To summarize: the ear does not do a Fourier transform, but it does do a time-localized frequency-domain transform akin to wavelets (specifically, intermediate between wavelet and Gabor transforms). It does this because the sounds processed by the ear are often localized in time.
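
A rough sketch of the difference, not taken from the article: a plain Fourier sum over a whole signal reports which frequencies are present but not when, while a Gaussian-windowed (Gabor-style) sum also reports when. The sample rate, the two test tones, the window width, and all function names below are assumptions chosen for illustration only.

```python
import cmath
import math

RATE = 1000  # samples per second (assumed for illustration)


def make_signal():
    """One second of audio: a 50 Hz tone for the first half, 120 Hz for the second."""
    signal = []
    for n in range(RATE):
        freq = 50.0 if n < RATE // 2 else 120.0
        signal.append(math.sin(2 * math.pi * freq * n / RATE))
    return signal


def fourier_magnitude(signal, freq):
    """Magnitude of the Fourier sum at `freq`, taken over the whole signal."""
    total = sum(
        x * cmath.exp(-2j * math.pi * freq * n / RATE)
        for n, x in enumerate(signal)
    )
    return abs(total) / len(signal)


def gabor_magnitude(signal, freq, center, sigma=0.05):
    """Same sum, but weighted by a Gaussian window centered at `center` seconds."""
    total = 0j
    weight_sum = 0.0
    for n, x in enumerate(signal):
        t = n / RATE
        weight = math.exp(-0.5 * ((t - center) / sigma) ** 2)
        total += weight * x * cmath.exp(-2j * math.pi * freq * t)
        weight_sum += weight
    return abs(total) / weight_sum


signal = make_signal()
# Whole-signal view: both tones show up about equally, with no hint of their order.
print(fourier_magnitude(signal, 50.0), fourier_magnitude(signal, 120.0))
# Windowed view around t = 0.25 s: the 50 Hz tone dominates.
print(gabor_magnitude(signal, 50.0, 0.25), gabor_magnitude(signal, 120.0, 0.25))
# Windowed view around t = 0.75 s: the 120 Hz tone dominates.
print(gabor_magnitude(signal, 50.0, 0.75), gabor_magnitude(signal, 120.0, 0.75))
```

A wavelet-style transform would additionally shrink the window as the frequency rises; the fixed `sigma` here corresponds to the Gabor end of that range.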

The article also describes a theory that human speech evolved to occupy an unoccupied space in frequency vs. envelope duration space. It makes no explicit connection between that fact and the type of transform the ear does—but one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.

A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.

1. lgas ◴[] No.45764339[source]
> It does this because the sounds processed by the ear are often localized in time.

What would it mean for a sound to not be localized in time?

2. littlestymaar ◴[] No.45764466[source]
A continuous sinusoidal sound, I guess?
3. hansvm ◴[] No.45764524[source]
It would look like a Fourier transform ;)

Zooming in to cartoonish levels might drive the point home a bit. Suppose you have sound waves

  |---------|---------|---------|

What is the frequency exactly one third of the way between the first two wave peaks? It's a nonsensical question. The frequency relates to the time delta between peaks, and looking locally at a sufficiently small region of time gives no information about that phenomenon.

Let's zoom out a bit. What's the frequency over a longer period of time, capturing a few peaks?

Well...if you know there is only one frequency then you can do some math to figure it out, but as soon as you might be describing a mix of frequencies you suddenly, again, potentially don't have enough information.
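
A small sketch of that loss of information, under assumed numbers (two tones at 100 Hz and 104 Hz, a 1000 Hz sample rate, hypothetical function names): with a long enough observation the spectrum dips between the two tones, so they can be told apart; with a short observation the dip disappears and the mix looks like one blurred frequency.

```python
import cmath
import math

RATE = 1000  # samples per second (assumed for illustration)


def magnitude_at(freq, duration):
    """Fourier-sum magnitude at `freq` over the first `duration` seconds
    of a mix of a 100 Hz tone and a 104 Hz tone."""
    count = int(round(duration * RATE))
    total = 0j
    for n in range(count):
        t = n / RATE
        x = math.sin(2 * math.pi * 100.0 * t) + math.sin(2 * math.pi * 104.0 * t)
        total += x * cmath.exp(-2j * math.pi * freq * t)
    return abs(total) / count


def dip_ratio(duration):
    """Spectrum halfway between the tones (102 Hz) relative to the spectrum at 100 Hz.
    A ratio near 0 means the two tones are resolved; near 1 means they are smeared together."""
    return magnitude_at(102.0, duration) / magnitude_at(100.0, duration)


print(dip_ratio(1.0))   # long window: clear dip between the tones
print(dip_ratio(0.05))  # short window (about five cycles): no dip left
```

The frequency resolution of a window scales roughly as one over its duration, so the 0.05 s window cannot separate tones that are only 4 Hz apart.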

That lack of information manifests in a few ways. The exact math (Shannon's theorems?) suggests some things, but the language involved mismatches with human perception sufficiently that people get burned trying to apply it too directly. E.g., a bass beat with a bit of clock skew is very different from a bass beat as far as a careless decomposition is concerned, but it's likely not observable by a human listener.

Not being localized in time means* you look at longer horizons, considering more and more of those interactions. Instead of the beat of a 4/4 song meaning that the frequency changes at discrete intervals, it means that there's a larger, overarching pattern capturing "the frequency distribution" of the entire song.

*Truly time-nonlocalized sound is of course impossible, so I'm giving some reasonable interpretation.

4. xeonmc ◴[] No.45764731[source]
Means that it is a broad spectrum signal.

Imagine the dissonant sound of hitting a trashcan.

Now imagine the sound of pressing down all 88 keys on a piano simultaneously.

Do they sound similar in your head?

The localization occurs where the phases of all the frequency components align and add up coherently into a pulse, while further away in time their phases are misaligned and cancel each other out.
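
A minimal sketch of that phase argument, with assumed values throughout (50 cosines at 100 to 149 Hz, a 1000 Hz sample rate, a fixed random seed, hypothetical function names): the same set of frequencies at the same amplitudes produces a sharp pulse when the phases line up and a spread-out, noise-like signal when they are scrambled. The crest factor (peak divided by RMS) is used here as a crude measure of how concentrated the signal is in time.

```python
import math
import random

RATE = 1000             # samples per second (assumed for illustration)
FREQS = range(100, 150)  # 50 components, 1 Hz apart (assumed)


def synthesize(phases):
    """One second of the sum of unit-amplitude cosines with the given phases."""
    return [
        sum(math.cos(2 * math.pi * f * n / RATE + p) for f, p in zip(FREQS, phases))
        for n in range(RATE)
    ]


def crest_factor(signal):
    """Peak absolute value divided by RMS; larger means more pulse-like."""
    rms = math.sqrt(sum(x * x for x in signal) / len(signal))
    return max(abs(x) for x in signal) / rms


# All phases zero: every component peaks together at t = 0.
aligned = synthesize([0.0] * len(FREQS))

# Same frequencies and amplitudes, phases drawn at random (seeded for repeatability).
rng = random.Random(0)
scrambled = synthesize([rng.uniform(0.0, 2 * math.pi) for _ in FREQS])

print(crest_factor(aligned), crest_factor(scrambled))
```

Both signals have the same magnitude spectrum; only the phases differ, which is why a magnitude-only view cannot distinguish the pulse from the smear.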

5. jancsika ◴[] No.45764999[source]
> It's a nonsensical question.

Are you talking about a discrete signal or a continuous signal?

6. kragen ◴[] No.45768788[source]
The 50-cycle hum of the transformer outside your house. Tinnitus. The ≈15kHz horizontal scanning frequency whine of a CRT TV you used to be able to hear when you were a kid.

Of course, none of these are completely nonlocalized in time. Sooner or later there will be a blackout and the transformer will go silent. But it's a lot less localized than the chirp of a bird.