
How the cochlea computes (2024)

(www.dissonances.blog)
475 points | izhak
edbaskerville:
To summarize: the ear does not do a Fourier transform, but it does do a time-localized frequency-domain transform akin to wavelets (specifically, intermediate between wavelet and Gabor transforms). It does this because the sounds processed by the ear are often localized in time.
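To make the wavelet-vs-Gabor distinction concrete, here is a minimal numpy sketch (the sample rate, window width, and cycle count are arbitrary choices of mine, not from the article): a Gabor atom keeps the same envelope width at every frequency, while a wavelet atom holds a constant number of cycles, so its envelope shrinks as frequency rises.

    import numpy as np

    fs = 16000                        # sample rate in Hz, arbitrary
    t = np.arange(-0.05, 0.05, 1/fs)  # 100 ms of time, centered on zero

    def gabor_atom(f, sigma=0.005):
        # fixed 5 ms Gaussian envelope at every frequency
        return np.exp(-t**2 / (2*sigma**2)) * np.exp(2j*np.pi*f*t)

    def wavelet_atom(f, cycles=6):
        # envelope width scales as 1/f: always ~6 cycles under the window
        sigma = cycles / (2*np.pi*f)
        return np.exp(-t**2 / (2*sigma**2)) * np.exp(2j*np.pi*f*t)

    for f in (200.0, 2000.0):
        for name, atom in (("gabor", gabor_atom(f)), ("wavelet", wavelet_atom(f))):
            p = np.abs(atom)**2
            p /= p.sum()
            width_ms = np.sqrt(np.sum(p * t**2)) * 1000  # RMS envelope width
            print(f"{name:7s} @ {f:6.0f} Hz: ~{width_ms:.2f} ms")

At 2 kHz the wavelet atom's envelope is ten times narrower in time than at 200 Hz, while the Gabor atom's is unchanged; a cochlea-like transform, as the article describes it, would print numbers between those two extremes.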

The article also describes a theory that human speech evolved to occupy an unoccupied space in frequency vs. envelope duration space. It makes no explicit connection between that fact and the type of transform the ear does—but one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.

A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.

a-dub:
> At high frequencies, frequency resolution is sacrificed for temporal resolution, and vice versa at low frequencies.

this is the time-frequency uncertainty principle. intuitively you can understand it by thinking about wavelength: the more stretched out a waveform is in time, the more of it you need to observe to get a good estimate of its frequency, and the more of it you observe, the less precisely you can say where in time it is.
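here's a quick numpy illustration of that tradeoff (all parameters are mine, picked just to make the numbers legible): window a 440 Hz tone with narrower and narrower Gaussians and watch the measured spectral peak smear out.

    import numpy as np

    fs, f0, N = 8000, 440.0, 1 << 15   # sample rate, tone, FFT size (arbitrary)

    def spectral_width_hz(win_ms):
        # gaussian-window a pure tone, return RMS bandwidth of its spectrum
        t = (np.arange(N) - N/2) / fs
        sigma = win_ms / 1000
        x = np.cos(2*np.pi*f0*t) * np.exp(-t**2 / (2*sigma**2))
        P = np.abs(np.fft.rfft(x))**2
        freqs = np.fft.rfftfreq(N, 1/fs)
        P /= P.sum()
        mean = np.sum(P * freqs)
        return np.sqrt(np.sum(P * (freqs - mean)**2))

    for ms in (100, 10, 1):
        print(f"{ms:4d} ms window -> spectral width ~{spectral_width_hz(ms):7.1f} Hz")

better time localization (the 1 ms window) buys you a roughly 100x blurrier frequency estimate; the product of the two widths is bounded below, which is the uncertainty principle in one line.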

> but it does do a time-localized frequency-domain transform akin to wavelets

maybe it's easier to conceive of this first as an empirically defined filter bank, fit to physiological measurements, rather than jumping straight to some neatly defined set of orthogonal basis functions. additionally, orthogonal basis functions cannot, by definition, capture things like masking effects.
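for the curious, the classic physiologically-fit filter bank of this kind is a set of gammatone filters spaced on the ERB scale (glasberg & moore). a rough numpy sketch; the sample rate, channel count, and frequency range are my choices:

    import numpy as np

    def erb(f):
        # equivalent rectangular bandwidth in Hz (glasberg & moore 1990)
        return 24.7 * (4.37 * f / 1000 + 1)

    def gammatone_ir(fc, fs=16000, dur=0.05, order=4, b=1.019):
        # 4th-order gammatone impulse response at center frequency fc
        t = np.arange(int(dur * fs)) / fs
        g = t**(order - 1) * np.exp(-2*np.pi*b*erb(fc)*t) * np.cos(2*np.pi*fc*t)
        return g / np.sqrt(np.sum(g**2))   # unit-energy normalization

    def erb_space(lo, hi, n):
        # center frequencies spaced uniformly on the ERB-rate scale
        rate = lambda f: 21.4 * np.log10(4.37 * f / 1000 + 1)
        inv = lambda r: (10**(r / 21.4) - 1) * 1000 / 4.37
        return inv(np.linspace(rate(lo), rate(hi), n))

    fbank = [gammatone_ir(fc) for fc in erb_space(100, 6000, 32)]
    # convolving a signal with each channel gives a crude cochleagram:
    # narrow, slow filters at low fc; wide, fast filters at high fc

note there's no orthogonality anywhere: neighboring channels overlap heavily, which is part of why filter-bank models can represent masking-like interactions that an orthogonal basis can't.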

> A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.

(4) size of the animal.

notably: some smaller creatures have ultrasonic vocalization and sensory capability. sometimes this is hypothesized to complement visual perception for avoiding predators, but it also could just have a lot to do with the fact that, well, they have tiny articulators and tiny vocalizations!

Terr_:
> it also could just have a lot to do with the fact that, well, they have tiny articulators and tiny vocalizations!

Now I'm imagining some alien shrew with vocal cords (or a syrinx, or whatever) running the entire length of its body, just so it can emit lower-frequency noises for some reason.

bragr:
Well, without the humorous size difference, this is basically what whales and elephants do for long-distance communication.
Terr_:
I was playing around with a fundamental-frequency calculator [0] to associate certain sizes with hertz, then used a tone generator [1] to get a subjective idea of what it'd sound like. (A back-of-envelope version of that scaling is sketched after the links below.)

Though of course, nature has plenty of other tricks, like how koalas can go down to ~27 Hz. [2]

[0] https://acousticalengineer.com/fundamental-frequency-calcula...

[1] https://www.szynalski.com/tone-generator/

[2] https://www.nature.com/articles/nature.2013.14275
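For a back-of-envelope version of what that calculator is doing, here's the ideal-string formula f0 = sqrt(T/mu) / (2L); the tension and linear density below are made-up illustrative values, not measured vocal-fold parameters:

    import math

    def string_f0(length_m, tension_n=2.0, mu_kg_per_m=0.001):
        # fundamental of an ideal string; T and mu are invented ballparks
        return math.sqrt(tension_n / mu_kg_per_m) / (2 * length_m)

    for L_mm in (2, 20, 200, 2000):
        print(f"L = {L_mm:5d} mm -> f0 ~ {string_f0(L_mm / 1000):8.1f} Hz")

Everything else held fixed, pitch scales as 1/length, which is the whole intuition: millimeter-scale articulators land in the ultrasonic, meter-scale ones in the infrasonic.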

fuzzfactor:
How long would a Dachshund have to be to sound like a 60-kilo Great Dane?
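Taking the same 1/L scaling from the sketch above, and plugging in guessed bark pitches (every number here is a ballpark I made up, not a measurement):

    # toy inversion of f0 proportional to 1/L; all three inputs are guesses
    dachshund_f0 = 180.0   # Hz, assumed typical Dachshund bark pitch
    dane_f0 = 70.0         # Hz, assumed typical Great Dane bark pitch
    dachshund_len = 0.6    # m, assumed nose-to-tail length

    needed_m = dachshund_len * dachshund_f0 / dane_f0
    print(f"~{needed_m:.1f} m of Dachshund")   # ~1.5 m at these guesses

So roughly a meter and a half of Dachshund at those guesses; in this toy model the length matters, not the 60 kilos.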