How the cochlea computes (2024)

(www.dissonances.blog)

475 points izhak | 1 comments | 30 Oct 25 17:01 UTC

edbaskerville ◴[30 Oct 25 17:52 UTC] No.45762928[source]▶

To summarize: the ear does not do a Fourier transform, but it does do a time-localized frequency-domain transform akin to wavelets (specifically, intermediate between wavelet and Gabor transforms). It does this because the sounds processed by the ear are often localized in time.

The article also describes a theory that human speech evolved to occupy an unoccupied space in frequency vs. envelope duration space. It makes no explicit connection between that fact and the type of transform the ear does—but one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.

A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.

replies(12): >>45763026 #>>45763057 #>>45763066 #>>45763124 #>>45763139 #>>45763700 #>>45763804 #>>45764016 #>>45764339 #>>45764582 #>>45765101 #>>45765398 #

patrickthebold ◴[30 Oct 25 19:57 UTC] No.45764582[source]▶

>>45762928 #

I think I might be missing something basic, but if you actually wanted to do a Fourier transform on the sound hitting your ear, wouldn't you need to wait your entire lifetime to compute it? It seems pretty clear that's not what is happening, since you can actually hear things as they happen.

replies(4): >>45764633 #>>45764755 #>>45764761 #>>45764952 #

bonoboTP ◴[30 Oct 25 20:11 UTC] No.45764761[source]▶

>>45764582 #

Yes, for the vanilla Fourier transform you have to integrate from negative to positive infinity. But more practically you can put a temporally finite-support window function on it, so you only analyze a part of it. Whenever you see a 2D spectrogram image in audio editing software, where the audio engineer can suppress a certain range of frequencies in a certain time period, they use something like this.

It's called the short-time Fourier transform (STFT).

https://en.wikipedia.org/wiki/Short-time_Fourier_transform
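
For anyone who wants to see the windowing idea concretely, here is a minimal NumPy sketch of an STFT. The function name `stft`, the Hann window, the window and hop sizes, and the two-tone test signal are all illustrative assumptions, not what any particular audio editor does internally.

```python
import numpy as np

def stft(signal, window_size=256, hop=128):
    """Short-time Fourier transform using a Hann window (illustrative choice).

    Returns a complex array of shape (num_frames, window_size // 2 + 1).
    """
    window = np.hanning(window_size)
    num_frames = 1 + (len(signal) - window_size) // hop
    # Slice the signal into overlapping frames and taper each one.
    frames = np.stack([
        signal[i * hop : i * hop + window_size] * window
        for i in range(num_frames)
    ])
    # One FFT per frame: each row is the spectrum of a short slice of time.
    return np.fft.rfft(frames, axis=1)

# Assumed test signal: 440 Hz for the first half second, 880 Hz afterwards.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
tone = np.where(t < 0.5,
                np.sin(2 * np.pi * 440 * t),
                np.sin(2 * np.pi * 880 * t))

spectrum = stft(tone)
freqs = np.fft.rfftfreq(256, d=1 / sample_rate)

# Strongest frequency bin in an early frame and in a late frame.
peak_early = freqs[np.abs(spectrum[5]).argmax()]
peak_late = freqs[np.abs(spectrum[-5]).argmax()]
print(peak_early, peak_late)
```

Each row only looks at 256 samples (32 ms at this sample rate), so the early and late frames report different dominant frequencies without ever integrating over the whole signal.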

replies(1): >>45768746 #

1. kragen ◴[31 Oct 25 05:37 UTC] No.45768746[source]▶

>>45764761 #

Yeah. But a really annoying thing about the STFT is that its temporal resolution is independent of frequency, so you either have to have shitty temporal resolution at high frequencies or shitty frequency resolution at low ones, compared to the human ear. So in Audacity I keep having to switch back and forth between window sizes.
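
To put rough numbers on that tradeoff, here is a small sketch. The 44.1 kHz sample rate, the two window sizes, and the helper names are assumptions for illustration. The second helper shows a different technique, a constant-Q style window whose length shrinks as frequency rises, which is one way to get frequency-dependent resolution; `q=17.0` is an arbitrary illustrative value.

```python
SAMPLE_RATE = 44100  # assumed sample rate in Hz

def stft_resolution(window_size, sample_rate=SAMPLE_RATE):
    """Return (window duration in ms, bin spacing in Hz) for one STFT window size."""
    duration_ms = 1000 * window_size / sample_rate
    bin_spacing_hz = sample_rate / window_size
    return duration_ms, bin_spacing_hz

def constant_q_window(freq_hz, q=17.0, sample_rate=SAMPLE_RATE):
    """Window length in samples that spans about q cycles of freq_hz."""
    return round(q * sample_rate / freq_hz)

# Fixed-window STFT: the same tradeoff applies at every frequency.
for n in (256, 4096):
    ms, hz = stft_resolution(n)
    print(f"STFT window {n}: {ms:.1f} ms long, bins {hz:.1f} Hz apart")

# Constant-Q style: long windows for low frequencies, short ones for high.
for f in (100, 1000, 10000):
    n = constant_q_window(f)
    print(f"constant-Q at {f} Hz: {n} samples ({1000 * n / SAMPLE_RATE:.1f} ms)")
```

With a fixed window you pick one row of the first table for the whole spectrum; the second loop shows how the window could instead scale with frequency, which is closer in spirit to the wavelet-like behavior discussed upthread.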
