
How the cochlea computes (2024)

(www.dissonances.blog)
475 points by izhak | 9 comments
edbaskerville ◴[] No.45762928[source]
To summarize: the ear does not do a Fourier transform, but it does do a time-localized frequency-domain transform akin to wavelets (specifically, intermediate between wavelet and Gabor transforms). It does this because the sounds processed by the ear are often localized in time.
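
To put numbers on "intermediate": in a Gabor/STFT analysis every filter has the same bandwidth, in a wavelet analysis bandwidth grows in proportion to center frequency (constant Q), and measured cochlear bandwidths sit between the two. A minimal sketch in Python -- the Gabor and wavelet parameters here are arbitrary; only the ERB constants are the standard Glasberg & Moore fit:

    import numpy as np

    def bandwidth_gabor(f, bw=100.0):
        # Gabor/STFT: one fixed window length, so the same bandwidth everywhere
        return np.full_like(f, bw)

    def bandwidth_wavelet(f, q=4.0):
        # Wavelet: the analysis shape stretches with scale, so bandwidth ~ f
        return f / q

    def bandwidth_cochlea(f):
        # Glasberg & Moore ERB fit: ~constant at low f, ~proportional to f
        # at high f -- i.e. between the two extremes above
        return 24.7 * (4.37 * f / 1000.0 + 1.0)

    f = np.array([125.0, 500.0, 2000.0, 8000.0])
    print("gabor:        ", bandwidth_gabor(f))
    print("wavelet:      ", bandwidth_wavelet(f))
    print("cochlea (ERB):", np.round(bandwidth_cochlea(f), 1))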

The article also describes a theory that human speech evolved to occupy an unoccupied space in frequency vs. envelope duration space. It makes no explicit connection between that fact and the type of transform the ear does—but one would suspect that the specific characteristics of the human cochlea might be tuned to human speech while still being able to process environmental and animal sounds sufficiently well.

A more complicated hypothesis off the top of my head: the location of human speech in frequency/envelope is a tradeoff between (1) occupying an unfilled niche in sound space; (2) optimal information density taking brain processing speed into account; and (3) evolutionary constraints on physiology of sound production and hearing.

replies(12): >>45763026 #>>45763057 #>>45763066 #>>45763124 #>>45763139 #>>45763700 #>>45763804 #>>45764016 #>>45764339 #>>45764582 #>>45765101 #>>45765398 #
crazygringo ◴[] No.45765398[source]
Yeah, this article feels like it's very much setting up a ridiculous strawman.

Nobody who knows anything about signal processing has ever suggested that the ear performs a Fourier transform across infinite time.

But the ear does perform something very much akin to the FFT (fast Fourier transform), turning discrete samples into intensities at frequencies -- which is, of course, what any reasonable person means when they say the ear does a Fourier transform.

This article suggests it's accomplished by something between wavelet and Gabor. Which, yes, is not exactly a Fourier transform -- but it's producing something that is about 95-99% the same in the end.

And again, nobody would ever suggest the ear was performing the exact math that the FFT does, down to the last decimal point. But these filters still work essentially the same way as the FFT in terms of how they respond to a given frequency, it's really just how they're windowed.

So if anyone just wants a simple explanation, I would say yes the ear does a Fourier transform. A discrete one with windowing.
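
Concretely, something like this toy short-time transform, i.e. a windowed DFT hopped along the signal (a sketch; the window length and hop here are arbitrary choices):

    import numpy as np

    def stft_intensity(x, win_len=256, hop=128):
        # Hann-windowed DFT hopped along the signal: "discrete, with windowing"
        win = np.hanning(win_len)
        frames = [x[i:i + win_len] * win
                  for i in range(0, len(x) - win_len + 1, hop)]
        return np.abs(np.fft.rfft(np.array(frames), axis=-1))

    fs = 8000
    t = np.arange(fs) / fs                # 1 second of samples
    x = np.sin(2 * np.pi * 440 * t)       # a 440 Hz tone
    s = stft_intensity(x)
    print(s.shape)                        # (n_frames, n_frequency_bins)
    print(s[0].argmax() * fs / 256)       # peak bin lands near 440 Hz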

replies(3): >>45766343 #>>45767588 #>>45768701 #
1. anyfoo ◴[] No.45766343[source]
Since we're being pedantic: there is some confusion of ideas here (even though you do make a valid overall point), and the strawman may not be as ridiculous as you suggest.

First, I think when you say FFT, you mean DFT. A Fourier transform is both non-discrete and infinite in time. A DTFT (discrete-time Fourier transform) is discrete, i.e. it operates on samples, but still infinite in time. A DFT (discrete Fourier transform) is both finite (the analyzed data has a start and an end) and discrete. An FFT is just a fast algorithm for computing a DFT, and nothing indicates to me that hearing is in any way specifically related to how the FFT computes a DFT.

But more importantly, I'm not sure DFT fits at all? This is an analog, real-world physical process, so where is it discrete, i.e. how does the ear capture samples?

I think, purely based upon its "mode", what's happening is more akin to a Fourier series, the missing fourth category completing the set (FT, DTFT, DFT): continuous (non-discrete), but finite, or rather periodic, in time.
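
For reference, the usual 2x2 grid being completed here:

                         aperiodic/infinite time    periodic/finite time
    continuous time      FT                         Fourier series (FS)
    discrete time        DTFT                       DFT

Each one also has the dual properties in frequency: e.g. the FS of a continuous, periodic signal is a discrete, aperiodic spectrum, while the DFT is discrete and periodic in both domains.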

But secondly, unlike Gabor transforms, wavelet transforms are specifically not just windowed Fourier anythings (whether FT/FS/DFT/DTFT). Those would commonly be called short-time Fourier transforms (STFT, again existing in discrete and non-discrete variants), and the article's footnotes straight up mention that those don't fit either.

Wavelet transforms use an entirely different shape (e.g. a Haar wavelet) that is shifted and stretched for analysis, instead of sinusoids over a windowed signal.
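
For instance, a bare-bones Haar analysis: correlate the signal against shifted, dyadically stretched copies of one step-shaped mother wavelet (a sketch of the idea, not an efficient DWT; the scales and test signal are made up):

    import numpy as np

    def haar(width):
        # Haar mother wavelet: +1 on the first half, -1 on the second --
        # a step shape, not a sinusoid
        w = np.ones(width)
        w[width // 2:] = -1.0
        return w / np.sqrt(width)   # keep energy comparable across scales

    def haar_analysis(x, n_scales=4):
        # Shift (via correlation) and stretch (via width) one mother shape;
        # no window and no sinusoid anywhere
        return [np.correlate(x, haar(2 ** (s + 1)), mode="valid")
                for s in range(n_scales)]

    x = np.sin(2 * np.pi * np.arange(256) / 32)   # toy signal
    for s, c in enumerate(haar_analysis(x)):
        print(f"scale {s} (width {2 ** (s + 1)}): max |coeff| = {np.abs(c).max():.3f}")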

And I think those distinctions are what the article actually wanted to touch upon.

replies(1): >>45766722 #
2. actionfromafar ◴[] No.45766722[source]
Don’t neurons fire in bursts? That’s sort of discrete, I guess.
replies(4): >>45766907 #>>45767028 #>>45768439 #>>45768720 #
3. smallnix ◴[] No.45766907[source]
I was also thinking of refractory periods with neurotransmitters. But I don't know much about this.
replies(1): >>45767032 #
4. anyfoo ◴[] No.45767028[source]
Even if they do (and I honestly have no idea), isn't what gets sampled here the frequency content, i.e. the output of the basilar membrane in the ear (which would correspond to a short-time frequency transform), and not a sample in time of the actual sound wave?

And the basilar membrane seems like a pretty continuous process to me, in time at least (if not in frequency). But I'm not 100% sure.

Sure, if you go small enough you end up with discrete structures sooner or later (molecules, atoms, and eventually quantum effects, where everything breaks apart anyway). But without knowing anything more, the sensitivity of this whole process still seems better modeled as continuous than discrete; the scale at which it becomes discrete seems just too small to matter.

replies(2): >>45767256 #>>45773173 #
5. anyfoo ◴[] No.45767032{3}[source]
It's a good question, but as elaborated in a sibling comment, I'm not sure it even matters in this case. (Sampling frequency vs. sampling the sound wave itself.)
6. a-dub ◴[] No.45767256{3}[source]
going all the way out to percept, the response of the system is non-linear: https://en.wikipedia.org/wiki/Mel_scale

this is believed to come from the shape of the cochlea, which is often modeled as a filterbank that can express this non-linearity in an intuitive way.
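
the usual fit, for reference (the 2595/700 constants are the common O'Shaughnessy form from the linked page; other fits exist):

    import numpy as np

    def hz_to_mel(f):
        # mel scale: roughly linear below ~1 kHz, roughly logarithmic above
        return 2595.0 * np.log10(1.0 + f / 700.0)

    print(hz_to_mel(np.array([100.0, 1000.0, 4000.0, 10000.0])))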

7. acjohnson55 ◴[] No.45768439[source]
Yes. See the volley theory of hearing: https://en.wikipedia.org/wiki/Volley_theory
8. kragen ◴[] No.45768720[source]
I think those bursts ("action potentials") happen at continuously varying times, though.
9. Balgair ◴[] No.45773173{3}[source]
Neuro person here.

Yes, many neurons fire at discrete intervals set by their morphology. In fact, this DFT/FFT/Infinite-FT/whatever-FT is all the hell over neuroscience. Many neurons don't really 'communicate' in just a single action potential. They are mostly firing at each other all the time, and the rate of firing is what communicates information. So neuron A is always popping at neuron B, but that tone/rate of popping is what effects change / carries the information.
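
A toy version of that rate code (a sketch; the Poisson-ish firing model and all the numbers are made up for illustration):

    import numpy as np

    rng = np.random.default_rng(0)

    def spike_train(rate_hz, duration_s=1.0, dt=0.001):
        # Poisson-ish spiking: in each 1 ms bin, fire with probability rate * dt
        return rng.random(int(duration_s / dt)) < rate_hz * dt

    # Neuron A is always popping at neuron B; what the stimulus changes is
    # the *rate*, and the rate is the message -- no single spike means much
    baseline = spike_train(rate_hz=10.0)
    driven = spike_train(rate_hz=80.0)
    print("spikes in 1 s, no stimulus:  ", int(baseline.sum()))
    print("spikes in 1 s, with stimulus:", int(driven.sum()))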

Now, this is not nearly true of every single neuron-neuron interaction. Some do use a single action potential (your patellar knee-jerk reflex), some communicate with hundreds of other neurons (Purkinje cells in your cerebellum), some inhibit the firing of other neurons (gap junction/dendrite/axon interactions), and some transmit information in opposite ways. It's a giant mess, and the exact subsystem is what you have to specify to get a handle on things.

Also, you get whole-brain wave activity during different sleep and wake cycles. So all the neurons will sync up their firing rates in certain areas when you're dreaming or taking the SAT or something. And yes, you can influence mass cyclic firing with powerful magnets (TMS, transcranial magnetic stimulation).

For the cochlea here, these hair cells are mostly firing all the time, and when a sound/frequency that they are 'tuned' to is heard, their firing pattern changes and that information is then transmitted toward the temporal lobes. To be clear too, there are a lot of other brain structures in the way before the info gets to a place where you can be conscious of it. Things like the medial nuclei, the trapezoid bodies, the calyx of Held, etc. Most of these areas are for discriminating sounds and locating sounds in space. So like when your fan is on for a long while and you no longer hear it, that's because of the other structures.