How the cochlea computes (2024)

(www.dissonances.blog)
475 points izhak | 1 comments
kazinator ◴[] No.45762948[source]
> A Fourier transform has no explicit temporal precision, and resembles something closer to the waveforms on the right; this is not what the filters in the cochlea look like.

Perhaps the ear does something more vaguely analogous to a discrete Fourier transform on samples of data, which is what we do in a lot of signal processing.

In signal processing, we take windowed samples, and do discrete transforms on these. These do give us some temporal precision.
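To make the windowed idea concrete, here is a minimal NumPy sketch. The sample rate, the 1 kHz burst, the Hann window, and the frame and hop sizes are all arbitrary choices of mine for illustration. It compares one DFT over the whole signal, which reports the frequency but not when it occurred, with DFTs over short windowed frames, which also say roughly when the tone was present.

```python
import numpy as np

fs = 8000                      # sample rate in Hz (assumed for illustration)
t = np.arange(fs) / fs         # one second of sample times
# A 1 kHz tone that is only present between 0.50 s and 0.60 s.
signal = np.where((t >= 0.5) & (t < 0.6), np.sin(2 * np.pi * 1000 * t), 0.0)

def windowed_dft(x, frame_len, hop):
    """Magnitude DFT of successive Hann-windowed frames (one row per frame)."""
    window = np.hanning(frame_len)
    starts = range(0, len(x) - frame_len + 1, hop)
    frames = np.array([x[s:s + frame_len] * window for s in starts])
    return np.abs(np.fft.rfft(frames, axis=1))

# One transform over everything: we learn the frequency, not the timing.
whole = np.abs(np.fft.rfft(signal))
peak_hz_whole = np.argmax(whole) * fs / len(signal)

# Windowed transforms: find the frame where the 1 kHz bin is strongest.
frame_len, hop = 256, 128
spec = windowed_dft(signal, frame_len, hop)
tone_bin = round(1000 * frame_len / fs)
loudest_frame = int(np.argmax(spec[:, tone_bin]))
burst_time = (loudest_frame * hop + frame_len / 2) / fs  # frame centre, seconds

print(f"whole-signal peak: {peak_hz_whole:.0f} Hz")
print(f"windowed estimate of when the tone occurred: {burst_time:.3f} s")
```

The timing estimate is only as fine as the frame length and hop allow, which is the trade off discussed next.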

There is a trade off there between frequency and temporal precision, analogous to the Heisenberg uncertainty principle in quantum mechanics. The better we know a frequency, the less precisely we know the timing. Only an infinite, periodic signal has a single precise frequency (or precise set of harmonics), which show up as infinitely narrow blips in the frequency domain.

The continuous Fourier transform has no time window at all. We transform an entire function like sin(x) over the entire domain. If that domain is interpreted as time, we are including all of eternity, so to speak, from negative infinity to positive infinity.

replies(4): >>45763111 #>>45763560 #>>45764192 #>>45766585 #
1. HarHarVeryFunny ◴[] No.45763560[source]
> There is a trade off there between frequency and temporal precision

Sure, and the FFT isn't inherently biased towards one vs the other. If you take an FFT over a long time window (narrowband spectrogram) then you get good frequency resolution at the cost of time resolution, and vice versa for a short time window (wideband spectrogram).
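A rough NumPy sketch of that trade off, with values I picked for illustration (16 kHz sample rate, tones at 1000 Hz and 1050 Hz, Hann windows of 2048 and 64 samples). It measures how deep the dip is halfway between the two tones: a ratio well below 1 means the window resolves them as separate peaks, a ratio near 1 means they have merged into one blob.

```python
import numpy as np

fs = 16000                                  # sample rate in Hz (assumed)
t = np.arange(fs) / fs
# Two steady tones only 50 Hz apart.
two_tones = np.sin(2 * np.pi * 1000 * t) + np.sin(2 * np.pi * 1050 * t)

def magnitude_at(frame, freq_hz):
    """Magnitude of the frame's spectrum evaluated at an arbitrary frequency."""
    n = np.arange(len(frame))
    return abs(np.sum(frame * np.exp(-2j * np.pi * freq_hz * n / fs)))

def dip_between_tones(frame_len):
    """Spectrum magnitude midway between the tones, relative to the 1000 Hz tone."""
    frame = two_tones[:frame_len] * np.hanning(frame_len)
    return magnitude_at(frame, 1025) / magnitude_at(frame, 1000)

narrowband_len = 2048   # 128 ms window: fine frequency detail, coarse timing
wideband_len = 64       # 4 ms window: fine timing, coarse frequency detail

dip_narrow = dip_between_tones(narrowband_len)
dip_wide = dip_between_tones(wideband_len)

for name, n, dip in [("narrowband", narrowband_len, dip_narrow),
                     ("wideband", wideband_len, dip_wide)]:
    print(f"{name}: {1000 * n / fs:.0f} ms frame, "
          f"{fs / n:.1f} Hz bin spacing, dip ratio {dip:.3f}")
```

The long window should separate the two tones cleanly but can only place an event to within about 128 ms. The short window places events to within about 4 ms but sees the two tones as one.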

For speech recognition you'd ideally want to use both, since they detect different things. TFA is saying that this is in fact what our cochlea filter bank is doing, using different types of filter at different frequency ranges: better frequency resolution at the lower frequencies where the formants are (carrying articulatory information), and better time resolution at the high frequencies generated by fricatives, where exact frequency matters less and accurate onset detection is useful for detecting plosives.