439 points by david927 | 1 comment

What are you working on? Any new ideas which you're thinking about?
daxfohl:
I was hoping to make a piano practice assistant for my kids that would take sheet music in MusicXML format, listen to the microphone stream, and check for things they frequently miss, like rests, dynamics, and consistent tempo.
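Reading the expected notes out of the MusicXML is the easy half. A minimal sketch with the music21 library (the file path is a placeholder):

    # Sketch: pull the expected notes/rests out of MusicXML with music21.
    from music21 import converter

    score = converter.parse("piece.musicxml")  # placeholder path
    for el in score.flatten().notesAndRests:
        if el.isNote:
            print(el.offset, el.pitch.nameWithOctave, el.duration.quarterLength)
        elif el.isRest:
            print(el.offset, "rest", el.duration.quarterLength)
        # (chords would need el.pitches; omitted in this sketch)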

Surprisingly, the blocker has been identifying notes from the microphone input. I assumed that would be a long-solved problem: just do an FFT and find the peaks of the spectrogram? But apparently that doesn't work well once there are harmonics, reverb, and the like, and you have to use AI models to do it (Google and Spotify each publish one: Onsets and Frames and Basic Pitch, respectively). And so far those still seem to fail if there are more than three notes played simultaneously.
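The naive approach amounts to something like this (rough sketch; the 10% peak threshold is arbitrary):

    import numpy as np
    from scipy.signal import find_peaks, get_window

    def naive_notes(frame, sr):
        # Window, FFT, magnitude spectrum.
        win = get_window("hann", len(frame))
        mag = np.abs(np.fft.rfft(frame * win))
        freqs = np.fft.rfftfreq(len(frame), 1 / sr)
        # Pick prominent peaks. The trouble: harmonics of one note rank
        # alongside, or above, the fundamentals of the other notes.
        peaks, _ = find_peaks(mag, height=mag.max() * 0.1)
        return freqs[peaks]

Fine for a single clean note; but a loud harmonic of one note is indistinguishable from the fundamental of another, which is where it falls apart.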

Now I'm baffled as to how song identification can work if even identifying notes is this unreliable! Maybe I'm doing something wrong.

Tade0:
Here's an algorithm I cooked up for my (never completed) master's thesis:

It's based on the observation that harmonics sit at (nearly) integer multiples of the fundamental, so consecutive harmonics are spaced one fundamental apart. The most common frequency difference among all pairs of spectral peaks is therefore the fundamental frequency of the sound.

- For the FFT, use a Gaussian window, because your spectral peaks then look like Gaussians. The logarithm of a Gaussian is a parabola, so the three samples around a peak are enough to compute its exact frequency by parabolic interpolation.

- Gather all the peaks along with their amplitudes, and form all pairwise combinations.

- Create a histogram of the frequency differences across those pairs, weighted by the product of each pair's peak amplitudes. The most heavily weighted bin is the fundamental (sketch below).
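In numpy, roughly (untested sketch; the window width, peak threshold, and 2 Hz bin width are arbitrary choices):

    import numpy as np
    from itertools import combinations
    from scipy.signal import find_peaks, get_window

    def estimate_fundamental(frame, sr, bin_hz=2.0):
        n = len(frame)
        # Gaussian window: each spectral peak is then itself a Gaussian,
        # whose log is a parabola, so three bins pin down the true peak.
        win = get_window(("gaussian", n / 8), n)
        mag = np.abs(np.fft.rfft(frame * win))
        logm = np.log(mag + 1e-12)

        idx, _ = find_peaks(mag, height=mag.max() * 0.05)

        # Parabolic interpolation around each peak bin -> exact frequency.
        a, b, c = logm[idx - 1], logm[idx], logm[idx + 1]
        delta = 0.5 * (a - c) / (a - 2 * b + c)
        freqs = (idx + delta) * sr / n
        amps = mag[idx]

        # Histogram of pairwise differences, weighted by amplitude products.
        hist = {}
        for (f1, a1), (f2, a2) in combinations(zip(freqs, amps), 2):
            d = round(abs(f2 - f1) / bin_hz)
            if d:
                hist[d] = hist.get(d, 0.0) + a1 * a2
        return max(hist, key=hist.get) * bin_hz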

Once you've recognised a frequency, you can attenuate it with a comb filter and run the algorithm again to find the next one.
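Something like this for the attenuation step (sketch; the integer delay means the notches drift off the true harmonics for high partials):

    import numpy as np

    def remove_pitch(signal, sr, f0):
        # Feedforward comb y[n] = x[n] - x[n - T] with T ~ sr / f0:
        # its zeros sit at multiples of f0, notching out the found
        # note's fundamental and its harmonics.
        T = int(round(sr / f0))
        out = signal.astype(float)
        out[T:] -= signal[:-T]
        return out

Alternating estimate_fundamental and remove_pitch until the histogram peak gets too weak gives you a crude iterative multi-pitch detector.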