> So the idea that it takes a huge amount of computing resources, battery life, permissions, or bandwidth to do matching of keywords is hilarious.
I also knew an entrepreneur who tried this same thing, but with TV shows.
Fingerprinting specific audio is a different algorithmic problem entirely. You only need to sample a short stretch of audio every few minutes, extract its spectral peaks, and match those fingerprints against a database of known samples.
This is how apps that name a song work. It’s not the same as constant full speech-to-text.
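As a rough illustration of the spectral-peak approach (a toy sketch of Shazam-style constellation hashing, not any real app's implementation; all names and parameters here are made up):

```python
import numpy as np
from scipy import signal

def spectral_peaks(samples, rate, top_per_frame=3):
    """Return (time_bin, freq_bin) pairs for the strongest spectral peaks."""
    f, t, spec = signal.spectrogram(samples, fs=rate, nperseg=512)
    peaks = []
    for ti in range(spec.shape[1]):
        frame = spec[:, ti]
        # keep only the loudest few frequency bins in this frame
        strongest = np.argsort(frame)[-top_per_frame:]
        peaks.extend((ti, int(fi)) for fi in strongest)
    return peaks

def fingerprint(peaks, fan_out=5):
    """Hash pairs of nearby peaks into compact, lookup-friendly keys."""
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            # (anchor freq, target freq, time delta) survives noise well
            hashes.add((f1, f2, t2 - t1))
    return hashes

# Toy demo: a few seconds of a synthetic two-tone "song"
rate = 8000
t = np.arange(rate * 3) / rate
clip = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

db_prints = fingerprint(spectral_peaks(clip, rate))
# A short excerpt sampled later still matches against the database
excerpt = clip[rate : rate * 2]
query_prints = fingerprint(spectral_peaks(excerpt, rate))
overlap = len(db_prints & query_prints) / len(query_prints)
print(f"match score: {overlap:.2f}")
```

The point is that a few seconds of audio reduce to a handful of hash lookups, which is why this is cheap in battery and bandwidth compared to running full speech recognition continuously.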
But you’re skipping the key part of the story: they had to hand out phones specifically for this, because you can’t get constant background audio processing just by installing an app on a modern phone OS without the user noticing.
> That's what "siri", "hey google", "alexa" etc are all doing 24 hours a day.
Again, wake-word monitoring is a different and much simpler problem. They’re not processing everything you say, converting it to text, and then doing a string compare for the wake word. It’s a very tiny model trained to match one specific phrase, which may run at the hardware level.
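To make the contrast concrete, here is a toy detector in that spirit: a binary match over a short audio window, with a spectral-energy template standing in for the trained model (real detectors use small neural nets, often on a low-power DSP; everything here is illustrative):

```python
import numpy as np

RATE = 8000  # sample rate for this toy example

def features(window):
    """Coarse spectral-energy profile of one audio window (16 bands)."""
    mags = np.abs(np.fft.rfft(window))
    bands = np.array_split(mags, 16)
    e = np.array([b.sum() for b in bands])
    return e / (e.sum() + 1e-9)  # normalize so loudness doesn't matter

def make_tone(freq, seconds=0.5):
    """Synthetic stand-in for a spoken phrase."""
    t = np.arange(int(RATE * seconds)) / RATE
    return np.sin(2 * np.pi * freq * t)

# "Enroll" the wake phrase: a 700 Hz tone stands in for "hey ..."
template = features(make_tone(700))

def is_wake_word(window, threshold=0.1):
    """Fires only when the window's profile is close to the template."""
    return bool(np.linalg.norm(features(window) - template) < threshold)

print(is_wake_word(make_tone(700)))   # the "wake phrase" -> True
print(is_wake_word(make_tone(2500)))  # anything else -> False
```

Note what this never does: it never produces text. It answers one yes/no question about a half-second window, which is why it fits in a tiny always-on model while full transcription does not.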