
369 points | zeech | 2 comments
limbero No.43805260
This article reminds me of this excellent tongue-in-cheek piece of writing by Jonathan Zeller in McSweeney's:

Calm Down—Your Phone Isn’t Listening to Your Conversations. It’s Just Tracking Everything You Type, Every App You Use, Every Website You Visit, and Everywhere You Go in the Physical World

https://www.mcsweeneys.net/articles/calm-down-your-phone-isn...

Spooky23 No.43806692
So much time is spent "debunking" the idea that audio recordings are shared with various entities that it makes me more suspicious.

Just like Facebook’s “we never sell your data (we just stalk you and sell ads using your data)”. I’m sure there’s a similar weasel excuse… “we never listen to your audio (but we do analyze it to improve quality assurance)”

kurthr No.43807727
I can say that I knew an entrepreneur in the early post-Y2K years who developed apps to track music played in SF clubs for organizations like ASCAP, BMI, and SESAC. They gave out "free" phones (these were the small, expensive candybars and nice flip/slide-ups) to the influencers of the day. They compressed the audio for orthogonality and had a huge number of hashes to match against. If they got more than a few consecutive matching hashes at a location that wasn't paying royalties, that location got an enforcement call.
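The matching rule in this story (flag a venue once more than a few consecutive fingerprint hashes hit the catalog) is cheap to express. A minimal sketch, assuming each short audio window has already been reduced to an integer hash and the catalog of known tracks is an in-memory set; `longest_matching_run`, `should_flag`, and the `min_run=3` default are hypothetical names and values, not details from the comment:

```python
from typing import Iterable, Set


def longest_matching_run(observed: Iterable[int], catalog: Set[int]) -> int:
    """Length of the longest run of consecutive observed hashes found in the catalog."""
    best = current = 0
    for h in observed:
        if h in catalog:  # O(1) set lookup, independent of catalog size
            current += 1
            best = max(best, current)
        else:
            current = 0  # a miss breaks the run
    return best


def should_flag(observed: Iterable[int], catalog: Set[int], min_run: int = 3) -> bool:
    """Hypothetical rule: flag when at least min_run consecutive hashes match."""
    return longest_matching_run(observed, catalog) >= min_run


catalog = {101, 202, 303, 404}
print(should_flag([7, 101, 202, 303, 9], catalog))  # three in a row -> True
```

Requiring a run, rather than a single hit, is one simple way to suppress isolated chance collisions.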

So the idea that matching keywords takes a huge amount of computing resources, battery life, permissions, or bandwidth is hilarious. That's what "Siri", "Hey Google", "Alexa", etc. are all doing 24 hours a day. Just add another hundred keywords and report the matches once an hour. You don't need low latency. It's just another tool in the bag!
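The "report them once an hour, no low latency needed" idea amounts to batching. A minimal sketch, assuming some on-device matcher (not shown) emits integer keyword IDs; `HourlyReporter` and the `send` callback are hypothetical stand-ins for whatever upload path would exist, and the clock is injectable so the behavior can be checked without waiting an hour:

```python
import time
from collections import Counter
from typing import Callable


class HourlyReporter:
    """Buffer keyword-ID matches and flush them as one small report per interval."""

    def __init__(self, send: Callable[[dict], None], interval_s: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic):
        self.send = send              # hypothetical upload callback
        self.interval_s = interval_s
        self.clock = clock
        self.counts: Counter = Counter()
        self.last_flush = clock()

    def record(self, keyword_id: int) -> None:
        self.counts[keyword_id] += 1
        # Only talk to the network once the interval has elapsed.
        if self.clock() - self.last_flush >= self.interval_s:
            self.flush()

    def flush(self) -> None:
        if self.counts:
            self.send(dict(self.counts))  # e.g. {keyword_id: count}
        self.counts.clear()
        self.last_flush = self.clock()
```

A report of a few keyword IDs and counts per hour is tiny compared with streaming audio, which is the point the comment is making.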

Of course the cat food example is bad, because if they weren't looking for that, you wouldn't get a response. Who would be willing to pay big for clicks on cat food? Now bariatric surgery? DUI? HELOC? Those pay.

Aurornis No.43811924
> So the idea that it takes a huge amount of computing resources, battery life, permissions, or bandwidth to do matching of keywords is hilarious.

I also knew an entrepreneur who tried this same thing, but with TV shows.

Fingerprinting specific audio is an entirely different algorithmic problem. You only need to sample a short section of audio every few minutes and then extract the spectral peaks, which are fingerprinted and compared against a database of known samples.

This is how apps that name a song work. It’s not the same as constant full speech to text.
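One well-known version of this is peak-pair ("landmark") hashing, as used in Shazam-style song identification: pair each spectral peak with a few later peaks, hash (anchor frequency, target frequency, time gap), and vote on which track and time offset the hits agree on. A minimal sketch, assuming peaks have already been extracted from a spectrogram as (frame, frequency-bin) pairs (the FFT and peak picking are not shown); `fan_out`, `max_dt`, and all function names are hypothetical:

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Tuple

Peak = Tuple[int, int]        # (time frame, frequency bin)
Hash = Tuple[int, int, int]   # (anchor freq, target freq, time delta)


def landmark_hashes(peaks: List[Peak], fan_out: int = 3,
                    max_dt: int = 64) -> List[Tuple[Hash, int]]:
    """Pair each peak with up to fan_out later peaks; return ((f1, f2, dt), t1)."""
    peaks = sorted(peaks)
    out = []
    for i, (t1, f1) in enumerate(peaks):
        paired = 0
        for t2, f2 in peaks[i + 1:]:
            dt = t2 - t1
            if dt <= 0:
                continue  # same frame: no usable time gap
            if dt > max_dt or paired >= fan_out:
                break     # peaks are sorted, so later ones are farther away
            out.append(((f1, f2, dt), t1))
            paired += 1
    return out


def build_index(tracks: Dict[str, List[Peak]]) -> Dict[Hash, List[Tuple[str, int]]]:
    """Map each landmark hash to every (track, anchor time) where it occurs."""
    index = defaultdict(list)
    for track_id, peaks in tracks.items():
        for h, t in landmark_hashes(peaks):
            index[h].append((track_id, t))
    return index


def best_match(index, sample: List[Peak]) -> Optional[Tuple[str, int, int]]:
    """Vote on (track, offset); a true match piles votes on one consistent offset."""
    votes: Counter = Counter()
    for h, t_sample in landmark_hashes(sample):
        for track_id, t_track in index.get(h, []):
            votes[(track_id, t_track - t_sample)] += 1
    if not votes:
        return None
    (track_id, offset), count = votes.most_common(1)[0]
    return track_id, offset, count


index = build_index({
    "a": [(0, 10), (2, 40), (5, 25), (9, 60), (12, 33)],
    "b": [(0, 11), (3, 50), (4, 20), (8, 70)],
})
# A clip of track "a" starting 2 frames in, with times re-zeroed.
print(best_match(index, [(0, 40), (3, 25), (7, 60), (10, 33)]))
```

Only a short clip's worth of peaks is needed per query, which is consistent with the comment's point that this is far cheaper than continuous speech-to-text.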

But you’re skipping the key part of the story: They had to hand out phones specifically for this because you can’t get constant audio background processing from installing an app on a modern phone OS without the user noticing.

> That's what "siri", "hey google", "alexa" etc are all doing 24 hours a day.

Again, wake word monitoring is a different algorithm, and a much simpler problem. They're not processing everything you say, converting it to text, and then doing a string compare for the wake word. It's a very tiny learning model trained to match one very specific phrase, which might even run at the hardware level.
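The post-processing around such a tiny model can be illustrated as well. A minimal sketch, assuming the model (not shown) outputs one probability per audio frame that the fixed wake phrase is present; the moving-average smoothing, `threshold`, `window`, and `refractory` values are hypothetical choices, not how any particular assistant works:

```python
from collections import deque
from typing import Iterable, List


def detect_wake_word(frame_scores: Iterable[float], window: int = 5,
                     threshold: float = 0.8, refractory: int = 20) -> List[int]:
    """Return frame indices where the smoothed wake-phrase score crosses threshold.

    frame_scores: per-frame probabilities from a small detector model (assumed).
    """
    recent: deque = deque(maxlen=window)  # sliding window of recent scores
    triggers = []
    cooldown = 0
    for i, score in enumerate(frame_scores):
        recent.append(score)
        if cooldown > 0:                 # suppress repeat triggers right after a hit
            cooldown -= 1
            continue
        if len(recent) == window and sum(recent) / window >= threshold:
            triggers.append(i)
            cooldown = refractory
    return triggers


scores = [0.1] * 10 + [0.9] * 6 + [0.1] * 10
print(detect_wake_word(scores))
```

In this sketch the phrase itself is baked into whatever produced the scores; nothing here is a list of keywords that could be swapped out.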

kurthr No.43813471
I agree it's a different algorithm, but not a higher powered one. You don't need to know context to get HELOC, Bariatric, or DUI. You also don't need 95%+ accuracy for 95% of the population. You're just doing advertising.

Aurornis No.43815252
Doing 100 different matches updated frequently is an entirely different problem than matching a single wake word that isn’t changing.

Regardless, this would require so much coordination, network traffic, and reverse-engineerable on-device code that you're implying nobody has ever found a hint of it and no employee of these companies has ever leaked any hint of it either.

It’s very much in the domain of conspiracy theories.

kurthr No.43822760
Well, actually, when you're hash-based, doing 100 different matches is the easy part. I'm not sure you know how steep the FAR/FRR curves are for >99%/95% single-word accuracy, but having seen wake word development, it's easily 100x harder than reaching 95%/90% accuracy. And none of the heavy calculation, other than voice compression, needs to be done locally or within a short time period. The network traffic is literally a few hundred hashes downloaded, plus hundreds of bits of hash matches uploaded a day (~1kB).
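FAR (false acceptance rate) and FRR (false rejection rate) trade off against each other as a detector's threshold moves, which is why demanding both very low at once is so much harder than tolerating a few percent of errors. A minimal sketch of how the two rates are computed from detector scores; the score lists below are made-up illustrative numbers, not measurements from any real system:

```python
from typing import List, Tuple


def far_frr(genuine: List[float], impostor: List[float],
            threshold: float) -> Tuple[float, float]:
    """FAR: share of non-keyword (impostor) scores accepted.
    FRR: share of true-keyword (genuine) scores rejected."""
    far = sum(s >= threshold for s in impostor) / len(impostor)
    frr = sum(s < threshold for s in genuine) / len(genuine)
    return far, frr


genuine = [0.9, 0.8, 0.7, 0.4]    # scores when the keyword was really spoken
impostor = [0.1, 0.2, 0.5, 0.75]  # scores on other speech or noise
for t in (0.6, 0.8):
    print(t, far_frr(genuine, impostor, t))
```

Raising the threshold from 0.6 to 0.8 removes the false accept but doubles the false rejects; an advertiser who only needs rough signal can sit at a looser point on that curve than a wake word product can.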

Even in the article, there are multiple reports of it that are dismissed. And even though reverse engineering larger apps on iPhone/Android is certainly possible, with obfuscation, searching for yet another hash-table match or simple voice compression routine is quite difficult. Where are all the other articles reporting on reverse engineering of the very screencap apps this article talked about? Are those also just more well-documented conspiracy theories?

Frankly, your best argument is that nobody is selling this as a product. So maybe there are easier, more effective methods, but not because it can't be or hasn't been done (it literally has, and it's been reported). It's kind of the opposite of a conspiracy theory: you have to assume that everyone capable with a vested interest won't do it, or that all of them will be caught, or that making money with ads becomes unpopular.