> So the idea that it takes a huge amount of computing resources, battery life, permissions, or bandwidth to do matching of keywords is hilarious.
I also knew an entrepreneur who tried this same thing, but with TV shows.
Fingerprinting specific audio is a different algorithmic problem entirely. You only need to sample a short stretch of audio every few minutes, extract its spectral peaks, and match those fingerprints against a database of known samples.
This is how apps that name a song work. It’s not the same as constant full speech-to-text.
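As a rough illustration of the spectral-peak approach (a toy sketch of Shazam-style constellation hashing, not any real app's implementation; all names and parameters here are made up):

```python
import numpy as np
from scipy import signal

def spectral_peaks(samples, rate, top_per_frame=3):
    """Return (time_bin, freq_bin) pairs for the strongest spectral peaks."""
    f, t, spec = signal.spectrogram(samples, fs=rate, nperseg=512)
    peaks = []
    for ti in range(spec.shape[1]):
        frame = spec[:, ti]
        # keep only the loudest few frequency bins in this frame
        strongest = np.argsort(frame)[-top_per_frame:]
        peaks.extend((ti, int(fi)) for fi in strongest)
    return peaks

def fingerprint(peaks, fan_out=5):
    """Hash pairs of nearby peaks into compact, lookup-friendly keys."""
    hashes = set()
    for i, (t1, f1) in enumerate(peaks):
        for t2, f2 in peaks[i + 1 : i + 1 + fan_out]:
            # (anchor freq, target freq, time delta) survives noise well
            hashes.add((f1, f2, t2 - t1))
    return hashes

# Toy demo: a few seconds of a synthetic two-tone "song"
rate = 8000
t = np.arange(rate * 3) / rate
clip = np.sin(2 * np.pi * 440 * t) + 0.5 * np.sin(2 * np.pi * 1200 * t)

db_prints = fingerprint(spectral_peaks(clip, rate))
# A short excerpt sampled later still matches against the database
excerpt = clip[rate : rate * 2]
query_prints = fingerprint(spectral_peaks(excerpt, rate))
overlap = len(db_prints & query_prints) / len(query_prints)
print(f"match score: {overlap:.2f}")
```

The point is that a few seconds of audio reduce to a handful of hash lookups, which is why this is cheap in battery and bandwidth compared to running full speech recognition continuously.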
But you’re skipping the key part of the story: they had to hand out phones specifically for this, because you can’t get constant background audio processing just by installing an app on a modern phone OS without the user noticing.
> That's what "siri", "hey google", "alexa" etc are all doing 24 hours a day.
Again, wake-word monitoring is a different and much simpler problem. They’re not processing everything you say, converting it to text, and then doing a string compare for the wake word. It’s a very tiny model trained to match one specific phrase, which may run at the hardware level.
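To make the contrast concrete, here is a toy detector in that spirit: a binary match over a short audio window, with a spectral-energy template standing in for the trained model (real detectors use small neural nets, often on a low-power DSP; everything here is illustrative):

```python
import numpy as np

RATE = 8000  # sample rate for this toy example

def features(window):
    """Coarse spectral-energy profile of one audio window (16 bands)."""
    mags = np.abs(np.fft.rfft(window))
    bands = np.array_split(mags, 16)
    e = np.array([b.sum() for b in bands])
    return e / (e.sum() + 1e-9)  # normalize so loudness doesn't matter

def make_tone(freq, seconds=0.5):
    """Synthetic stand-in for a spoken phrase."""
    t = np.arange(int(RATE * seconds)) / RATE
    return np.sin(2 * np.pi * freq * t)

# "Enroll" the wake phrase: a 700 Hz tone stands in for "hey ..."
template = features(make_tone(700))

def is_wake_word(window, threshold=0.1):
    """Fires only when the window's profile is close to the template."""
    return bool(np.linalg.norm(features(window) - template) < threshold)

print(is_wake_word(make_tone(700)))   # the "wake phrase" -> True
print(is_wake_word(make_tone(2500)))  # anything else -> False
```

Note what this never does: it never produces text. It answers one yes/no question about a half-second window, which is why it fits in a tiny always-on model while full transcription does not.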