1957 points apokryptein | 30 comments
1. theptip ◴[] No.42910331[source]
> Why do they need to know my screen brightness, memory amount, current volume and if I'm wearing headphones?

This is clearly adding entropy to de-anonymize users between apps, rather than to add specificity to ad bids.

replies(9): >>42910433 #>>42910476 #>>42910497 #>>42910702 #>>42914420 #>>42915971 #>>42916080 #>>42919652 #>>42937487 #
2. gruez ◴[] No.42910433[source]
Everything listed changes way too often to be useful for tracking. My guess is that it's for anti-fraud purposes. Someone setting up fake devices and/or device farms is likely to get similar values, which means they can be detected via ML or whatever.
replies(1): >>42910447 #
3. Groxx ◴[] No.42910447[source]
> screen brightness, memory amount, current volume and if I'm wearing headphones

None of those are likely to change when you navigate from one website to another, with tracking/ads disabled, which is what they want to be able to track. Otherwise they'd just use their cookies.

One device visits a site where you sell ads. A minute later, an unknown device with identical battery, volume, headphone, brightness, model number, browser version, and boot time to the second arrives on another site you run ads on. There's a pretty good chance they're related, because the odds of all those being the same plus those two sites and recent timings involved is rather low: https://coveryourtracks.eff.org/

Plus it doesn't have to be perfect. It just has to be good enough in bulk to sell.
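
A rough sketch of the matching described above, assuming a tracker that simply compares the listed signals for two visits that land close together in time. The `Visit` fields, the `likely_same_device` helper and the 120-second window are all made up for illustration; a real system would presumably score partial matches rather than demand equality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Visit:
    """One ad request as a hypothetical tracker might log it."""
    site: str
    timestamp: float      # seconds since epoch
    battery: int          # percent
    volume: int           # percent
    headphones: bool
    brightness: int       # percent
    model: str
    browser_version: str
    boot_time: int        # seconds since epoch


def signals(v: Visit) -> tuple:
    """The device signals compared across visits (site and time excluded)."""
    return (v.battery, v.volume, v.headphones, v.brightness,
            v.model, v.browser_version, v.boot_time)


def likely_same_device(a: Visit, b: Visit, max_gap_seconds: float = 120.0) -> bool:
    """True when two visits are close in time and every signal is identical."""
    close_in_time = abs(a.timestamp - b.timestamp) <= max_gap_seconds
    return close_in_time and signals(a) == signals(b)
```

Exact equality is the crudest version of this. It only has to be right often enough in bulk.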

4. jmward01 ◴[] No.42910476[source]
It would be amazing if you could build and send fake profiles of this information to create fake browser fingerprints and help track the trackers. Similarly, creating a lot of random noise here may help hide the true signal, or at least make their job a lot harder.
replies(1): >>42910540 #
5. Xen9 ◴[] No.42910497[source]
It's also useful for making ads more effective & manipulation overall. As long as you can connect the data you track & buy, you can use Thompson sampling. In fact, why would we think knowing the name of a person is anything but bad business?
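
For anyone who hasn't met Thompson sampling: a minimal Beta-Bernoulli sketch, assuming two ad variants with fixed click rates. The rates, the round count and the function name are invented for illustration and say nothing about how any ad network actually does it.

```python
import random


def thompson_sampling(true_rates, rounds, seed=0):
    """Simulate Beta-Bernoulli Thompson sampling; return pulls per arm."""
    rng = random.Random(seed)
    n = len(true_rates)
    wins = [0] * n      # observed clicks per arm
    losses = [0] * n    # observed non-clicks per arm
    pulls = [0] * n
    for _ in range(rounds):
        # Draw one plausible click rate per arm from its Beta posterior
        # (uniform Beta(1, 1) prior), then show the arm with the highest draw.
        draws = [rng.betavariate(wins[i] + 1, losses[i] + 1) for i in range(n)]
        arm = max(range(n), key=draws.__getitem__)
        pulls[arm] += 1
        # Simulated user response; in practice this is the observed click.
        if rng.random() < true_rates[arm]:
            wins[arm] += 1
        else:
            losses[arm] += 1
    return pulls
```

Calling `thompson_sampling([0.02, 0.10], 5000)` should send most impressions to the second variant, since the posterior for the weaker arm stops producing the top draw once enough non-clicks pile up.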
6. nickburns ◴[] No.42910540[source]
Unfortunately fingerprinting prevention/resistance tactics become a readily identifiable signal unto themselves. I.e., the 'random noise' becomes fingerprintable if not widely utilized.

Everyone would need to be generating the same 'random noise' for any such tactics to be truly effective.

replies(2): >>42910643 #>>42913074 #
7. jmward01 ◴[] No.42910643{3}[source]
A sufficient number of people would need to, not everyone. And if I were the only one, tracking companies wouldn't adjust for just me. Basically, if this were to catch on, ad trackers wouldn't adjust until there was enough traffic for it to work. Also, that doesn't negate the ability to use this to create fake credentials that aid in tracking ads back to their source.
replies(1): >>42912335 #
8. GeoAtreides ◴[] No.42910702[source]
> adding entropy to de-anonymize users

_removing_ entropy, by adding more information bits

replies(1): >>42911996 #
9. ohisaysir ◴[] No.42911996[source]
Technically, information is the bits you DON'T know. Once you know the bits, they aren't "information" in the Shannon sense: it takes no energy to reset a message if you know all the bits, but N units of energy for N unknown bits of information. (See Feynman's Lectures on Computation.)
10. sebastiennight ◴[] No.42912335{4}[source]
They don't need to adjust.

Here's a real-life example: You show up alone at the airport with a full-face mask and gray coveralls. You are perfectly hidden. But you are the only such hidden person, and there is still old cam footage of you in the airport parking lot, putting on the clothes. The surveillance team can let you act anonymous all you want. They still know who you are, because your disguise IS the unique fingerprint.

Now the scenario you're shooting for here is:

10 people are now walking around the airport in full-face masks and gray coveralls. You think, "well now they DO NOT know if it's ME, or some terrorist, or some random other guy from HN!"

But really, they still have this super-specific fingerprint (fewer than 1 person in a million wears this disguise) and all they need is ONE identifying characteristic (you're taller than the other masked people, maybe) to know who's who.

They didn't need to adjust their system one bit.

replies(3): >>42912878 #>>42913991 #>>42914311 #
11. jmward01 ◴[] No.42912878{5}[source]
Swapping fingerprint details is different than your example since it happens immediately and out of view. You could change fingerprints very often/create a new set for every browser tab. Additionally, as I pointed out before, they won't adjust until there is enough usage and when there is enough usage then the random settings are hard to distinguish because it isn't 1 in 1m. I get that they will keep trying to track down things that make browsing specific, but that is what updates are for. We need to at least make it hard.
12. gitgud ◴[] No.42913074{3}[source]
That's why it should be the browsers & OS's that enforce such privacy measures... it shouldn't be an option that my Grandma needs to enable...
replies(1): >>42913618 #
13. jmward01 ◴[] No.42913618{4}[source]
Unfortunately the fox is building the hen-house. They 'should' build products that improve my experience but they have very little incentive to do that when they get paid so much for the data they can extract. What would actually do it is regulations similar to financial regulations. OS/browser companies shouldn't be allowed to do business with data brokers. Then they would have one primary customer, the consumer, and competition would focus on the correct outcome. But 'regulation' is an evil word so we aren't likely to see anything like that actually happen.
14. theptip ◴[] No.42913991{5}[source]
I think this is a slightly different case, no? If the ad network is using a very high precision variable to soft-link anonymized accounts, then randomizing the values between apps should break that.

Your analogy applies more to things like trying to anonymize your traffic with Tor, where using such an anonymizer flags your IP as doing something weird vs other users. I’m not convinced simply fuzzing the values would be detectable, assuming you pick values that other real users could pick.
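
A sketch of what I mean by fuzzing, assuming the OS hands each app its own stable but unrelated set of values drawn from ranges real users actually have. The `PLAUSIBLE` table, `device_secret` and `fuzzed_signals` are hypothetical names; no OS exposes this API as far as I know.

```python
import hashlib
import random

# Hypothetical value ranges that ordinary devices could report.
PLAUSIBLE = {
    "brightness": [i / 10 for i in range(1, 11)],   # 0.1 .. 1.0
    "volume": [i / 10 for i in range(0, 11)],       # 0.0 .. 1.0
    "headphones": [True, False],
}


def fuzzed_signals(device_secret: str, app_id: str) -> dict:
    """Return per-app signal values: stable within one app, unrelated across apps."""
    # Seed from a secret that never leaves the device plus the app id, so an
    # app sees consistent values but two apps cannot join on them.
    digest = hashlib.sha256(f"{device_secret}:{app_id}".encode()).digest()
    rng = random.Random(int.from_bytes(digest, "big"))
    return {name: rng.choice(values) for name, values in PLAUSIBLE.items()}
```

Each app gets a profile that looks like some ordinary device, so there is no obvious "this user is fuzzing" flag, though two apps can still land on the same values by chance since the table is small.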

replies(1): >>42918852 #
15. araes ◴[] No.42914311{5}[source]
It's kind of how people used to make fun of the CIA types and "undercover" operatives.

Look for the guy wearing a conspicuously plain leather jacket and baseball cap. "Why hello there average looking stranger I've never met. Psss, 'tis a fair day, but it'll be lovelier this evening.'" "Oh ... it's Murphy the spy you want."

Also, found out the CIA declassified a bunch of jokes several years back in searching to respond. [1] Most are already dead links on CIA.gov, yet there's a few remaining. Nother one on people commenting on the CIA. [2] "These types are swin- Ask in Langley if they work for the CIA. Every- Ask in Langley. They will tells one knows them." 'You, it's the big building behind.'

[1] https://nationalpost.com/news/the-cia-has-declassified-a-bun...

[2] https://www.cia.gov/readingroom/document/cia-rdp75-00149r000...

replies(1): >>42917168 #
16. sizzle ◴[] No.42914420[source]
Straight-up fingerprinting us without consent is pure insanity.
replies(1): >>42915984 #
17. ErigmolCt ◴[] No.42915971[source]
Combine this with IP, timestamp, and some behavioral patterns, and you’ve got an extremely robust tracking mechanism that operates outside of explicit consent mechanisms.
18. ErigmolCt ◴[] No.42915984[source]
They’ve basically turned every phone into a tracking beacon
replies(1): >>42918187 #
19. claw-el ◴[] No.42916080[source]
I believe some apps actually have to automatically brighten your screen when displaying a QR code for scanning, and then restore the brightness to its previous setting when you leave the QR code. I believe the Whole Foods app does this for its first screen.
replies(1): >>42916140 #
20. emaro ◴[] No.42916140[source]
Surely that could be done without sending the brightness to some 3rd party.
21. AndrewOMartin ◴[] No.42917168{6}[source]
The garbage in the last sentence of this comment is due to the second link including incorrectly OCR'd text from an image of a newspaper using a two column layout. Both links are very amusing.
22. blueflow ◴[] No.42918187{3}[source]
I'm sure there is a choir of "told you so"-singers somewhere.
replies(2): >>42927794 #>>42945158 #
23. amanda99 ◴[] No.42918852{6}[source]
I'm sure the ad networks do a lot more than use high precision variables for soft-linking.

These are professional networks with a ton of capital thrown behind them. They have pretty decent algorithms, heuristics, etc; and you don't make money (compared to the other data correlation teams) if you do simple dumb stuff. I'm certain they take into account those trying to be privacy-conscious, if only to increase their match rates to be competitive.

24. xkzx ◴[] No.42919652[source]
Screen brightness can identify whether you are outside or inside.
replies(1): >>42919879 #
25. fumblebee ◴[] No.42919879[source]
Taken as one of a thousand attributes it's likely to provide at least some discriminatory lift in isolating a single user, even if tiny.
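
To put numbers on "tiny": a sketch that converts the share of users who have a given attribute value into bits of identifying information, then adds the bits up. Adding them assumes the attributes are independent, which they usually aren't, so treat the total as an upper bound. The function names and the shares in the usage note are made up.

```python
import math


def surprisal_bits(share: float) -> float:
    """Bits learned from seeing a value that a given share of users have."""
    return -math.log2(share)


def total_bits(shares) -> float:
    """Combined bits across attributes, assuming they are independent."""
    return sum(surprisal_bits(s) for s in shares)
```

A value shared by half of all users is worth 1 bit and one shared by a quarter is worth 2, so `total_bits([0.5, 0.25])` is 3.0. Roughly 33 bits is enough to single out one person among 8 billion, so many weak attributes add up quickly.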
26. pizzafeelsright ◴[] No.42927794{4}[source]
Who thought lugging a transponder with GPS, facial recognition, microphone, and keylogger could lead to human tracking and privacy violations?

We all did and yet here we are.

27. shaftway ◴[] No.42937487[source]
I'm in this industry, and I have knowledge about this.

It's important to point out that it takes a long time for uptake of new versions of ad SDKs. The general assumption is that it takes about 6 months after release of a new version for 50% of ad traffic to come from that version or newer. Also, for every version you release, approximately 1% of traffic will never upgrade past that version.

In that kind of world, over-collecting data makes sense, especially if you think nobody will ever find out. Like total and free disk space. There's no good reason to need those, right? But let's say an advertiser comes to you and says "we want to spend $1M / day to advertise our 10GB game, but only to devices that could install it." All of a sudden it's useful to know that a device only has 8GB of disk space, or only 100MB of free space.

So OK, if we didn't collect disk space, now it makes sense to collect disk space. Let's add it to the SDK. It takes a month or two to release a new version of the SDK. 3 months to get any meaningful traffic from it, and another 3 months to get up to 50% of your traffic. Assuming the ramps are linear, 4 months of 0%, and then 3 months of ramping to 50%, 30 days per month, you'll make $22.5M in the first 7 months. But if you had the logic in there to begin with, you'd have made $210M during the same time period. That makes it an easy choice for the business folks.
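
The arithmetic above, spelled out as a sketch. It uses the same assumptions as the paragraph (linear ramp, 30-day months, 4 months at 0%, 3 months ramping to 50%, $1M / day); the function names are just for illustration.

```python
def revenue_with_late_rollout(daily_spend, days_per_month=30,
                              ramp_months=3, ramp_peak=0.5):
    """Revenue when the field ships late: nothing during the dead months,
    then a linear ramp from 0% to ramp_peak of eligible traffic."""
    ramp_days = ramp_months * days_per_month
    # Area under a linear ramp is a triangle: base * height / 2.
    return daily_spend * ramp_days * ramp_peak / 2


def revenue_if_collected_up_front(daily_spend, days_per_month=30,
                                  dead_months=4, ramp_months=3):
    """Revenue when the field was already collected: full spend every day."""
    return daily_spend * (dead_months + ramp_months) * days_per_month
```

With `daily_spend=1_000_000` the first gives $22.5M (90 days averaging 25% of $1M) and the second gives $210M (210 days at $1M), matching the figures above.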

There are answers to this, but they all have drawbacks. You could limit the data that ad agencies can collect. This reduces the value of ads. And agencies have learned that some data (like location) is low-value and high-risk, so they're removing the ability to supply it. I think it'd be better to support a model where ad code can be updated independently of the app. This way we could push out bug fixes faster, and could remove our just-in-case collection, but Apple has shown no signs that this is coming soon, and Google's answer has been such a shit-show that we aren't considering it viable over the next 4 years.

Edit: To address screen brightness specifically, it's a very rough proxy for age of the user.

replies(1): >>42938021 #
28. pavel_lishin ◴[] No.42938021[source]
> But let's say an advertiser comes to you and says "we want to spend $1M / day to advertise our 10GB game, but only to devices that could install it."

I don't want to call you a liar, but having seen ads that are presumably targeted at me, it feels like a total fiction to say that anyone is actually capable of or interested in doing this.

I get advertisements for just absolute nonsense garbage that has no bearing on my life, and no bearing on anything that could have possibly been collected from my device.

The closest thing is that when I was in Mexico for a week, some of my podcast pre-roll ads were in Spanish. (Which, I should note, I do not speak fluently enough to even understand.) Even now, the occasional ad I'm served on a podcast is in Spanish.

And that's it. They saw that my IP came from Quintana Roo, and (somewhat reasonably) decided that I need to hear Spanish-language content. Even when I physically moved back to the United States.

replies(1): >>42940306 #
29. shaftway ◴[] No.42940306{3}[source]
The mobile ad industry is weird, and has some perverse incentives. Good games don't advertise (they don't need to). Games that hook the users just enough that they can show them more ads tend to plow that money right back into advertising to get more users. Those are the ads you see 99% of the time, and they're not really targeted. They're just people who know that the average 15 second interstitial will net them $0.006 in revenue, so they bid for it at $0.005.

Are there whales that spend $1M / day on advertising? Absolutely, 100%. Are they running at all times? No. We typically see that kind of spend from a single advertiser around 30 days out of the year. They're short campaigns, typically around the launch of a big title, and they always try to target as narrowly as they can to maximize their impact.

You're right about it using IP geo-location to guess where you are and what language you want. We also use that to determine if we should show you the GDPR disclosures. But try looking at ads on a Xiaomi phone versus a Samsung and you'll see a different set of ads, because one of those purchasers tends to have more disposable income.

30. ErigmolCt ◴[] No.42945158{4}[source]
Yeah, at this point, the surprise isn’t that it happened. It’s that people still pretend it’s not a big deal