This is clearly adding entropy to de-anonymize users between apps, rather than to add specificity to ad bids.
None of those are likely to change when you navigate from one website to another, with tracking/ads disabled, which is what they want to be able to track. Otherwise they'd just use their cookies.
One device visits a site where you sell ads. A minute later, an unknown device with identical battery, volume, headphone, brightness, model number, browser version, and boot time to the second arrives on another site you run ads on. There's a pretty good chance they're related, because the odds of all of those matching by coincidence, on those two sites and within such a short window, are rather low: https://coveryourtracks.eff.org/
Plus it doesn't have to be perfect. It just has to be good enough in bulk to sell.
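To put rough numbers on the matching argument above, here is a minimal sketch in Python. Every count in `ASSUMED_DISTINCT_VALUES` is a made-up illustrative figure, not measured data, and the function names are hypothetical. It also assumes the signals are uniform and independent, which real devices are not, so the result is an upper bound on the identifying bits rather than an estimate of what any ad network actually gets.

```python
import math

# Hypothetical count of distinct values per signal. Illustrative only, not measured.
ASSUMED_DISTINCT_VALUES = {
    "battery_percent": 100,
    "volume_level": 16,
    "headphones_plugged": 2,
    "brightness_level": 100,
    "device_model": 1000,
    "browser_version": 50,
    "boot_time_second": 86_400,  # boot times spread over a single day
}


def fingerprint_bits(distinct_values):
    """Upper bound on identifying bits, assuming uniform and independent signals."""
    return sum(math.log2(n) for n in distinct_values.values())


def expected_collisions(bits, population):
    """Expected number of OTHER devices sharing one exact fingerprint."""
    return (population - 1) / (2 ** bits)


bits = fingerprint_bits(ASSUMED_DISTINCT_VALUES)
others = expected_collisions(bits, population=1_000_000_000)
print(f"{bits:.1f} bits; expected other matching devices among 1B: {others:.2e}")
```

Under these assumptions an exact match on all signals is very unlikely to be a coincidence. Correlated or slowly changing signals shrink the bit count, but as the comment says, the match only has to be good enough in bulk.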
Everyone would need to be generating the same 'random noise' for any such tactics to be truly effective.
_removing_ entropy, by adding more information bits
Here's a real-life example: You show up alone at the airport with a full-face mask and gray coveralls. You are perfectly hidden. But you are the only such hidden person, and there is still old cam footage of you in the airport parking lot, putting on the clothes. The surveillance team can let you act anonymous all you want. They still know who you are, because your disguise IS the unique fingerprint.
Now the scenario you're shooting for here is:
10 people are now walking around the airport in full-face masks and gray coveralls. You think, "well now they DO NOT know if it's ME, or some terrorist, or some random other guy from HN!"
But really, they still have this super-specific fingerprint (there is still fewer than 1 person in a million with this disguise) and all they need is ONE identifying characteristic (you're taller than the other masked people, maybe) to know who's who.
They didn't need to adjust their system one bit.
Your analogy applies more to things like trying to anonymize your traffic with Tor, where using such an anonymizer flags your IP as doing something weird vs other users. I’m not convinced simply fuzzing the values would be detectable, assuming you pick values that other real users could pick.
Look for the guy wearing a conspicuously plain leather jacket and baseball cap. "Why hello there, average-looking stranger I've never met. Psst, 'tis a fair day, but it'll be lovelier this evening." "Oh ... it's Murphy the spy you want."
Also, while searching for something to respond with, I found out the CIA declassified a bunch of jokes several years back. [1] Most are already dead links on CIA.gov, yet a few remain. Another one, on people commenting on the CIA: [2] "These types are swin- Ask in Langley if they work for the CIA. Every- Ask in Langley. They will tells one knows them." 'You, it's the big building behind.'
[1] https://nationalpost.com/news/the-cia-has-declassified-a-bun...
[2] https://www.cia.gov/readingroom/document/cia-rdp75-00149r000...
These are professional networks with a ton of capital thrown behind them. They have pretty decent algorithms, heuristics, etc; and you don't make money (compared to the other data correlation teams) if you do simple dumb stuff. I'm certain they take into account those trying to be privacy-conscious, if only to increase their match rates to be competitive.
We all did and yet here we are.
It's important to point out that it takes a long time for uptake of new versions of ad SDKs. The general assumption is that it takes about 6 months after release of a new version for 50% of ad traffic to come from that version or newer. Also, for every version you release, approximately 1% of traffic will never upgrade past that version.
In that kind of world, over-collecting data makes sense, especially if you think nobody will ever find out. Take total and free disk space. There's no good reason to need those, right? But let's say an advertiser comes to you and says "we want to spend $1M / day to advertise our 10GB game, but only to devices that could install it." All of a sudden it's useful to know that a device only has 8GB of disk space, or only 100MB of free space.
So OK, if we didn't collect disk space, now it makes sense to collect disk space. Let's add it to the SDK. It takes a month or two to release a new version of the SDK. 3 months to get any meaningful traffic from it, and another 3 months to get up to 50% of your traffic. Assuming the ramps are linear, 4 months of 0%, and then 3 months of ramping to 50%, 30 days per month, you'll make $22.5M in the first 7 months. But if you had the logic in there to begin with, you'd have made $210M during the same time period. That makes it an easy choice for the business folks.
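The arithmetic above can be checked with a short Python sketch. It uses only the figures from the comment ($1M / day, 30-day months, 4 months at 0%, then a linear 3-month ramp to 50%). The function and variable names are hypothetical, and sampling each day at its midpoint is my own choice so that the discrete sum matches the linear-ramp average.

```python
DAILY_SPEND = 1_000_000  # the $1M / day campaign from the example
DAYS_PER_MONTH = 30


def adoption_share(day, dead_months=4, ramp_months=3, peak=0.5):
    """Share of traffic on the new SDK on a given day (0-indexed).

    0% for the dead period, then a linear ramp up to `peak`.
    Each day is sampled at its midpoint.
    """
    dead_days = dead_months * DAYS_PER_MONTH
    ramp_days = ramp_months * DAYS_PER_MONTH
    if day < dead_days:
        return 0.0
    return min(peak, peak * (day + 0.5 - dead_days) / ramp_days)


total_days = 7 * DAYS_PER_MONTH

# Revenue if disk space collection only ships after the advertiser asks for it.
late_revenue = sum(DAILY_SPEND * adoption_share(d) for d in range(total_days))

# Revenue if the SDK had been collecting disk space from the start.
early_revenue = DAILY_SPEND * total_days

print(f"added late:  ${late_revenue / 1e6:.1f}M")
print(f"added early: ${early_revenue / 1e6:.1f}M")
```

Note that the $210M figure assumes every device already reports disk space on day one, which is the best case for the just-in-case argument.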
There are answers to this, but they all have drawbacks. You could limit the data that ad agencies can collect. This reduces the value of ads. And agencies have learned that some data (like location) is low-value and high-risk, so they're removing the ability to supply it. I think it'd be better to support a model where ad code can be updated independently of the app. This way we could push out bug fixes faster, and could remove our just-in-case collection, but Apple has shown no signs that this is coming soon, and Google's answer has been such a shit-show that we aren't considering it viable over the next 4 years.
Edit: To address screen brightness specifically, it's a very rough proxy for age of the user.
I don't want to call you a liar, but having seen ads that are presumably targeted at me, it feels like a total fiction to say that anyone is actually capable or interested in doing this.
I get advertisements for just absolute nonsense garbage that has no bearing on my life, and no bearing on anything that could have possibly been collected from my device.
The closest thing is that when I was in Mexico for a week, some of my podcast pre-roll ads were in Spanish. (Which, I should note, I do not speak fluently enough to even understand.) Even now, the occasional ad I'm served on a podcast is in Spanish.
And that's it. They saw that my IP came from Quintana Roo, and (somewhat reasonably) decided that I need to hear Spanish-language content. Even when I physically moved back to the United States.
Are there whales that spend $1M / day on advertising? Absolutely, 100%. Are they running at all times? No. We typically see that kind of spend from a single advertiser around 30 days out of the year. They're short campaigns, typically around the launch of a big title, and they always try to target as narrowly as they can to maximize their impact.
You're right about it using IP geo-location to guess where you are and what language you want. We also use that to determine if we should show you the GDPR disclosures. But try looking at ads on a Xiaomi phone versus a Samsung and you'll see a different set of ads, because one of those purchasers tends to have more disposable income.