Stripe records user movements on its customers' websites

(mtlynch.io)

1134 points mtlynch | 1 comments | 21 Apr 20 17:01 UTC | HN request time: 0.202s | source

Show context

pc ◴[21 Apr 20 17:42 UTC] No.22937303[source]▶

Stripe cofounder here. The question raised ("Is Stripe collecting this data for advertising?") can be readily answered in the negative. This data has never been, would never be, and will never be sold/rented/etc. to advertisers.

Stripe.js collects this data only for fraud prevention -- it helps us detect bots who try to defraud businesses that use Stripe. (CAPTCHAs use similar techniques but result in more UI friction.) Stripe.js is part of the ML stack that helps us stop literally millions of fraudulent payments per day and techniques like this help us block fraud more effectively than almost anything else on the market. Businesses that use Stripe would lose a lot more money if it didn't exist. We see this directly: some businesses don't use Stripe.js and they are often suddenly and unpleasantly surprised when attacked by sophisticated fraud rings.

If you don't want to use Stripe.js, you definitely don't have to (or you can include it only on a minimal checkout page) -- it just depends how much PCI burden and fraud risk you'd like to take on.

We will immediately clarify the ToS language that makes this ambiguous. We'll also put up a clearer page about Stripe.js's fraud prevention.

(Updated to add: further down in this thread, fillskills writes[1]: "As someone who saw this first hand, Stripe’s fraud detection really works. Fraudulent transactions went down from ~2% to under 0.5% on hundreds of thousands of transactions per month. And it very likely saved our business at a very critical phase." This is what we're aiming for (and up against) with Stripe Radar and Stripe.js, and why we work on these technologies.)

[1] https://news.ycombinator.com/item?id=22938141

replies(52): >>22937327 #>>22937331 #>>22937352 #>>22937362 #>>22937385 #>>22937475 #>>22937518 #>>22937526 #>>22937559 #>>22937599 #>>22937775 #>>22937815 #>>22937962 #>>22938015 #>>22938068 #>>22938208 #>>22938310 #>>22938383 #>>22938533 #>>22938646 #>>22938728 #>>22938777 #>>22938855 #>>22938884 #>>22939026 #>>22939035 #>>22939376 #>>22939803 #>>22939814 #>>22939916 #>>22939952 #>>22940051 #>>22940090 #>>22940177 #>>22940282 #>>22940315 #>>22940317 #>>22940352 #>>22940686 #>>22940751 #>>22941252 #>>22942502 #>>22942538 #>>22942710 #>>22942907 #>>22943100 #>>22943453 #>>22944163 #>>22944509 #>>22944652 #>>22945170 #>>22946136 #

meowface ◴[21 Apr 20 19:11 UTC] No.22938310[source]▶

>>22937303 #

In my opinion, there's no moral issue with doing this. Fighting fraud and other kinds of cybercrime is an endless cat-and-mouse game. Although there are very bad associations with it, one simply does need to use fingerprinting and supercookies/"zombie cookies"/"evercookies" if they want even a fighting chance.

I think if it's being solely used for such security purposes, isn't shared with or sold to anyone else, and is carefully safeguarded, then it's okay. The main risk I see from it is mission creep leading to it eventually being used for other purposes, like advertising or tracking for "market research" reasons. I don't personally think it's likely Stripe would do this, though.

replies(5): >>22938691 #>>22938744 #>>22938940 #>>22940203 #>>22941791 #

mtlynch ◴[21 Apr 20 19:53 UTC] No.22938691[source]▶

>>22938310 #

> I think if it's being solely used for such security purposes, isn't shared with or sold to anyone else, and is carefully safeguarded, then it's okay. The main risk I see from it is mission creep leading to it eventually being used for other purposes, like advertising or tracking for "market research" reasons. I don't personally think it's likely Stripe would do this, though.

Is this view conditional on the type of data Stripe is currently collecting or would it apply to any data Stripe collects? Would this be true if Stripe began recording every keystroke in the app and hooked every XHR request to my backend server and sent copies to Stripe?

I agree that Stripe has a sensible reason for using this data. If I started seeing a high rate of chargebacks, I'd consider enabling Stripe on more parts of my site so that Stripe could consume user behavior earlier on to detect fraud.

My issue is that if there's no agreement about what data Stripe is allowed to collect and what their retention policies are, then the implicit agreement is that Stripe can just collect anything it has access to and hold it forever.

As JavaScript running on my page, Stripe.js has access to basically all user data in my app. There are certain types of user data I would not be comfortable sharing with Stripe, even if it improved fraud detection, so I'd like there to be clear limits on what they're gathering.

replies(1): >>22938750 #

meowface ◴[21 Apr 20 20:01 UTC] No.22938750[source]▶

>>22938691 #

Yes, I would say it's conditional. They should be more explicit about what data they're collecting from users. Opaque enough to not reveal all of the exact techniques, but clear enough so site owners can make an informed decision. (Someone dedicated and experienced enough could probably reverse engineer Stripe.js and figure out everything it's doing if they really wanted, but they're also probably updating it regularly.)

replies(1): >>22939398 #

adambyrtek ◴[21 Apr 20 21:14 UTC] No.22939398[source]▶

>>22938750 #

There's no need to rely on "security by obscurity". Stripe.js is just a thin client-side library everybody can analyse, so they might as well be fully transparent when it comes to data collection. I don't see the point of trying to obfuscate things, especially since the actual fraud-detection model works on the backend anyway.

replies(1): >>22939607 #

meowface ◴[21 Apr 20 21:39 UTC] No.22939607[source]▶

>>22939398 #

With these kinds of adversarial things, I think it's a mix of frontend and backend.

It's a library everyone can technically analyze, yes, but by 1) using ever-changing obfuscation that requires a lot of work to RE, and 2) constantly changing the client-side logic itself, it makes the work of the adversaries a lot harder and more tedious, and means either fewer of them will consistently succeed, or more of them will be forced to become more centralized around solutions/services that've successfully solved it, which means Stripe can focus-fire their efforts a bit more.

Of course there's also a lot going on on the backend that'll never be seen, but the adversary is trying to mimic a legitimate user as much as they can, so if the JavaScript is totally unobfuscated and stays the same for a while, it's a lot easier for them to consistently trace exactly what data is being sent and compare it against what their system or altered browser is sending.

It's cat-and-mouse across many dimensions. In such adversarial games, obscurity actually can and often does add some security. "Security by obscurity is no security at all" isn't exactly a fallacy, but it is a fallacy to apply it universally and with a very liberal definition of "security". It's generally meant for things that are more formal or provable, like an encryption or hashing algorithm or other cryptography. It's still totally reasonable to use obscurity as a minor practical measure. I'd agree with this part of https://en.wikipedia.org/wiki/Security_through_obscurity: "Knowledge of how the system is built differs from concealment and camouflage. The efficacy of obscurity in operations security depends by whether the obscurity lives on top of other good security practices, or if it is being used alone. When used as an independent layer, obscurity is considered a valid security tool."

For example, configuring your web server to not display its version on headers or pages is "security by obscurity", and certainly will not save you if you're running a vulnerable version, but may buy you some time if a 0-day comes out for your version and people search Shodan for the vulnerable version numbers - your site won't appear in the list. These kinds of obscurity measures of course never guarantee security and should be the very last line of defense in front of true security measures, but they can still potentially help you a little.

In the "malware vs. anti-virus" and "game cheat vs. game cheat detection software" fights that play out every day, both sides of each heavily obfuscate their code and the actions they perform. No, this never ensures it won't be fully reverse engineered. And the developers all know that. Given enough time and dedication, it'll eventually happen. But it requires more time and effort, and each time it's altered, it requires a re-investment of that time and effort.

Obfuscation and obscurity is arguably the defining feature and "value proposition" of each of those four types of software. A lot of that remains totally hidden on the backend (e.g. a botnet C2 web server only responding with malware binaries if they analyze the connection and believe it really is a regular infected computer and not a security researcher or sandbox), but a lot is also present in the client.

replies(2): >>22940100 #>>22941401 #

adambyrtek ◴[21 Apr 20 22:40 UTC] No.22940100[source]▶

>>22939607 #

Thanks for a thoughtful reply (upvoted), but have you looked at the library in question? The code is minified, but there is not much obfuscation going on: https://js.stripe.com/v3/

Most of your examples are quite low-level, but it's much harder to keep things hidden within the constraints of the browser sandbox when you have to interface with standard APIs which can be easily instrumented.

replies(1): >>22941765 #

1. meowface ◴[22 Apr 20 03:22 UTC] No.22941765[source]▶

>>22940100 #

Yeah, theirs is far less obfuscated than most fraud/bot detection libraries I've seen. I believe almost all of the JS code I've seen from companies that primarily do fraud detection and web security is pretty heavily obfuscated. Here, it looks like Stripe.js is doing much more than just the fraud stuff - this is their client library for everything, including payment handling.

I haven't analyzed it and can't say this with any certainty, but my guess is that you're probably right: they're focusing primarily on backend analysis and ML comparing activity across a massive array of customers. This is different from smaller security firms who have a lot less data due to fewer customers, and a kind of sampling bias of customers who are particularly worried about or inundated by fraud.

They may be less interested in suspicious activity or fingerprinting at the device level and more interested in it at the payment and personal information level (which is suggested by articles like https://stripe.com/radar/guide).

Pure, uninformed speculation, but it's possible that if they get deeper into anti-fraud in the future (perhaps if fraudsters get smarter about this higher layer of evasion), they might supplement the data science / finance / payment oriented stuff with more lower-level device and browser analysis, in which case I wouldn't be surprised if they eventually separate out some of the anti-fraud/security parts into an obfuscated portion. (Or, more likely, have Stripe.js load that portion dynamically. Maybe they're already doing this, even? Dunno.)

↑