
545 points mmh0000 | 1 comment
jchw ◴[] No.43572243[source]
I'm rooting for Ladybird to gain traction in the future. Currently, it is using cURL proper for networking. That is probably going to have some challenges (I think cURL is still limited in some ways, e.g. I don't think it can do WebSockets over h2 yet) but on the other hand, having a rising browser engine might eventually remove this avenue for fingerprinting since legitimate traffic will have the same fingerprint as stock cURL.
replies(6): >>43572413 #>>43573011 #>>43574225 #>>43576912 #>>43580376 #>>43583469 #
nonrandomstring ◴[] No.43574225[source]
When I spoke to these guys [0] we touched on those quirks and foibles that make up a signature (including TCP stack behavior beyond the control of any userspace app).

I love this curl, but I worry that if a component takes on the role of deception in order to "keep up" it accumulates a legacy of hard to maintain "compatibility" baggage.

Ideally it should just say... "hey I'm curl, let me in"

The problem of course lies with a server that is picky about dress codes, and that problem in turn is caused by crooks sneaking in disguise, so it's rather a circular chicken and egg thing.

[0] https://cybershow.uk/episodes.php?id=39

replies(2): >>43574560 #>>43575789 #
immibis ◴[] No.43574560[source]
What should instead happen is that Chrome should stop sending as much of a fingerprint, so that sites won't be able to fingerprint. That won't happen, since it's against Google's interests.
replies(1): >>43574900 #
gruez ◴[] No.43574900[source]
This is a fundamental misunderstanding of how TLS fingerprinting works. The "fingerprint" isn't from chrome sending a "fingerprint: [random uuid]" attribute in every TLS negotiation. It's derived from various properties of the TLS stack, like what ciphers it can accept. You can't just "stop sending as much of a fingerprint" without every browser agreeing on the same TLS stack. It's already minimal as it is, because there's basically no aspect of the TLS stack that users can configure, and chrome bundles its own, so you'd expect every chrome user to have the same TLS fingerprint. It's only really useful to distinguish "fake" chrome users (e.g. curl with custom header set, or firefox users with user agent spoofer) from "real" chrome users.
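To make "derived from various properties of the TLS stack" concrete, here is a JA3-style sketch: the well-known JA3 fingerprint is just an MD5 over comma-separated, dash-joined decimal fields from the ClientHello (version, ciphers, extensions, curves, point formats). The cipher/extension values below are illustrative, not the actual lists any real client sends.

```python
import hashlib

def ja3_fingerprint(tls_version, ciphers, extensions, curves, point_formats):
    """Compute a JA3-style fingerprint: MD5 over comma-separated,
    dash-joined decimal fields taken from the ClientHello."""
    fields = [
        str(tls_version),
        "-".join(map(str, ciphers)),
        "-".join(map(str, extensions)),
        "-".join(map(str, curves)),
        "-".join(map(str, point_formats)),
    ]
    return hashlib.md5(",".join(fields).encode()).hexdigest()

# Illustrative values only: two clients with identical stacks hash
# identically; a different cipher/extension list changes the hash.
chrome_like = ja3_fingerprint(771, [4865, 4866, 4867], [0, 23, 65281], [29, 23, 24], [0])
curl_like   = ja3_fingerprint(771, [4866, 4865], [0, 23], [29, 23], [0])
print(chrome_like != curl_like)  # → True (different stacks, different fingerprints)
```

This is why every stock Chrome install looks the same while curl-with-a-Chrome-User-Agent does not: the hash depends on the TLS stack itself, not on any header the user can set.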
replies(2): >>43574983 #>>43584170 #
dochtman ◴[] No.43574983[source]
Part of the fingerprint is stuff like the ordering of extensions, which Chrome could easily do but AFAIK doesn’t.

(AIUI Google’s Play Store is one of the biggest TLS fingerprinting culprits.)

replies(2): >>43575010 #>>43575074 #
shiomiru ◴[] No.43575074[source]
Chrome has randomized its ClientHello extension order for two years now.[0]

The companies to blame here are solely the ones employing these fingerprinting techniques, and those relying on services of these companies (which is a worryingly large chunk of the web). For example, after the Chrome change, Cloudflare just switched to a fingerprinter that doesn't check the order.[1]

[0]: https://chromestatus.com/feature/5124606246518784

[1]: https://blog.cloudflare.com/ja4-signals/
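The difference between an order-sensitive fingerprinter (defeated by Chrome's randomization) and an order-insensitive one (like the JA4 family Cloudflare moved to) can be sketched in a few lines; the extension IDs here are arbitrary stand-ins, and real JA4 involves more fields than this toy version:

```python
import hashlib

def order_sensitive(extensions):
    # JA3-style: hashes extensions in wire order, so a randomized
    # ClientHello order yields a different hash on every connection.
    return hashlib.md5("-".join(map(str, extensions)).encode()).hexdigest()

def order_insensitive(extensions):
    # JA4-style: sorts before hashing, so all permutations of the
    # same extension set collapse to one stable fingerprint.
    return hashlib.md5("-".join(map(str, sorted(extensions))).encode()).hexdigest()

hello_a = [43, 0, 51, 10]   # one randomized permutation
hello_b = [0, 10, 43, 51]   # another permutation of the same set

print(order_sensitive(hello_a) == order_sensitive(hello_b))      # → False
print(order_insensitive(hello_a) == order_insensitive(hello_b))  # → True
```

So randomizing the order only burned fingerprinters that keyed on ordering; the set of extensions itself still identifies the client.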

replies(2): >>43575406 #>>43576104 #
fc417fc802 ◴[] No.43576104[source]
> The companies to blame here are solely the ones employing these fingerprinting techniques,

Let's not go blaming vulnerabilities on those exploiting them. Exploitation is also bad but being exploitable is a problem in and of itself.

replies(2): >>43579898 #>>43587654 #
Jubijub ◴[] No.43587654[source]
I’m sorry, but your comment shows you never had to fight this problem at scale. The challenge is not small-time crawlers, the challenge is blocking large / dedicated actors. The problem is simple: if there is more than X volume of traffic per <aggregation criteria>, block it. Problem: most aggregation criteria are trivially spoofable, or very cheap to change:

- IP: with IPv6 it is not an issue to rotate your IP often
- UA: changing this is scraping 101
- SSL fingerprint: easy to use the same as everyone
- IP stack fingerprint: also easy to use a common one
- request / session tokens: it’s cheap to create a new session

You can force login, but then you have a spam account creation challenge, with the same issues as above, and depending on your infra this can become heavy.
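The "block above X volume per aggregation criterion" approach, and why key rotation defeats it, can be sketched with a toy sliding-window limiter (a hypothetical sketch; production systems are far more elaborate):

```python
import time
from collections import defaultdict, deque

class KeyedRateLimiter:
    """Sliding-window limiter keyed on an aggregation criterion
    (IP, UA, TLS fingerprint, session token, ...)."""

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # key -> timestamps of recent hits

    def allow(self, key, now=None):
        now = time.monotonic() if now is None else now
        q = self.hits[key]
        while q and now - q[0] > self.window:  # drop hits outside the window
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True

limiter = KeyedRateLimiter(max_requests=3, window_seconds=60)

# One key hammering the service: blocked once it exceeds the limit.
print([limiter.allow("198.51.100.7", now=t) for t in range(4)])
# → [True, True, True, False]

# An adversary rotating keys (e.g. fresh IPv6 addresses) never trips it.
print([limiter.allow(f"2001:db8::{i}", now=0) for i in range(4)])
# → [True, True, True, True]
```

Every spoofable criterion in the list above corresponds to `key` here: if the adversary can mint new keys cheaply, the per-key threshold never fires.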

Add to this that the minute you use a signal for detection, you “burn” it, as adversaries will avoid using it, and you lose measurement and thus the ability to know if you are fixing the problem at all.

I worked on this kind of problem for a FAANG service; whoever claims it’s easy clearly never had to deal with motivated adversaries.

replies(1): >>43595929 #
immibis ◴[] No.43595929[source]
Should be easy enough to create a DroneBL for residential proxy services. Since you work on residential proxy detection at a FAANG service, why haven't you done it yet?
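For reference, DNSBL lookups like DroneBL's follow a simple convention: reverse the IP's octets, query that name under the list's zone, and treat any A-record answer as "listed" and NXDOMAIN as "clean". A minimal sketch (the zone name here is a placeholder, not a real blocklist):

```python
import socket

def dnsbl_query_name(ip, zone):
    """Build the DNSBL lookup name: reverse the IPv4 octets and
    append the list's zone (standard DNSBL convention)."""
    return ".".join(reversed(ip.split("."))) + "." + zone

def is_listed(ip, zone):
    """An A-record answer means 'listed'; NXDOMAIN means 'clean'."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False

# "dnsbl.example" is a placeholder; a real list publishes its own zone.
print(dnsbl_query_name("203.0.113.9", "dnsbl.example"))
# → 9.113.0.203.dnsbl.example
```

A residential-proxy list would reuse this exact query mechanism; the hard part is populating the zone, not serving it.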

If they're doing things the above-board way from their own ASN, block their ASN.

If they're doing things the above-board way from third-party hosting providers, send abuse reports. Late last year there was a commotion because someone was sending single spoofed SSH SYN packets, from the addresses of Tor nodes, to organizations with extremely sensitive security policies. Many people with Tor nodes got threats of being banned from their hosting provider, over a single packet they didn't even send. They're definitely going to ban people who are doing actual DDoSes from their servers.

DDoS is also a federal crime, so if you and they are in the USA, you might consider trying to get them put in prison.