CRLF is obsolete and should be abolished

(fossil-scm.org)

michaelmior ◴[13 Oct 24 20:03 UTC] No.41831072[source]▶

>>41830717 (OP) #

> various protocols (HTTP, SMTP, CSV) still "require" CRLF at the end of each line

What would be the benefit to updating legacy protocols to just use NL? You save a handful of bits at the expense of a lot of potential bugs. HTTP/1(.1) is mostly replaced by HTTP/2 and later by now anyway.

Sure, it makes sense not to require CRLF with any new protocols, but it doesn't seem worth updating legacy things.

> Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply.

I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

replies(13): >>41831206 #>>41831210 #>>41831225 #>>41831256 #>>41831322 #>>41831364 #>>41831391 #>>41831706 #>>41832337 #>>41832719 #>>41832751 #>>41834474 #>>41835444 #

FiloSottile ◴[13 Oct 24 20:39 UTC] No.41831391[source]▶

>>41831072 #

Exactly. Please DO NOT mess with protocols, especially legacy critical protocols based on in-band signaling.

HTTP/1.1 was regrettably but irreversibly designed with security-critical parser alignment requirements. If two implementations disagree on whether `A:B\nC:D` contains a value for C, you can build a request smuggling gadget, leading to significant attacks. We live in a post-Postel world: only ever generate and accept CRLF in protocols that specify it, however legacy and nonsensical it might be.
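
To make the disagreement concrete, here is a minimal Python sketch. Both parsers are hypothetical and written only for illustration; neither is taken from a real HTTP implementation. One splits header lines on CRLF only, the other also accepts a bare LF, and both are given the `A:B\nC:D` input followed by a CRLF.

```python
def parse_strict(raw: bytes) -> dict[bytes, bytes]:
    """Hypothetical parser: only CRLF ends a header line."""
    headers = {}
    for line in raw.split(b"\r\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers


def parse_lenient(raw: bytes) -> dict[bytes, bytes]:
    """Hypothetical parser: a bare LF also ends a header line."""
    headers = {}
    for line in raw.replace(b"\r\n", b"\n").split(b"\n"):
        if not line:
            continue
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    return headers


RAW = b"A:B\nC:D\r\n"

# The strict parser sees one header whose value contains a newline;
# the lenient parser sees two separate headers.
strict_view = parse_strict(RAW)
lenient_view = parse_lenient(RAW)
print(sorted(strict_view), sorted(lenient_view))
```

Under these assumptions only the lenient parser reports a header named C, which is the kind of mismatch the paragraph above warns about.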

(I am a massive, massive SQLite fan, but this is giving me pause about using other software by the same author, at least when networks are involved.)

replies(7): >>41831450 #>>41831498 #>>41831871 #>>41832546 #>>41832632 #>>41832661 #>>41839309 #

1. tptacek ◴[13 Oct 24 20:47 UTC] No.41831450[source]▶

>>41831391 #

This would be more persuasive if HTTP servers didn't already widely accept bare 0ah line termination. What's the first major public web site you can find that doesn't?

replies(5): >>41831506 #>>41831717 #>>41832137 #>>41832555 #>>41832731 #

2. michaelmior ◴[13 Oct 24 20:55 UTC] No.41831506[source]▶

>>41831450 (TP) #

We're talking about servers and clients here. The best way to ensure things work is to adhere to an established protocol. Aside from saving a few bytes, there doesn't seem to be any good reason to deviate.

replies(3): >>41831609 #>>41831637 #>>41832929 #

3. tptacek ◴[13 Oct 24 21:08 UTC] No.41831609[source]▶

>>41831506 #

I'm saying the consistency that Filippo says our security depends on doesn't really seem to exist in the world, which hurts the persuasiveness of that particular argument in favor of consistency.

replies(2): >>41831837 #>>41835413 #

4. Ekaros ◴[13 Oct 24 21:11 UTC] No.41831637[source]▶

>>41831506 #

There are very good reasons not to deviate: a mismatch with the various other things that may or may not be on the path, like reverse proxies, load balancers and so on, can affect things.

5. FiloSottile ◴[13 Oct 24 21:20 UTC] No.41831717[source]▶

>>41831450 (TP) #

Hrm, this is what I get for logging in to HN from my phone. It’s possible I am confusing this with one of the other exploitable HTTP/1.1 header parser alignment issues.

Maybe this was so widespread that ~everything already handles it because non-malicious stuff breaks if you don’t. In that case, my bad, but I still would like to make a general plea as an implementer for sticking strictly to specified behavior in this sort of protocol.

6. dwattttt ◴[13 Oct 24 21:35 UTC] No.41831837{3}[source]▶

>>41831609 #

But no one expects 0ah to be sufficient. Change that expectation, and now you have to wonder if your middleware and your backend agree on whether the middleware filtered out internal-only headers.

replies(1): >>41831921 #

7. tptacek ◴[13 Oct 24 21:45 UTC] No.41831921{4}[source]▶

>>41831837 #

Yeah, I'm not certain that this is a real issue. It might be? Certainly, I'm read in to things like TE.CL desync. I get the concern, that any disagreement in parsing policies is problematic for HTTP because of middleboxes. But I think the ship may have sailed on 0ah, and that it may be the case that you simply have to build HTTP systems to be bare-0ah-tolerant if you want your system to be resilient.

replies(1): >>41832774 #

8. hifromwork ◴[13 Oct 24 22:09 UTC] No.41832137[source]▶

>>41831450 (TP) #

As the parent mentioned, it's security critical that every HTTP parser in the world - including every middleware, proxy, firewall, WAF - parses the headers in the same way. If you write an HTTP parser for a server application it's imperative you don't introduce random inconsistencies with the standard (I can't believe I have to write this).

On the other hand, as a client, it's OK to send malformed requests, as long as you're prepared that they may fail. But it's a weird flex: legacy protocols have many warts, so why die on this particular hill?

replies(2): >>41832201 #>>41835964 #

9. tptacek ◴[13 Oct 24 22:20 UTC] No.41832201[source]▶

>>41832137 #

That appears to be an argument in favor of accepting bare-0ah, since as a positive statement that is the situation on the Internet today.

replies(3): >>41832905 #>>41833940 #>>41834573 #

10. LegionMammal978 ◴[13 Oct 24 23:08 UTC] No.41832555[source]▶

>>41831450 (TP) #

Going down a list of top websites, these URLs respond with HTTP 200 (possibly after redirections) when sent an ordinary HTTP/1.1 GET request with 0D0A line endings, but respond with HTTP 400 when sent the exact same request with 0A line endings:

  https://br.pinterest.com/ https://www.pinterest.co.uk/
  https://apps.apple.com/ https://support.apple.com/ https://podcasts.apple.com/ https://music.apple.com/ https://geo.itunes.apple.com/
  https://ncbi.nlm.nih.gov/ https://www.salesforce.com/ https://www.purdue.edu/ https://www.playstation.com/
  https://llvm.org/ https://www.iana.org/ https://www.gnu.org/ https://epa.gov/ https://justice.gov/
  https://www.brendangregg.com/ http://heise.de/ https://www.post.ch/ http://hhs.gov/ https://oreilly.com/
  https://www.thinkgeek.com/ https://www.constantcontact.com/ https://sciencemag.org/ https://nps.gov/
  https://www.cs.mun.ca/ https://www.wipo.int/ https://www.unicode.org/ https://economictimes.indiatimes.com/
  https://science.org/ https://icann.org/ https://caniuse.com/ https://w3techs.com/ https://chrisharrison.net/
  https://www.universal-music.co.jp/ https://digiland.libero.it/ https://webaim.org/ https://webmd.com/

This URL responds with HTTP 505 on an 0A request:

  https://ed.ted.com/

These URLs don't respond on an 0A request:

  https://quora.com/
  https://www.nist.gov/

Most of these seem pretty major to me. There are other sites that are public but responded with an HTTP 403, probably because they didn't like the VPN or HTTP client I used for this test. (Also, www.apple.com is tolerant of 0A line endings, even though its other subdomains aren't, which is weird.)
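
A test along these lines can be sketched in Python with the standard `socket` module. This is not the client used for the results above, which is not named in the comment; the function names, the plain-HTTP port 80 default, the timeout and the added `Connection: close` header are all assumptions. Each call opens a fresh connection so that keep-alive state cannot influence the result.

```python
import socket


def build_request(host: str, eol: bytes) -> bytes:
    """Build a minimal HTTP/1.1 GET request using `eol` as the line terminator."""
    lines = [
        b"GET / HTTP/1.1",
        b"Host: " + host.encode("ascii"),
        b"Connection: close",  # assumption: ask the server not to keep the connection open
    ]
    return eol.join(lines) + eol + eol


def status_line(host: str, eol: bytes, port: int = 80, timeout: float = 5.0) -> str:
    """Send the request on a new connection and return the first response line."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(host, eol))
        data = b""
        # Read until the first line of the response is complete or the peer closes.
        while b"\n" not in data:
            chunk = sock.recv(4096)
            if not chunk:
                break
            data += chunk
    return data.split(b"\n", 1)[0].decode("latin-1").strip()
```

Calling `status_line(host, b"\r\n")` and then `status_line(host, b"\n")` for the same host repeats the comparison for that host. A server that never answers surfaces as an `OSError` (a timeout), and results depend on the network path and may change over time.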

replies(1): >>41832634 #

11. tptacek ◴[13 Oct 24 23:22 UTC] No.41832634[source]▶

>>41832555 #

You sure about this? www.pinterest.com, for instance, does not appear to care whether I 0d0a or just 0a.

replies(1): >>41832865 #

12. rtpg ◴[13 Oct 24 23:41 UTC] No.41832731[source]▶

>>41831450 (TP) #

Gunicorn expects `\r\n` for lines (see gunicorn/http/message.py:read_line), though it's possible that every middleware that is in front of gunicorn in practice normalizes lines to avoid this issue.

replies(1): >>41832977 #

13. dwattttt ◴[13 Oct 24 23:50 UTC] No.41832774{5}[source]▶

>>41831921 #

But what's bare-0ah-tolerant? Accepting _or_ ignoring bare 0ah's means you need to ensure all your moving parts agree, or you end up in the "one bit thinks this is two headers, others think it's one header" situation.

The only situation where you don't need to know two policies match is when one of the policies rejects one of the combinations outright. Probably. Maybe.

EDIT: maybe it's better phrased as "all parts need to be bare-0ah-strict". But then it's fine if it's bare-0ah-reject; they just need to all be strict, one way or the other.
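
The three policies can be compared pairwise in a small Python sketch. The policy functions and the `desync` helper are hypothetical names invented for this illustration, and the model is deliberately simplified: a pair counts as desynchronized only when both parts accept the input and yet see different header lines.

```python
class Rejected(Exception):
    """Raised by a policy that refuses input containing a bare LF."""


def policy_reject(block: bytes) -> list[bytes]:
    """Strict: refuse the whole block if any LF is not preceded by CR."""
    if b"\n" in block.replace(b"\r\n", b""):
        raise Rejected("bare LF")
    return [line for line in block.split(b"\r\n") if line]


def policy_accept(block: bytes) -> list[bytes]:
    """Tolerant: a bare LF ends a line, exactly like CRLF."""
    return [line for line in block.replace(b"\r\n", b"\n").split(b"\n") if line]


def policy_ignore(block: bytes) -> list[bytes]:
    """Tolerant the other way: only CRLF ends a line, a bare LF stays in the value."""
    return [line for line in block.split(b"\r\n") if line]


def desync(first, second, block: bytes) -> bool:
    """True if both policies accept the block but split it differently."""
    try:
        return first(block) != second(block)
    except Rejected:
        return False  # one side dropped the request, so nothing slipped through


BLOCK = b"A: B\nC: D\r\n"
for first, second in [
    (policy_reject, policy_accept),
    (policy_reject, policy_ignore),
    (policy_accept, policy_ignore),
]:
    print(first.__name__, second.__name__, desync(first, second, BLOCK))
```

In this model the only pair that disagrees is accept versus ignore; any pairing that includes the rejecting policy never lets the ambiguous block through.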

14. LegionMammal978 ◴[14 Oct 24 00:03 UTC] No.41832865{3}[source]▶

>>41832634 #

My apologies, I was using a client which kept the connection alive between the 0D0A and 0A requests, which has an effect on www.pinterest.com. Rerunning the test with separate connections for 0D0A and 0A requests, www.pinterest.com and phys.org are no longer affected (I've removed the two from the list), but all other URLs are still affected.

replies(1): >>41832909 #

15. theamk ◴[14 Oct 24 00:12 UTC] No.41832905{3}[source]▶

>>41832201 #

Wouldn't the safest thing, security-wise, be to fail fast on bare 0ah?

As a web server, you may not know which intermediate proxies the request traversed before arriving at your port. Given that request smuggling is a thing, failing fast with no further parsing on any protocol deviations seems to be the most secure thing.
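
As a rough illustration of failing fast, the following Python sketch refuses a header block as soon as it finds a CR or LF that is not part of a CRLF pair. The function and exception names are hypothetical, and real servers differ in how and where they perform this check.

```python
class BadRequest(Exception):
    """Signals that the request should be answered with a 400 and not parsed further."""


def split_header_lines(block: bytes) -> list[bytes]:
    """Return the header lines of `block`, refusing any stray CR or LF."""
    if not block.endswith(b"\r\n"):
        raise BadRequest("header block must end with CRLF")
    lines = block[:-2].split(b"\r\n")
    for line in lines:
        # Anything left over after splitting on CRLF is a bare CR or a bare LF.
        if b"\r" in line or b"\n" in line:
            raise BadRequest("bare CR or LF in header line")
    return lines
```

A well-formed block such as `b"Host: example.org\r\nAccept: */*\r\n"` yields two lines, while `b"Host: example.org\nX-Internal: 1\r\n"` raises `BadRequest` before any header is interpreted.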

replies(1): >>41832974 #

16. tptacek ◴[14 Oct 24 00:12 UTC] No.41832909{4}[source]▶

>>41832865 #

I picked one at random --- hhs.gov --- and it too appears to work?

For what it's worth: I'm testing by piping the bytes for a bare-newline HTTP request directly into netcat.

replies(1): >>41833406 #

17. Aeolun ◴[14 Oct 24 00:16 UTC] No.41832929[source]▶

>>41831506 #

Well, you can achieve the desired behavior in all situations by ignoring CR and treating any seen LF as NL.

I just don’t see why you’d not want to do that as the implementer. If there’s some way to exploit that behavior I can’t see it.
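
The reading rule described here fits in one line of Python. The function name is made up for this sketch, and it shows only what such a reader does on its own, not how it interacts with other parsers on the same path.

```python
def split_lines_lenient(data: bytes) -> list[bytes]:
    """Drop every CR, then treat each LF as the end of a line."""
    return data.replace(b"\r", b"").split(b"\n")


crlf_form = split_lines_lenient(b"A: B\r\nC: D\r\n")
lf_form = split_lines_lenient(b"A: B\nC: D\n")
print(crlf_form == lf_form)
```

Taken in isolation, this reader produces the same lines for the CRLF and the LF form of a message.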

replies(1): >>41836805 #

18. tptacek ◴[14 Oct 24 00:24 UTC] No.41832974{4}[source]▶

>>41832905 #

I mean the safest thing would be to send an RST as soon as you see a SYN for 80/tcp.

replies(2): >>41833159 #>>41833686 #

19. tptacek ◴[14 Oct 24 00:24 UTC] No.41832977[source]▶

>>41832731 #

Yep, tested it locally, you're right; gotta CRLF to gunicorn.

20. RedShift1 ◴[14 Oct 24 00:59 UTC] No.41833159{5}[source]▶

>>41832974 #

Wouldn't not replying at all be the safest?

21. LegionMammal978 ◴[14 Oct 24 01:39 UTC] No.41833406{5}[source]▶

>>41832909 #

Make sure you're contacting hhs.gov and not www.hhs.gov; the www. subdomain reacts differently.

  $ printf 'GET / HTTP/1.1\r\nHost: hhs.gov\r\n\r\n' | nc hhs.gov 80
  HTTP/1.1 302 Found
  Date: Mon, 14 Oct 2024 01:38:29 GMT
  Server: Apache
  Location: http://www.hhs.gov/web/508//
  Content-Length: 212
  Content-Type: text/html; charset=iso-8859-1
  
  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  <html><head>
  <title>302 Found</title>
  </head><body>
  <h1>Found</h1>
  <p>The document has moved <a href="http://www.hhs.gov/web/508//">here</a>.</p>
  </body></html>
  ^C
  $ printf 'GET / HTTP/1.1\nHost: hhs.gov\n\n' | nc hhs.gov 80
  HTTP/1.1 400 Bad Request
  Date: Mon, 14 Oct 2024 01:38:40 GMT
  Server: Apache
  Content-Length: 226
  Connection: close
  Content-Type: text/html; charset=iso-8859-1
  
  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  <html><head>
  <title>400 Bad Request</title>
  </head><body>
  <h1>Bad Request</h1>
  <p>Your browser sent a request that this server could not understand.<br />
  </p>
  </body></html>

replies(1): >>41833624 #

22. tptacek ◴[14 Oct 24 02:18 UTC] No.41833624{6}[source]▶

>>41833406 #

Ahh, that was it, thanks.

replies(1): >>41837585 #

23. theamk ◴[14 Oct 24 02:31 UTC] No.41833686{5}[source]▶

>>41832974 #

That would have a severe downside of not letting your customers access your website.

Fast-abort on bare-0ah will still be compatible with all browsers and major http clients, thus providing extra mitigations practically for free.

24. MobiusHorizons ◴[14 Oct 24 03:32 UTC] No.41833940{3}[source]▶

>>41832201 #

If you expect to be behind a reverse proxy that manages internal headers for you (removes them on incoming requests, and adds them based on internal criteria) then accepting bare 0x0a newlines could be a security vulnerability, as a malicious request could sneak an internal header that would not be stripped by the reverse proxy.

replies(1): >>41842146 #

25. inopinatus ◴[14 Oct 24 05:48 UTC] No.41834573{3}[source]▶

>>41832201 #

That was already motivated by Postel's Law. It's a step beyond to change what the strict form is; relying on the same to justify unilaterally transposing the form is asking too much of middlebox implementations of just about any line-oriented protocol, and possibly violates Postel's Law itself by asserting the inverse.

replies(1): >>41835014 #

26. tptacek ◴[14 Oct 24 07:13 UTC] No.41835014{4}[source]▶

>>41834573 #

I don't believe in Postel's Law, but I also don't believe in reverential adherence to standards documents. Make good engineering decisions on their own merits. This article is right: CRLF is dumb. You know who agrees with me about that? The IETF, in their (very old) informational RFC about the origins of CRLF in their protocols.

replies(1): >>41843876 #

27. immibis ◴[14 Oct 24 08:22 UTC] No.41835413{3}[source]▶

>>41831609 #

Security also doesn't exist as much as we'd like it to, which doesn't excuse making it exist even less.

28. account42 ◴[14 Oct 24 10:00 UTC] No.41835964[source]▶

>>41832137 #

> As the parent mentioned, it's security critical that every HTTP parser in the world - including every middleware, proxy, firewall, WAF - parses the headers in the same way. If you write a HTTP parser for a server application it's imperative you don't introduce random inconsistences with the standard (I can't believe I have to write this).

No it isn't, at least not critical to all those parsers. My HTTP server couldn't care less if some middle boxes that people go through are less or more strict in their HTTP parsing. This only becomes a concern when you operate something like a reverse proxy AND implement security-relevant policies in that proxy.

29. immibis ◴[14 Oct 24 12:15 UTC] No.41836805{3}[source]▶

>>41832929 #

The exploit is that your request went through a proxy which followed the standard (but failed to reject the bare NL) and the client sent a header after a bare NL which you think came from the proxy but actually came from the client - such as the client's IP address in a fake X-Forwarded-For, which the proxy would have removed if it had parsed it as a header.

This attack is even worse when applied to SMTP because the attacker can forge emails that pass SPF checking, by inserting the end of one message and start of another. This can also be done in HTTP if your reverse proxy uses a single multiplexed connection to your origin server, and the attacker can make their response go to the next user and desync all responses after that.
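
The X-Forwarded-For case can be simulated in a few lines of Python. Both components are hypothetical stand-ins written for this sketch: the proxy splits on CRLF only and does not reject a bare NL, and the origin server is assumed to accept a bare NL and to trust the first X-Forwarded-For header it finds. The IP addresses are placeholders.

```python
from typing import Optional


def proxy_forward(raw_headers: bytes, client_ip: str) -> bytes:
    """Hypothetical reverse proxy: splits on CRLF only, drops any
    client-supplied X-Forwarded-For line, then appends its own."""
    kept = []
    for line in raw_headers.split(b"\r\n"):
        if not line:
            continue
        if line.lower().startswith(b"x-forwarded-for:"):
            continue  # strip what the client sent
        kept.append(line)
    kept.append(b"X-Forwarded-For: " + client_ip.encode("ascii"))
    return b"\r\n".join(kept) + b"\r\n"


def backend_first_xff(raw_headers: bytes) -> Optional[bytes]:
    """Hypothetical origin server: treats a bare LF as a line ending and
    returns the value of the first X-Forwarded-For header it sees."""
    for line in raw_headers.replace(b"\r\n", b"\n").split(b"\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"x-forwarded-for":
            return value.strip()
    return None


REAL_IP = "203.0.113.7"

# Honest attempt: the forged header is on its own CRLF line, so the proxy strips it.
honest = proxy_forward(b"X-Forwarded-For: 10.0.0.1\r\n", REAL_IP)

# Smuggled attempt: the forged header hides behind a bare LF inside another header.
smuggled = proxy_forward(b"X-Test: a\nX-Forwarded-For: 10.0.0.1\r\n", REAL_IP)

print(backend_first_xff(honest), backend_first_xff(smuggled))
```

Under these assumptions the origin server sees the proxy's address for the honest request and the forged `10.0.0.1` for the smuggled one, because the proxy never recognized the second line as a header.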

replies(1): >>41843394 #

30. shadowgovt ◴[14 Oct 24 13:56 UTC] No.41837585{7}[source]▶

>>41833624 #

And this whole exercise is an example of why this is a non-starter proposal (at least the "change existing implementations" part).

How much do we expect the domain owners to invest in changing an implementation that already works? Hint: it's a number smaller than epsilon.

Google might, but their volume is so high they care about the cost of individual bytes on the wire.

replies(1): >>41838741 #

31. tptacek ◴[14 Oct 24 15:57 UTC] No.41838741{8}[source]▶

>>41837585 #

This exercise was about demonstrating that our security can't rely on making sure there's a carriage return in HTTP line termination, because there is no such norm. See the root of the thread, where I asked the question.

replies(1): >>41839176 #

32. shadowgovt ◴[14 Oct 24 16:38 UTC] No.41839176{9}[source]▶

>>41838741 #

Oh, I agree it's about that too, but my point is you've already volunteered more time and resources investigating the situation than most companies would be willing to spend.

33. Smar ◴[14 Oct 24 21:19 UTC] No.41842146{4}[source]▶

>>41833940 #

Only in the case the reverse proxy does not handle bare 0a newlines?

34. Aeolun ◴[14 Oct 24 23:39 UTC] No.41843394{4}[source]▶

>>41836805 #

Thanks, that was actually a very clear description of the problem!

The problem here is not to use one or the other, but to use a mix of both.

replies(1): >>41848471 #

35. inopinatus ◴[15 Oct 24 01:00 UTC] No.41843876{5}[source]▶

>>41835014 #

Yes, CRLF is dumb. Trying to justify the problem seems unnecessary, it's widely acknowledged. A productive inquiry looks at why fixing it didn't happen yet. Don't confuse that line of thought for calling for more failure.

This is unrealistic, though:

> I don't believe in Postel's Law

All the systems around us that work properly do believe in it, and they will continue to do so. No-one who writes MTAs or reverse proxies &c is gonna listen to the wolves howling at the moon for change when there's no better plan than "ram it through unilaterally". Irrespective of what any individual may believe, Postel's Law remains axiomatic in protocol design & implementation.

More constructively, it may be that line-oriented protocols will only move towards change when they can explicitly negotiate line termination preferences during the opening handshake/banner/key exchange etc, which inevitably means a protocol revision in every case and very careful consideration of when CRLF is passed through anyway (e.g. email body).

replies(1): >>41855135 #

36. immibis ◴[15 Oct 24 13:37 UTC] No.41848471{5}[source]▶

>>41843394 #

And the standard is CRLF, so you're either following the standard or using a mix.

37. tptacek ◴[16 Oct 24 02:49 UTC] No.41855135{6}[source]▶

>>41843876 #

Hold on: if you do believe in Postel's Law, you agree with me: just send newlines.

replies(1): >>41857490 #

38. ◴[16 Oct 24 10:27 UTC] No.41857490{7}[source]▶

>>41855135 #
