422 points km | 112 comments
1. michaelmior ◴[] No.41831072[source]
> various protocols (HTTP, SMTP, CSV) still "require" CRLF at the end of each line

What would be the benefit to updating legacy protocols to just use NL? You save a handful of bits at the expense of a lot of potential bugs. HTTP/1(.1) is mostly replaced by HTTP/2 and later by now anyway.

Sure, it makes sense not to require CRLF with any new protocols, but it doesn't seem worth updating legacy things.

> Even if an established protocol (HTTP, SMTP, CSV, FTP) technically requires CRLF as a line ending, do not comply.

I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

replies(13): >>41831206 #>>41831210 #>>41831225 #>>41831256 #>>41831322 #>>41831364 #>>41831391 #>>41831706 #>>41832337 #>>41832719 #>>41832751 #>>41834474 #>>41835444 #
2. javajosh ◴[] No.41831206[source]
>What would be the benefit...

It is interesting that you ignore the benefits the OP describes and instead present a vague and fearful characterization of the costs. Your reaction lies at the heart of cargo-culting, the maintenance of previous decisions out of sheer dread. One can do a cost-benefit analysis and decide what to do, or you can let your emotions decide. I suggest that the world is better off with the former approach. To wit, the OP notes for benefits " The extra CR serves no useful purpose. It is just a needless complication, a vexation to programmers, and a waste of bandwidth." and a mitigation of the costs "You need to search really, really hard to find a device or application that actually interprets U+000a as a true linefeed." You ignore both the benefits assertion and cost mitigating assertion entirely, which is strong evidence for your emotionality.

replies(4): >>41831368 #>>41831373 #>>41831410 #>>41831551 #
3. chasil ◴[] No.41831210[source]
FYI, Sendmail accepts LF without CR, but Exchange doesn't.
replies(2): >>41831846 #>>41835526 #
4. phkahler ◴[] No.41831225[source]
>> I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

It's not satire and it's not just trying to make a point. It's trying to make things simpler. As he says, a lot of software will accept input without the CR already, even if it's supposed to be there. But we should change the standard over time so people in 2050 can stop writing code that's more complicated (by needing to eat CR) or inserts extra characters. And never mind the 2050 part, just do it today.

replies(2): >>41831535 #>>41831898 #
5. Ekaros ◴[] No.41831256[source]
Thinking about it, using CR alone in protocols actually makes infinitely more sense, as that would allow use of LF in records, which would make many use cases much simpler.

Just think about text protocols like HTTP, how much easier something like cookies would be to parse if you had CR as terminating character. And then each record separated by LF.

replies(4): >>41831290 #>>41831369 #>>41831390 #>>41831465 #
6. gpvos ◴[] No.41831290[source]
That is so backwards incompatible that it is never, ever going to fly.
7. mechanicalpulse ◴[] No.41831322[source]
> Why intentionally introduce potential bugs for the sake of making a point?

It seems spiteful, but it strikes me as an interesting illustration of how the robustness principle could be hacked to force change. It’s a descriptivist versus prescriptivist view of standards, which is not how we typically view standards.

8. amluto ◴[] No.41831364[source]
> I'm hoping this is satire. Why intentionally introduce potential bugs for the sake of making a point?

It’s worse than satire. Postel’s Law is definitively wrong, at least in the context of network protocols, and delimiters, especially, MUST be precise. See, for example:

https://www.postfix.org/smtp-smuggling.html

Send exactly what the spec requires, and parse exactly as the spec requires. Do not accept garbage. And LF, where CRLF is specified, is garbage.

replies(2): >>41831386 #>>41831719 #
9. perching_aix ◴[] No.41831368[source]
> you ignore the benefits the OP describes

Funnily enough, the author doesn't actually describe any tangible benefits. It's all just (in my reading, semi-sarcastic) platonics:

- peace

- simplicity

- the flourishing of humanity

... so instead of "vague and fearful", the author comes on with a "vague and cheerful". Yay? The whole shtick about saving bandwidth, lessening complications, and reducing programmer vexations are only ever implied by the author, and were explicitly considered by the person you were replying to:

> You save a handful of bits at the expense of a lot of potential bugs.

... they just happened to be not super convinced.

Is this the kind of HackerNews comment I'm supposed to feel impressed by? That demonstrates this forum being so much better than others?

10. mattmerr ◴[] No.41831369[source]
ASCII already has designated bytes for unit, group, and record separators. That aside, a big drawback of using unprintable bytes like these is they're more difficult for humans to read in dumps or type on a keyboard than a newline (provided newline has a strict definition CRLF, LF, etc)
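
To make the separator idea concrete, here is a minimal Python sketch that frames records with the ASCII unit separator (0x1F) and record separator (0x1E). The constant and function names are made up for this illustration, and it assumes the field data never contains those two bytes.

```python
# ASCII control characters reserved as separators (names are this sketch's own).
UNIT_SEP = "\x1f"    # US: separates fields within a record
RECORD_SEP = "\x1e"  # RS: separates records

def encode(records):
    """Join fields with US and records with RS."""
    return RECORD_SEP.join(UNIT_SEP.join(fields) for fields in records)

def decode(blob):
    """Inverse of encode(); fields may freely contain CR or LF."""
    return [record.split(UNIT_SEP) for record in blob.split(RECORD_SEP)]

rows = [["name", "note"], ["alice", "line one\nline two"]]
blob = encode(rows)
assert decode(blob) == rows
print(repr(blob))  # the separators show up only as escape codes
```

The `repr` output also shows the drawback mentioned above: the separators are invisible unless escaped, which is what makes them awkward in dumps and at a keyboard.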
replies(1): >>41833693 #
11. YZF ◴[] No.41831373[source]
What's your estimate for the cost of changing legacy protocols that use CRLF vs. the work that will be done to support those?

My intuition (not emotion) agrees with the parent that investing in changing legacy code that works, and doesn't see a lot of churn, is likely a lot more expensive than leaving it be and focusing on new protocols that over time end up replacing the old protocols anyways.

OP does not really talk about the benefit, he just opines. How many programmers are vexed when implementing "HTTP, SMTP, CSV, FTP"? I'd argue not many programmers work on implementations of these protocols today. How much traffic is wasted by a few extra characters in these protocols? I'd argue almost nothing. Most of the bits are (binary, compressed) payload anyways. There is no analysis by OP of the cost of not complying with the standard which potentially results in breakage and the difficulty of being able to accurately estimate the breakage/blast radius of that lack of compliance. That just makes software less reliable and less predictable.

12. tptacek ◴[] No.41831386[source]
If two systems agree, independent of any specification someone somewhere else wrote, to accept a bare NL where a CRLF is specified, that is not "garbage". Standards documents are not laws; the horse drags the cart.
replies(4): >>41831436 #>>41831513 #>>41831973 #>>41835480 #
13. ◴[] No.41831390[source]
14. FiloSottile ◴[] No.41831391[source]
Exactly. Please DO NOT mess with protocols, especially legacy critical protocols based on in-band signaling.

HTTP/1.1 was regrettably but irreversibly designed with security-critical parser alignment requirements. If two implementations disagree on whether `A:B\nC:D` contains a value for C, you can build a request smuggling gadget, leading to significant attacks. We live in a post-Postel world, only ever generate and accept CRLF in protocols that specify it, however legacy and nonsensical it might be.
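
A rough Python sketch of that disagreement, with two toy parsers that are assumptions of this example rather than any real server's code. One splits on CRLF only and keeps the bare LF inside the value; the other treats LF as a terminator.

```python
RAW = b"A:B\nC:D\r\n"  # the header bytes from the example above

def parse_crlf_only(raw):
    """Treat only CRLF as a line terminator; a bare LF stays inside the value."""
    lines = [line for line in raw.split(b"\r\n") if line]
    return dict(line.split(b":", 1) for line in lines)

def parse_lf_tolerant(raw):
    """Treat LF as the terminator and drop an optional preceding CR."""
    lines = [line.rstrip(b"\r") for line in raw.split(b"\n") if line.strip(b"\r")]
    return dict(line.split(b":", 1) for line in lines)

print(parse_crlf_only(RAW))    # a single header named A
print(parse_lf_tolerant(RAW))  # two headers, A and C
```

Neither toy parser rejects the input, which is the point: both accept it, and they come away with different answers about whether C exists.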

(I am a massive, massive SQLite fan, but this is giving me pause about using other software by the same author, at least when networks are involved.)

replies(7): >>41831450 #>>41831498 #>>41831871 #>>41832546 #>>41832632 #>>41832661 #>>41839309 #
15. LegionMammal978 ◴[] No.41831410[source]
The cost is, if people start transitioning to a world where senders only transmit LF in opposition to current standards for protocols like HTTP/1.1 or SMTP (especially aggressively, e.g., by creating popular HTTP libraries without a CRLF option), then it will create the mental and procedural overhead of tracking which receivers accept LF alone vs. which still require CRLF. Switching established protocols is never free, even when there are definite benefits: see the Python 2-to-3 fiasco, caused by newer programs being incompatible with most older libraries.
replies(1): >>41834703 #
16. DaiPlusPlus ◴[] No.41831436{3}[source]
> Standards documents are not laws; the horse drags the cart.

They can be: c.f. legally-enforced safety-regulations.

replies(1): >>41831454 #
17. tptacek ◴[] No.41831450[source]
This would be more persuasive if HTTP servers didn't already widely accept bare 0ah line termination. What's the first major public web site you can find that doesn't?
replies(5): >>41831506 #>>41831717 #>>41832137 #>>41832555 #>>41832731 #
18. tptacek ◴[] No.41831454{4}[source]
These aren't.
19. ◴[] No.41831465[source]
20. refulgentis ◴[] No.41831498[source]
I wouldn't be too worried or make personal judgements; he says the same thing you do (though I assume you disagree).
21. michaelmior ◴[] No.41831506{3}[source]
We're talking about servers and clients here. The best way to ensure things work is to adhere to an established protocol. Aside from saving a few bytes, there doesn't seem to be any good reason to deviate.
replies(3): >>41831609 #>>41831637 #>>41832929 #
22. perching_aix ◴[] No.41831513{3}[source]
Laws are also just some ink on paper (and are routinely overruled, circumvented or unenforced in certain jurisdictions), so using this kind of logic in order to encourage standard violations is unsound.

There is a method to this madness, and that's revising the standards.

replies(1): >>41831619 #
23. michaelmior ◴[] No.41831535[source]
Ignoring established protocols doesn't make things simpler. It makes things vastly more complicated.

Let's absolutely fix new protocols (or new versions of existing protocols). But intentionally breaking existing protocols doesn't simplify anything.

24. michaelmior ◴[] No.41831551[source]
You're right that I didn't mention the supposed benefits in my response. But let's incorporate those benefits into new protocols rather than break existing protocols. I just don't see the benefit in intentionally breaking existing protocols.
25. tptacek ◴[] No.41831609{4}[source]
I'm saying the consistency that Filippo says our security depends on doesn't really seem to exist in the world, which hurts the persuasiveness of that particular argument in favor of consistency.
replies(2): >>41831837 #>>41835413 #
26. tptacek ◴[] No.41831619{4}[source]
What's a "standard violation"? The original history of the IETF is a rejection of exactly this mode of thinking about the inviolability of standards, which was the ethos of the OSI.
replies(2): >>41831646 #>>41831874 #
27. Ekaros ◴[] No.41831637{4}[source]
There are very good reasons not to deviate, as a mismatch with various other things that may or may not be on the path can affect things. Like reverse proxies, load balancers and so on.
28. perching_aix ◴[] No.41831646{5}[source]
When an implementation is nonconformant to a standard in question.
replies(2): >>41831669 #>>41832571 #
29. tptacek ◴[] No.41831669{6}[source]
IETF standards are tools to help developers get stuff done on the Internet. They are not the only tool, and they don't carry any moral force.
replies(1): >>41831797 #
30. halter73 ◴[] No.41831706[source]
> I'm hoping this is satire.

Me too. It's one thing to accept single LFs in protocols that expect CRLF, but sending single LFs is a bridge too far in my opinion. I'm really surprised most of the other replies to your comment currently seem to unironically support not complying with well-established protocol specifications under the misguided notion that it will somehow make things "simpler" or "easier" for developers.

I work on Kestrel which is an HTTP server for ASP.NET Core. Kestrel didn't support LF without a CR in HTTP/1.1 request headers until .NET 7 [1]. Thankfully, I'm unaware of any widely used HTTP client that even supports sending HTTP/1.1 requests without CRLF header endings, but we did eventually get reports of custom clients that used only LFs to terminate headers.

I admit that we should have recognized a single LF as a line terminator instead of just CRLF from the beginning like the spec suggests, but people using just LF instead of CRLF in their custom clients certainly did not make things any simpler or easier for me as an HTTP server developer. Initially, we wanted to be as strict as possible when parsing request headers to avoid possible HTTP request smuggling attacks. I don't think allowing LF termination really allows for smuggling, but it is something we had to consider.

I do not support even adding the option to terminate HTTP/1.1 request/response headers with single LFs in HttpClient/Kestrel. That's just asking for problems because it's so uncommon. There are clients and servers out there that will reject headers with single LFs while they all support CRLF. And if HTTP/1.1 is still being used in 2050 (which seems like a safe bet), I guarantee most clients and servers will still use CRLF header endings. Having multiple ways to represent the exact same thing does not make a protocol simpler or easier.

[1]: https://github.com/dotnet/aspnetcore/pull/43202

replies(1): >>41832436 #
31. FiloSottile ◴[] No.41831717{3}[source]
Hrm, this is what I get for logging in to HN from my phone. It’s possible I am confusing this with one of the other exploitable HTTP/1.1 header parser alignment issues.

Maybe this was so widespread that ~everything already handles it because non-malicious stuff breaks if you don’t. In that case, my bad, but I still would like to make a general plea as an implementer for sticking strictly to specified behavior in this sort of protocols.

32. ◴[] No.41831719[source]
33. perching_aix ◴[] No.41831797{7}[source]
Apart from colloquially considering standards not-necessarily-normative being, in my opinion, nonsensical (see below), to the best of my knowledge at the very least the STD subseries of IETF standards documents are normative in nature: https://datatracker.ietf.org/doc/std

> They are not the only tool, and they don't carry any moral force.

Indeed there are countless other standards bodies in the world also producing normative definitions for many things, so I'm definitely a bit confused why the focus on IETF specifically.

To be even more exact, I do not know of any standards bodies who would publish what they and the world consider as standards, that would be entirely, or at least primarily, informational rather than normative in nature. Like, do I know the word "standard" incorrectly? What even is a point of a standard, if it doesn't aim to control?

replies(1): >>41832287 #
34. dwattttt ◴[] No.41831837{5}[source]
But no one expects 0ah to be sufficient. Change that expectation, and now you have to wonder if your middleware and your backend agree on whether the middleware filtered out internal-only headers.
replies(1): >>41831921 #
35. 9dev ◴[] No.41831846[source]
…how very in character for each of them!
36. mackal ◴[] No.41831871[source]
> massive SQLite fan, but this is giving me pause about using other software by the same author

Even if I wanted to contribute code to SQLite, I can't. I acknowledge the fact God doesn't exist, so he doesn't want my contributions :P

replies(1): >>41832762 #
37. nsnshsuejeb ◴[] No.41831874{5}[source]
Elephant in the room is the trillions of actual servers and user agents that would need to be tested and patched if you retroactively change a standard. Luckily there are some digits after HTTP that allow the concept of new versions of the standard.
38. nsnshsuejeb ◴[] No.41831898[source]
Yes. We all know how to do this. You know that API version thingy. I agree to drop the carriage return when not needed but do it in future protocols.

Obviously IPv6 shows you need to be patient. Your great grandkids may see a useless carriage return!

Windows doesn't help here.

replies(1): >>41832062 #
39. tptacek ◴[] No.41831921{6}[source]
Yeah, I'm not certain that this is a real issue. It might be? Certainly, I'm read in to things like TECL desync. I get the concern, that any disagreement in parsing policies is problematic for HTTP because of middleboxes. But I think the ship may have sailed on 0ah, and that it may be the case that you simply have to build HTTP systems to be bare-0ah-tolerant if you want your system to be resilient.
replies(1): >>41832774 #
40. djbusby ◴[] No.41831973{3}[source]
That's just two systems that happen to agree on garbage.
replies(1): >>41836880 #
41. perching_aix ◴[] No.41832062{3}[source]
Versioning provides people with capability for change management, but won't perform it on their behalf. Who knew.
42. hifromwork ◴[] No.41832137{3}[source]
As the parent mentioned, it's security critical that every HTTP parser in the world - including every middleware, proxy, firewall, WAF - parses the headers in the same way. If you write an HTTP parser for a server application it's imperative you don't introduce random inconsistencies with the standard (I can't believe I have to write this).

On the other hand, as a client, it's OK to send malformed requests, as long as you're prepared that they may fail. But it's a weird flex, legacy protocols have many warts, why die on this particular hill.

replies(2): >>41832201 #>>41835964 #
43. tptacek ◴[] No.41832201{4}[source]
That appears to be an argument in favor of accepting bare-0ah, since as a positive statement that is the situation on the Internet today.
replies(3): >>41832905 #>>41833940 #>>41834573 #
44. tptacek ◴[] No.41832287{8}[source]
Ok, but just to be clear: the standards-track HTTP RFC says you can use a single LF. I don't think this issue is as clear as people seem to want it to be.
replies(4): >>41832388 #>>41832537 #>>41832566 #>>41832954 #
45. nedt ◴[] No.41832337[source]
> What would be the benefit

Easy - being able to use a plain text protocol as a human being without having to worry if my terminal sends the right end of line terminator. Using netcat to debug SMTP issues is actually something I do often enough.

46. perching_aix ◴[] No.41832388{9}[source]
Ah, this is a subthread about HTTP specifically - didn't notice. Explains why you focused on the IETF too. Nevertheless, my points I believe still all stand.

As for HTTP or any other protocols' definitions go, I'd rather not join in on that back and forth. I'd imagine it's well defined what's expected. Skim reading RFC-2616 now certainly suggests so.

47. jfengel ◴[] No.41832436[source]
LF only? Huh.

In its original terms for printing terminals, carriage return might be ambiguous. It could mean either "just send the print head to column zero" or "print head to 0 and advance the line by one". The latter is what typewriters do for the Return key.

But LF always meant Line Feed, moving the paper but not the print head.

These are of course wildly out of date concepts. But it still strikes me as odd to see a Line Feed as a context reset.

replies(1): >>41832569 #
48. vitus ◴[] No.41832537{9}[source]
Sure. HTTP/1.1 isn't the only network protocol, though, IETF standardization or otherwise.

For SMTP (which this subthread started with):

   In addition, the appearance of "bare" "CR" or "LF" characters in text
   (i.e., either without the other) has a long history of causing
   problems in mail implementations and applications that use the mail
   system as a tool.  SMTP client implementations MUST NOT transmit
   these characters except when they are intended as line terminators
   and then MUST, as indicated above, transmit them only as a <CRLF>
   sequence.
https://datatracker.ietf.org/doc/html/rfc5321#section-2.3.8
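
As a sender-side illustration of that requirement, a small Python sketch that detects bare CR or LF in outgoing bytes and rewrites them as CRLF. The helper names are hypothetical, and whether to normalize or to reject such input is a policy choice this sketch does not settle.

```python
import re

# Matches a CR not followed by LF, or an LF not preceded by CR.
BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")

def has_bare_line_ending(data: bytes) -> bool:
    """True if the data contains a CR or LF that is not part of a CRLF pair."""
    return BARE_CR_OR_LF.search(data) is not None

def to_crlf(data: bytes) -> bytes:
    """Rewrite every CRLF, lone CR, or lone LF as CRLF."""
    return re.sub(rb"\r\n|\r|\n", b"\r\n", data)

body = b"line one\nline two\rline three\r\n"
print(has_bare_line_ending(body))
print(has_bare_line_ending(to_crlf(body)))
```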
49. Spooky23 ◴[] No.41832546[source]
What a weird reaction. Microsoft’s use of CRLF is an archaic pain in the ass. Taking a position that it should be deprecated isn’t radical or irresponsible — Microsoft makes gratuitous changes to things all of the time, why not this one?

Hipp is probably one of the better engineering leaders out there. His point of view carries weight because of who he is, but should be evaluated on its merits. If Microsoft got rid of this crap 30 years ago, when it was equally obsolete, we wouldn’t be having this conversation; if nobody does, our grandchildren will.

replies(4): >>41832890 #>>41833658 #>>41836490 #>>41837496 #
50. LegionMammal978 ◴[] No.41832555{3}[source]
Going down a list of top websites, these URLs respond with HTTP 200 (possibly after redirections) when sent an ordinary HTTP/1.1 GET request with 0D0A line endings, but respond with HTTP 400 when sent the exact same request with 0A line endings:

  https://br.pinterest.com/ https://www.pinterest.co.uk/
  https://apps.apple.com/ https://support.apple.com/ https://podcasts.apple.com/ https://music.apple.com/ https://geo.itunes.apple.com/
  https://ncbi.nlm.nih.gov/ https://www.salesforce.com/ https://www.purdue.edu/ https://www.playstation.com/
  https://llvm.org/ https://www.iana.org/ https://www.gnu.org/ https://epa.gov/ https://justice.gov/
  https://www.brendangregg.com/ http://heise.de/ https://www.post.ch/ http://hhs.gov/ https://oreilly.com/
  https://www.thinkgeek.com/ https://www.constantcontact.com/ https://sciencemag.org/ https://nps.gov/
  https://www.cs.mun.ca/ https://www.wipo.int/ https://www.unicode.org/ https://economictimes.indiatimes.com/
  https://science.org/ https://icann.org/ https://caniuse.com/ https://w3techs.com/ https://chrisharrison.net/
  https://www.universal-music.co.jp/ https://digiland.libero.it/ https://webaim.org/ https://webmd.com/
This URL responds with HTTP 505 on an 0A request:

  https://ed.ted.com/
These URLs don't respond on an 0A request:

  https://quora.com/
  https://www.nist.gov/
Most of these seem pretty major to me. There are other sites that are public but responded with an HTTP 403, probably because they didn't like the VPN or HTTP client I used for this test. (Also, www.apple.com is tolerant of 0A line endings, even though its other subdomains aren't, which is weird.)
replies(1): >>41832634 #
51. halter73 ◴[] No.41832566{9}[source]
Can you provide a citation for this? I’ve read older RFCs that "recommend" recipients allow single LFs to terminate headers for robustness. I’ve also read newer RFCs that weaken that recommendation and merely say the recipient "MAY" allow single LFs. I’ve never noticed an HTTP RFC say you can send headers without the full CRLF sequence, but maybe I missed something.

https://datatracker.ietf.org/doc/html/rfc2616#section-19.3 https://datatracker.ietf.org/doc/html/rfc9112#section-2.2

52. romwell ◴[] No.41832569{3}[source]
>The latter is what typewriters do for the Return key.

Minor correction: mechanical typewriters do not have a Return key, but they have both operations (line feed, as well as carriage return).

The carriage return lever is typically rigged to also do line feed at the same time, by a preset amount of lines (which can be set to 0), or you can push the carriage without engaging line feed.

Technically, the lever would do LF, and pushing on it further would do CR (tensioning the carriage spring).

It is, however, true that most of the time, the users would simply push the lever until it stops without thinking about it, producing CRLF operation —

— and that CR without LF was comparatively rare.

From a pure protocol UX perspective, it would make sense IMO to have a single command for (CR + LF) too, just like the typewriter effectively does it (push the lever here to do both at once).

It seems weird that the protocol is more limited than the mechanical device that it drives, but then again, designers probably weren't involved in deciding on terminal protocol specs.

replies(3): >>41833620 #>>41834015 #>>41835511 #
53. sophacles ◴[] No.41832571{6}[source]
I've implemented a lot of protocols. Most implementations I've come across for most protocols are not strictly standards conformant, for many reasons.

Big ones being:

* The standards are often not detailed enough, or contain enough loose verbiage that there are many ways to understand how to implement some part, yet those ways are not interoperable.

* Many protocols allow vendor specifications in such a way that 2 implementations that are 100% compliant won't interoperate.

* Many protocol implementations are interoperable quite well, converging on behavior that isn't specified in any standard (often to the surprise of people who haven't read the relevant standards)

At least this is my experience for ietf rfc standards.

replies(1): >>41833221 #
54. pdw ◴[] No.41832632[source]
HTTP is saved here because headers aren't allowed to contain control characters. A server that is strict enough to only recognize CRLF will hopefully also be strict enough to reject requests that contain invalid characters.

The situation is different with SMTP, see https://www.postfix.org/smtp-smuggling.html

replies(1): >>41833636 #
55. tptacek ◴[] No.41832634{4}[source]
You sure about this? www.pinterest.com, for instance, does not appear to care whether I 0d0a or just 0a.
replies(1): >>41832865 #
56. rtpg ◴[] No.41832661[source]
Took me a second to get what was going on here, but basically the idea is that your middleware might not see `C:D`, but then your application _does_ see `C:D`.

And given your application might assume your middleware does some form of access control (for example, `X-ActualUserForReal` being treated as an internal-only header), you could get around some access control stuff.

Not a bytes-alignment thing but a "header values disagreement" thing.

This is an issue if one part of your stack parses headers differently than another in general though, not limited to newlines.
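
A toy Python sketch of the access-control consequence, assuming a hypothetical proxy that splits on CRLF only and strips the internal header, and a hypothetical application that also accepts a bare LF. Both functions are inventions for illustration, not the behaviour of any particular product.

```python
INTERNAL = b"x-actualuserforreal"

def middleware_filter(raw: bytes) -> bytes:
    """Hypothetical proxy: splits on CRLF only and drops the internal header."""
    kept = []
    for line in raw.split(b"\r\n"):
        name = line.split(b":", 1)[0].strip().lower()
        if name != INTERNAL:
            kept.append(line)
    return b"\r\n".join(kept)

def backend_headers(raw: bytes) -> dict:
    """Hypothetical application: accepts a bare LF as a line terminator."""
    headers = {}
    for line in raw.split(b"\n"):
        line = line.rstrip(b"\r")
        if b":" in line:
            name, value = line.split(b":", 1)
            headers[name.strip().lower()] = value.strip()
    return headers

# The internal header hides behind a bare LF inside another header's line.
smuggled = b"X-Harmless: 1\nX-ActualUserForReal: admin\r\n\r\n"
forwarded = middleware_filter(smuggled)
print(backend_headers(forwarded))
```

The proxy sees one harmless header and forwards it untouched; the application then splits that same line in two and trusts the second half.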

57. cassepipe ◴[] No.41832719[source]
It seems to me the author is not suggesting to update the protocols themselves but rather to stop sending them CR even if the spec requires it. And to patch the corresponding software so it accepts simple newlines.
58. rtpg ◴[] No.41832731{3}[source]
Gunicorn expects `\r\n` for lines (see gunicorn/http/message.py:read_line), though it's possible that every middleware that is in front of gunicorn in practice normalizes lines to avoid this issue.
replies(1): >>41832977 #
59. tfehring ◴[] No.41832751[source]
At least for CSV, there's a divergence between usage in practice and the spec. The spec requires CRLF, but all of the commonly used tools I've encountered for reading and writing CSVs can read files with CR, LF, or CRLF line endings, and when writing CSVs they'll default to either LF or platform-specific line endings. (Even Excel for Mac doesn't default to CRLF!) I think this divergence is bad and should be fixed.

But IMO the right resolution is to update the spec so that (1) readers MUST accept any of (CR, LF, CRLF), (2) writers MUST use one of (CR, LF, CRLF), and (3) writers SHOULD use LF. Removing compatibility from existing applications to break legacy code would be asinine.
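
For what that reader/writer split might look like in practice, here is a short sketch using Python's standard `csv` module. The two wrapper function names are made up for this example; the module itself recognizes CR and LF when reading and lets the writer's terminator be overridden.

```python
import csv
import io

def read_any_newline(text: str):
    """Parse CSV text whose rows end in CR, LF, or CRLF."""
    # newline="" keeps the original line endings so the csv module sees them.
    return list(csv.reader(io.StringIO(text, newline="")))

def write_lf(rows) -> str:
    """Serialize rows with LF endings instead of the csv module's CRLF default."""
    buf = io.StringIO(newline="")
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()

mixed = "a,b\r\nc,d\re,f\n"
rows = read_any_newline(mixed)
print(rows)
print(repr(write_lf(rows)))
```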

replies(1): >>41836516 #
60. somat ◴[] No.41832762{3}[source]
He does not want your code anyway, sqlite is public domain. this has several implications. One of which is the author wants nothing from you. Note that public domain is fundamentally different than the usual method of releasing code, which is to issue a license to distribute a copyright protected work. Putting a thing into the public domain is to renounce any ownership over the thing.

I think that the proper spirit of the thing is that if you have patches to sqlite, you just maintain them yourself. if you are especially benevolent you will put the patches in the public domain as well. and if they are any good perhaps the original author will want them.

In fact the public domain is so weird, some countries have no legal understanding of it. originally the concept was just the stance of the US federal government that because the works of the government were for the people, these works were not protected by copyright, and could be thought of as collectively owned by the people, or in the public domain. Some countries don't recognize this. everything has to be owned by someone. and sqlite was legally unable to be distributed in these countries, it would default to copyright with no license.

61. dwattttt ◴[] No.41832774{7}[source]
But what's bare-0ah-tolerant? Accepting _or_ ignoring bare 0ah's means you need to ensure all your moving parts agree, or you end up in the "one bit thinks this is two headers, others think it's one header".

The only situation where you don't need to know two policies match is when one of the policies rejects one of the combinations outright. Probably. Maybe.

EDIT: maybe it's better phrased as "all parts need to be bare-0ah-strict". But then it's fine if it's bare-0ah-reject; they just need to all be strict, one way or the other.

62. LegionMammal978 ◴[] No.41832865{5}[source]
My apologies, I was using a client which kept the connection alive between the 0D0A and 0A requests, which has an effect on www.pinterest.com. Rerunning the test with separate connections for 0D0A and 0A requests, www.pinterest.com and phys.org are no longer affected (I've removed the two from the list), but all other URLs are still affected.
replies(1): >>41832909 #
63. theamk ◴[] No.41832890{3}[source]
No one is talking about Microsoft and whatever it does on its platform, the parent comment is about network protocols (HTTP, SMTP and so on..).

I understand that it is tempting to blame Microsoft for \r\n proliferation, but it does not seem to be the case - the \r\n comes from the era of teletypes and physical VT terminals. You can still see the original "NL" in action (move down only, do not go back to start of line) on any Unix system by typing "(stty raw; ls)" in a throw-away terminal.

replies(1): >>41844264 #
64. theamk ◴[] No.41832905{5}[source]
Wouldn't the safest thing, security-wise, be to fail fast on bare 0ah?

As a web server, you may not know which intermediate proxies the request traversed before arriving at your port. Given that request smuggling is a thing, failing fast with no further parsing on any protocol deviations seems to be the most secure thing.
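
A minimal sketch of what "fail fast" could look like for the header fields alone, in Python. The `BadRequest` class and the function name are assumptions of this example; it ignores the request line and keeps only the last value of a repeated header, so it is far from a full HTTP parser.

```python
class BadRequest(Exception):
    """Raised when the header block deviates from CRLF framing."""

def parse_headers_strict(raw: bytes) -> dict:
    """Parse a CRLF-terminated header block; reject any stray CR or LF."""
    if not raw.endswith(b"\r\n\r\n"):
        raise BadRequest("header block must end with CRLF CRLF")
    headers = {}
    for line in raw[:-4].split(b"\r\n"):
        # Anything left over after splitting on CRLF is a bare CR or LF.
        if b"\r" in line or b"\n" in line:
            raise BadRequest("bare CR or LF inside a header line")
        name, sep, value = line.partition(b":")
        if not sep or not name:
            raise BadRequest("malformed header line")
        headers[name.strip().lower()] = value.strip()
    return headers

print(parse_headers_strict(b"Host: example.org\r\nAccept: */*\r\n\r\n"))
```

A caller would map `BadRequest` to a 400 response and close the connection without looking at the rest of the bytes.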

replies(1): >>41832974 #
65. tptacek ◴[] No.41832909{6}[source]
I picked one at random --- hhs.gov --- and it too appears to work?

For what it's worth: I'm testing by piping the bytes for a bare-newline HTTP request directly into netcat.

replies(1): >>41833406 #
66. Aeolun ◴[] No.41832929{4}[source]
Well, you can achieve the desired behavior in all situations by ignoring CR and treating any seen LF as NL.

I just don’t see why you’d not want to do that as the implementer. If there’s some way to exploit that behavior I can’t see it.
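
The rule described here fits in a one-line Python sketch (the function name is hypothetical). It only shows that CRLF and bare-LF input come out identical under this rule; whether that is safe still depends on every other hop on the path applying the same rule, which is the concern raised elsewhere in the thread.

```python
def split_lines_lenient(data: bytes) -> list:
    """Drop every CR, then split on LF; CRLF and bare-LF input give the same lines."""
    return data.replace(b"\r", b"").split(b"\n")

print(split_lines_lenient(b"GET / HTTP/1.1\r\nHost: example.org\r\n\r\n"))
print(split_lines_lenient(b"GET / HTTP/1.1\nHost: example.org\n\n"))
```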

replies(1): >>41836805 #
67. convolvatron ◴[] No.41832954{9}[source]
none of this is as clear as anyone wants it to be. if standards _could_ be completely formally described, it would be an entirely different world. I did quite a bit of work implementing draft standards in the IETF, and at the end of the day the standard is the best we can make it, but for non-trivial things good luck actually implementing it without something to test against or a reference implementation.

that's the context in which Postel's law makes absolute sense. not that you should forgo any sanity checking, or attempt to interpret garbage or make up frame boundaries. but when there is a potential ambiguity, and you can safely tolerate it, then it's really helpful for you to do so.

68. tptacek ◴[] No.41832974{6}[source]
I mean the safest thing would be to send an RST as soon as you see a SYN for 80/tcp.
replies(2): >>41833159 #>>41833686 #
69. tptacek ◴[] No.41832977{4}[source]
Yep, tested it locally, you're right; gotta CRLF to gunicorn.
70. RedShift1 ◴[] No.41833159{7}[source]
Wouldn't not replying at all be the safest?
71. perching_aix ◴[] No.41833221{7}[source]
I'm aware of these factors, wasn't trying to suggest that the practice doesn't differ from the theory. What I was more going for was to highlight that the goal should be to primarily try and have these eventually converge, preferably sooner than later, not trying to strongarm the practice side and wait for the standards body in question to wake up one day and decide to amend the standard. That might give the impression of suddenness, but the core issue remains unsolved that way.

Usually when there's a high disparity between the "de jure" and the "de facto", it's due to a discrepancy in the interests and the leverage, resulting in a breakdown in communication and cooperation. Laying into either then is a bandaid attempt, not a solution. It's how either standard sprawl starts, or how standards bodies lose relevance.

72. LegionMammal978 ◴[] No.41833406{7}[source]
Make sure you're contacting hhs.gov and not www.hhs.gov, the www. subdomain reacts differently.

  $ printf 'GET / HTTP/1.1\r\nHost: hhs.gov\r\n\r\n' | nc hhs.gov 80
  HTTP/1.1 302 Found
  Date: Mon, 14 Oct 2024 01:38:29 GMT
  Server: Apache
  Location: http://www.hhs.gov/web/508//
  Content-Length: 212
  Content-Type: text/html; charset=iso-8859-1
  
  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  <html><head>
  <title>302 Found</title>
  </head><body>
  <h1>Found</h1>
  <p>The document has moved <a href="http://www.hhs.gov/web/508//">here</a>.</p>
  </body></html>
  ^C
  $ printf 'GET / HTTP/1.1\nHost: hhs.gov\n\n' | nc hhs.gov 80
  HTTP/1.1 400 Bad Request
  Date: Mon, 14 Oct 2024 01:38:40 GMT
  Server: Apache
  Content-Length: 226
  Connection: close
  Content-Type: text/html; charset=iso-8859-1
  
  <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
  <html><head>
  <title>400 Bad Request</title>
  </head><body>
  <h1>Bad Request</h1>
  <p>Your browser sent a request that this server could not understand.<br />
  </p>
  </body></html>
replies(1): >>41833624 #
73. fijiaarone ◴[] No.41833620{4}[source]
On manual typewriters there is a lever that turns the roller to accomplish a line feed (or two if set for double space.) This lever is usually located on the left side of the carriage to make it convenient to push it back to the right side in the same motion.
replies(1): >>41833826 #
74. tptacek ◴[] No.41833624{8}[source]
Ahh, that was it, thanks.
replies(1): >>41837585 #
75. kragen ◴[] No.41833636{3}[source]
Hopefully is not a good word to see in an argument that a software proposal is secure.

Myself, I've written an HTTP server that is strict enough to only recognize CRLF, because recognizing bare CR or LF would require more code†, but it doesn't reject requests that contain invalid characters. It wouldn't open a request-header-smuggling hole in my case because it doesn't have any proxy functionality.

One server is a small sample size, and I don't remember what the other HTTP servers I've written do in this case.

______

† http://canonical.org/~kragen/sw/dev3/httpdito-readme http://canonical.org/~kragen/sw/dev3/server.s

76. naikrovek ◴[] No.41833658{3}[source]
CRLF was the correct way to implement a new line the way we think of it now, because teletypes and typewriters considered the “return to the 0th column” and “go to the next line” to be different things that are each valid on their own.

CRLF was the standardized way to implement “go down one line and return to column zero” and they’re the only ones who implemented new lines correctly at the outset.

Blaming Microsoft now, because they like backwards compatibility above almost everything else, is misplaced and myopic.

replies(1): >>41837161 #
77. theamk ◴[] No.41833686{7}[source]
That would have a severe downside of not letting your customers access your website.

Fast-abort on bare-0ah will still be compatible with all browsers and major http clients, thus providing extra mitigations practically for free.
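
A minimal sketch of what such a fast-abort check could look like, in Python. The helper names (`has_bare_lf`, `check_request_head`) are hypothetical and not taken from any real server; the 200 return value is only a stand-in for "carry on parsing".

```python
def has_bare_lf(head: bytes) -> bool:
    """Return True if the header block contains an LF not preceded by CR."""
    prev = None
    for byte in head:
        if byte == 0x0A and prev != 0x0D:
            return True
        prev = byte
    return False


def check_request_head(head: bytes) -> int:
    """Hypothetical policy: answer 400 on any bare LF, otherwise continue (200 here)."""
    return 400 if has_bare_lf(head) else 200
```

The check runs before any header is interpreted, so a request with mixed line endings is rejected rather than parsed one way by this server and possibly another way by something in front of it.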

78. fijiaarone ◴[] No.41833693{3}[source]
There is no reason those ASCII characters need to stay unprintable. You could use other characters like an interpunct, silcrow, or down caret.
replies(1): >>41835469 #
79. romwell ◴[] No.41833826{5}[source]
Isn't this what I said?

>the lever would do LF, and pushing on it further would do CR (tensioning the carriage spring).

In any case, carriage return is just as important function of the lever as line feed:

- you can also directly do line feed by turning the roller

- line feed, by itself, doesn't need a large lever

- carriage return, by itself, doesn't need a large lever either - you can simply push the carriage

- however, having a large lever is an ergonomic feature which allows you to:

1) return the carriage without moving your hands too far from the keyboard

2) do CRLF in one motion without it feeling like two things

3) If needs be, do a line feed by itself, since the force required for that is much smaller compared to the one to move the carriage (lever advantage!).

The long lever makes it so that line feed happens before carriage return. If the lever were short, you'd be able to move the carriage until it stops, and only then would the paper move up.

So I wondered why the control codes are doing the operations in the opposite order from the typewriter.

Turns out, the reasons are mechanical[1]:

>The separation of newline into two functions concealed the fact that the print head could not return from the far right to the beginning of the next line in time to print the next character. Any character printed after a CR would often print as a smudge in the middle of the page while the print head was still moving the carriage back to the first position. The solution was to make the newline two characters: CR to move the carriage to column one, and LF to move the paper up.

Aha! Makes sense.

In a way, this was creating a de-facto protocol by usage, in a similar spirit to how the author is suggesting we get rid of it.

As in: the existing standard wasn't really supported, but letting the commands go through nevertheless and allowing things to break incentivized people to collectively stick to the way of doing things that didn't result in misprints.

____ [1] https://en.wikipedia.org/wiki/Newline

80. MobiusHorizons ◴[] No.41833940{5}[source]
If you expect to be behind a reverse proxy that manages internal headers for you (removes them on incoming requests, and adds them based on internal criteria) then accepting bare 0x0a newlines could be a security vulnerability, as a malicious request could sneak an internal header that would not be stripped by the reverse proxy.
replies(1): >>41842146 #
81. Izkata ◴[] No.41834015{4}[source]
They didn't say "mechanical typewriters", just "typewriters". Electric typewriters did have a Return key that did work the way they described.
replies(1): >>41834839 #
82. inopinatus ◴[] No.41834474[source]
Not just potential bugs, there'll be definite security failures.

Changing the line endings can invalidate signatures over plaintext content. So an email MTA, for example, could never do so. Nor most proxy implementations. Then there's the high latent potential for request smuggling, command injection, and privilege escalation, via careful crafting of ambiguous header lines or protocol commands that target less robust implementations. With some protocols, it may cause declared content sizes to be incorrect, leading to bizarre hangs, which is to say, another attack surface.
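
A rough illustration of the signature and content-size points, in Python. A plain SHA-256 digest stands in for a signature here; this is an assumption for the sake of the sketch and does not model DKIM or any other real scheme, some of which canonicalize line endings before signing.

```python
import hashlib

# A small message as a CRLF-speaking sender would emit it.
body_crlf = b"Subject: hello\r\n\r\nline one\r\nline two\r\n"

# The same message after a middlebox "helpfully" rewrites CRLF to LF.
body_lf = body_crlf.replace(b"\r\n", b"\n")

digest_crlf = hashlib.sha256(body_crlf).hexdigest()
digest_lf = hashlib.sha256(body_lf).hexdigest()

# The digest no longer matches, so anything signed over the raw bytes breaks.
signature_still_valid = digest_crlf == digest_lf

# Any declared length is now off by one byte per rewritten line ending.
size_drift = len(body_crlf) - len(body_lf)
```

The second half is the content-size problem: a receiver that trusts the originally declared length will wait for bytes that never arrive.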

In practice, retiring CRLF can't be safely performed unilaterally or by fiat, we'll need to devise a whole new handshake to affirm that both ends are on the same page re. newline semantics.

83. inopinatus ◴[] No.41834573{5}[source]
That was already motivated by Postel's Law. It's a step beyond to change what the strict form is; relying on the same to justify unilaterally transposing the form is asking too much of middlebox implementations of just about any line-oriented protocol, and possibly violates Postel's Law itself by asserting the inverse.
replies(1): >>41835014 #
84. bvrmn ◴[] No.41834703{3}[source]
The 2-to-3 fiasco was solely caused by inadequate support for writing py2-compatible code until Python 3.4. It was literally "you devs, stop writing ugly py2, let's write godly py3".
85. romwell ◴[] No.41834839{5}[source]
Indeed, I was clarifying that:

1) "Typewriters" in parent's comment didn't refer to mechanical typewriters, but

2) Line feed/carriage return semantics, as well as the UX of combining them into one action to proceed to the next line of text, predate electric typewriters and were effectively the same on mechanical ones.

As I wrote in the other comment, the subtle difference in semantics comes from teletypes, which couldn't advance the paper feed and return the carriage fast enough to print the next character in the timespan of one command.

Not that it applied to all teletypes, but it was the case for a very popular unit.

The makers of that machine deliberately didn't include a single command that would do CR/LF so that there'd be no way for the users to notice that.

The ordering, CR then LF, differs from the one on mechanical typewriters, where LF always precedes CR when you use the big lever, allowing one to use the same lever to produce blank lines without moving the carriage (in effect, doing LF LF LF ... LF CR).

On the teletypes though, CR LF ordering was, in any case, a lie, since in actuality, LF was happening somewhere in the middle of the carriage return, which took the time span of two commands.

The CR command had to precede LF on the teletype because it took longer, but since the mechanisms were independent, they could be executed at the same time.

This is the difference from mechanical typewriters.

Typing mechanism was also independent of CR and LF, and running CR + [type character] at the same time was bad. But having fixed-time-per-command simplified everything, so instead of waiting (..which means buffering - with potential overflow issues - or a protocol to tell the sending party to wait, which is a lot more complex), hacks like this were put in place.

My IBM Selectric is not functional (got it as a repair project, didn't get to do it yet), so I can't verify, but I'd guess it doesn't need to do CR then LF, since it can simply not process input while the carriage is returning. It's OK for it to do CR and LF in any order, or simultaneously.

If the operator presses and releases a button during this time, the machine can simply do nothing; the operator will re-type the character the next instant, using the buffer in their head where the text ultimately comes from.

The teletypes didn't have that luxury, as the operator on the other end could be a computer, which was told it could send output at a certain rate, and by golly it did. Not processing a command would mean dropping data.

All that is to say that CR and LF are present on both typewriters and teletypes, with the following differences:

* mechanical typewriters always do LFCR due to the mechanics of the carriage return lever, which was designed for a human operator;

* teletypes do CRLF because that's how they cope with the typist being a machine that can't be told to wait a bit until the carriage returns;

* and electric typewriters are somewhere in between and could do whatever, because the CR lever was replaced by the motor (like on a teletype), but the operator was still a human that could wait half a second without forgetting what it is that they wanted to type.

IMO, it's worth keeping CRLF around simply because it's a part of computer and technology history that spans nearly two centuries, from typewriters to Google docs.

86. tptacek ◴[] No.41835014{6}[source]
I don't believe in Postel's Law, but I also don't believe in reverential adherence to standards documents. Make good engineering decisions on their own merits. This article is right: CRLF is dumb. You know who agrees with me about that? The IETF, in their (very old) informational RFC about the origins of CRLF in their protocols.
replies(1): >>41843876 #
87. immibis ◴[] No.41835413{5}[source]
Security also doesn't exist as much as we'd like it to, which doesn't excuse making it exist even less.
88. jcul ◴[] No.41835444[source]
Not disagreeing with you, but implementation diverges from spec a lot anyway.

I've had to write decoders for things like HTTP, SMTP, SIP (VoIP), and there's so many edge cases and undocumented behavior from different implementations that you have to still support.

I find that it affects text based protocols, a lot more than binary protocols. Like TLS, or RTP, to stick with the examples above, have much less divergence and are much less forgiving to broken (according to spec) implementations.

replies(1): >>41836502 #
89. eqvinox ◴[] No.41835469{4}[source]
There is in fact a reason those ASCII 'characters' should stay unprintable: the 0x00-0x1f (except Tab, CR, LF) range is explicitly excluded as invalid in a whole bunch of standards, e.g. XML.
90. Joker_vD ◴[] No.41835480{3}[source]
This is, by the way, exactly the stance Microsoft used to have on standards in the '90s and '00s (it probably still has it). And MS caught a lot of flak for that, for a very good reason.
91. Joker_vD ◴[] No.41835511{4}[source]
Naked CR is an almost (barring legacy Mac OS) universally supported cross-platform way to print progress bars on CRT terminals.
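
For anyone who hasn't seen the trick, a small Python sketch; `render_progress` and `demo` are names made up for this example. Each frame starts with a bare CR, so the terminal returns to column zero and the new frame overwrites the previous one instead of scrolling.

```python
import sys
import time


def render_progress(done: int, total: int, width: int = 20) -> str:
    """Build one frame: a leading bare CR, then a fixed-width bar and a counter."""
    filled = width * done // total
    return "\r[" + "#" * filled + "." * (width - filled) + f"] {done}/{total}"


def demo(total: int = 10, delay: float = 0.1) -> None:
    """Redraw the bar in place, then emit a single newline at the end."""
    for step in range(total + 1):
        sys.stdout.write(render_progress(step, total))
        sys.stdout.flush()
        time.sleep(delay)
    sys.stdout.write("\n")
```

Calling `demo()` in a terminal should show one line filling up. Note that no LF is written until the bar is complete.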
replies(1): >>41854139 #
92. isThereClarity ◴[] No.41835526[source]
sendmail 8.18.1 includes patches to correct this behaviour (and options to turn it back on) due to its role in SMTP smuggling, CVE-2023-51765. See https://ftp.sendmail.org/RELEASE_NOTES

  sendmail is now stricter in following the RFCs and rejects
  some invalid input with respect to line endings
  and pipelining:
  ...snip...
  - Accept only CRLF . CRLF as end of an SMTP message
    as required by the RFCs, which can be disabled by the
    new srv_features option 'O'.
  - Do not accept a CR or LF except in the combination
    CRLF (as required by the RFCs).  These checks can
    be disabled by the new srv_features options
    'U' and 'G', respectively.  In this case it is
    suggested to use 'u2' and 'g2' instead so the server
    replaces offending bare CR or bare LF with a space.
    It is recommended to only turn these protections off
    for trusted networks due to the potential for abuse.
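
To make the 'u2'/'g2' behaviour concrete, here is a Python sketch of "replace offending bare CR or bare LF with a space". It only mirrors what the release notes describe; it is not sendmail's implementation, and `replace_bare_line_endings` is a hypothetical name.

```python
import re

# Matches a CR that is not followed by LF, or an LF that is not preceded by CR.
_BARE_CR_OR_LF = re.compile(rb"\r(?!\n)|(?<!\r)\n")


def replace_bare_line_endings(data: bytes) -> bytes:
    """Keep proper CRLF pairs intact and turn every bare CR or bare LF into a space."""
    return _BARE_CR_OR_LF.sub(b" ", data)
```

After this pass the only remaining line terminator is CRLF, so a smuggled `LF . LF` sequence can no longer be read as an end-of-message marker by a more lenient server further down the chain.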
93. account42 ◴[] No.41835964{4}[source]
> As the parent mentioned, it's security critical that every HTTP parser in the world - including every middleware, proxy, firewall, WAF - parses the headers in the same way. If you write a HTTP parser for a server application it's imperative you don't introduce random inconsistences with the standard (I can't believe I have to write this).

No it isn't, at least not critical to all those parsers. My HTTP server couldn't care less if some middle boxes that people go through are less or more strict in their HTTP parsing. This only becomes a concern when you operate something like a reverse proxy AND implement security-relevant policies in that proxy.

94. michaelmior ◴[] No.41836490{3}[source]
I didn't say we shouldn't get rid of it. I'm saying we shouldn't intentionally break existing protocols.
95. michaelmior ◴[] No.41836502[source]
That's fair, but I don't see that as an argument for intentionally deviating from the spec.
96. michaelmior ◴[] No.41836516[source]
For CSV, "breaking" changes seem like less of a big deal to me. Partially because there is already so much variation in how CSV is implemented.
97. immibis ◴[] No.41836805{5}[source]
The exploit is that your request went through a proxy which followed the standard (but failed to reject the bare NL) and the client sent a header after a bare NL which you think came from the proxy but actually came from the client - such as the client's IP address in a fake X-Forwarded-For, which the proxy would have removed if it had parsed it as a header.

This attack is even worse when applied to SMTP because the attacker can forge emails that pass SPF checking, by inserting the end of one message and start of another. This can also be done in HTTP if your reverse proxy uses a single multiplexed connection to your origin server, and the attacker can make their response go to the next user and desync all responses after that.
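
A toy reproduction of the HTTP case in Python, under stated assumptions: both "parsers" are deliberately simplistic, every function name is invented for this sketch, and the proxy's step of adding its own X-Forwarded-For is left out. The proxy splits only on CRLF and strips a header it owns; the origin drops CRs and splits on LF.

```python
def split_strict(head: bytes) -> list[bytes]:
    """Proxy view: only CRLF ends a line; a bare LF is not rejected, it stays inline."""
    return [line for line in head.split(b"\r\n") if line]


def split_lenient(head: bytes) -> list[bytes]:
    """Origin view: ignore every CR and treat every LF as a line end."""
    return [line for line in head.replace(b"\r", b"").split(b"\n") if line]


def strip_header(lines: list[bytes], name: bytes) -> list[bytes]:
    """Proxy policy: drop client-supplied copies of a header the proxy owns."""
    prefix = name.lower() + b":"
    return [line for line in lines if not line.lower().startswith(prefix)]


# The client hides a forged header behind a bare LF inside another header's value.
client_head = (
    b"GET / HTTP/1.1\r\n"
    b"Host: example.test\r\n"
    b"X-Junk: a\nX-Forwarded-For: 10.0.0.1\r\n"
    b"\r\n"
)

proxy_lines = strip_header(split_strict(client_head), b"X-Forwarded-For")
forwarded = b"\r\n".join(proxy_lines) + b"\r\n\r\n"
origin_lines = split_lenient(forwarded)
```

The proxy sees three lines, none of which starts with `X-Forwarded-For:`, so nothing is stripped. The origin sees four lines, one of which is the forged header.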

replies(1): >>41843394 #
98. throwaway19972 ◴[] No.41836880{4}[source]
At that point, what does garbage even mean? There's just functional software and non-functional software.
99. 0points ◴[] No.41837161{4}[source]
Additionally it is also dishonest to bring Microsoft into the discussion like that. The discussion revolved around _standardized_ network protocols, which is entirely unrelated to MS-DOS text formats.
100. vrighter ◴[] No.41837496{3}[source]
He's not arguing for deprecating it. He's arguing for just not complying and hoping for the best. He explicitly says so right in the article.

That is never the right approach. You intentionally introduce a problem you expect others to fix. All because he doesn't like 0x0d. The protocol is what it is. If you want to make more sane decisions when designing a new protocol (or an explicitly newer version of some existing one) then by all means, go for it. But intentionally breaking existing ones is not the way to go.

101. shadowgovt ◴[] No.41837585{9}[source]
And this whole exercise is an example of why this is a non-starter proposal (at least the "change existing implementations" part).

How much do we expect the domain owners to invest in changing an implementation that already works? Hint: it's a number smaller than epsilon.

Google might, but their volume is so high they care about the cost of individual bytes on the wire.

replies(1): >>41838741 #
102. tptacek ◴[] No.41838741{10}[source]
This exercise was about demonstrating that our security can't rely on making sure there's a carriage return in HTTP line termination, because there is no such norm. See the root of the thread, where I asked the question.
replies(1): >>41839176 #
103. shadowgovt ◴[] No.41839176{11}[source]
Oh, I agree it's about that too, but my point is you've already volunteered more time and resources investigating the situation than most companies would be willing to spend.
104. deanishe ◴[] No.41839309[source]
> but this is giving me pause about using other software by the same author

Go read the article again. I think you'll be pleasantly surprised.

105. Smar ◴[] No.41842146{6}[source]
Only in the case the reverse proxy does not handle bare 0a newlines?
106. Aeolun ◴[] No.41843394{6}[source]
Thanks, that was actually a very clear description of the problem!

The problem here is not to use one or the other, but to use a mix of both.

replies(1): >>41848471 #
107. inopinatus ◴[] No.41843876{7}[source]
Yes, CRLF is dumb. Trying to justify the problem seems unnecessary, it's widely acknowledged. A productive inquiry looks at why fixing it didn't happen yet. Don't confuse that line of thought for calling for more failure.

This is unrealistic, though:

> I don't believe in Postel's Law

All the systems around us that work properly do believe in it, and they will continue to do so. No-one who writes MTAs or reverse proxies &c is gonna listen to the wolves howling at the moon for change when there's no better plan than "ram it through unilaterally". Irrespective of what any individual may believe, Postel's Law remains axiomatic in protocol design & implementation.

More constructively, it may be that line-oriented protocols will only move towards change when they can explicitly negotiate line termination preferences during the opening handshake/banner/key exchange etc, which inevitably means a protocol revision in every case and very careful consideration of when CRLF is passed through anyway (e.g. email body).

replies(1): >>41855135 #
108. Spooky23 ◴[] No.41844264{4}[source]
The author of the post specifically addressed this:

“Today, CR is represented by U+000d and both LF and NL are represented by U+000a. Almost all modern machines use U+000a to mean NL exclusively. That meaning is embedded in most programming languages as the backslash escape \n. Nevertheless, a minority of machines still insist on sending a CR together with their NLs”

Who is the “minority”?

He also takes the position that the legacy behavior is fine for a tty, as it’s emulating a legacy terminal.

109. immibis ◴[] No.41848471{7}[source]
And the standard is CRLF, so you're either following the standard or using a mix.
110. romwell ◴[] No.41854139{5}[source]
Note that I said that it would make sense to have a CRLF command too, as in: in addition to separate CR and LF commands, which are useful in their own right.

I also strongly disagree with the author that LF is useless.

So many times in code I need to type:

    Function blah(parameter1 = default1, 
                  parameter2, ...)

It would be super nice to move down from the beginning of the word "parameter1" down to the next line even when it's empty to start typing at that position.

Sure, there is auto format. But not in this comment box.

And what I'm talking about is exactly what LF was meant to do!

I want all my text boxes to support that, and to have a special key on my keyboard to do it.

Eh.

111. tptacek ◴[] No.41855135{8}[source]
Hold on: if you do believe in Postel's Law, you agree with me: just send newlines.
replies(1): >>41857490 #
112. ◴[] No.41857490{9}[source]