348 points dgl | 3 comments
HappMacDonald No.44503204
I completely disagree with the author's statement (oft quoted here in the comments):

> I find this particularly interesting because this isn't fundamentally a problem of the software being written in C. These are logic errors that are possible in nearly all languages

For Christ's sake, Turing taught us that any error in one language is possible in any other language. You can even get a double free in Rust if you take the time to build an entire machine emulator and then run something that uses malloc in the resulting VM. Rust and similar memory-safe languages can emulate literally any problem C can make a minefield out of. But logic errors being "possible" is significantly different from logic errors being the first tool one pulls out of the toolbox.

Other comments have noted that in non-C languages a person would be more likely to reach for a security-hardened library first, which I agree might be helpful. But replies to those comments also correctly point out that this trades one problem for another (dependency hell), and I would add that a widely relied-upon library can also increase the attack surface when a novel exploit is found in it. Libraries can be a very powerful tool, but they are not a panacea.

I would argue that the real value of a more data-safe language (be that Rust or Haskell or LISP et al) is that it offers built-in abstractions that lend themselves to modeling data more carefully than as a firehose of octets which a person then assumes they need to state-switch over like some kind of raw Turing machine.

"Parse, don't validate" is a lot easier to stick to when you're coding in a language designed with a precept like that in mind vs a language designed to be only slightly more abstract than machine code where one can merely be grateful that they aren't forced to use jump instructions for every control flow action.

replies(5): >>44503421 #>>44503690 #>>44503814 #>>44504757 #>>44507075 #
lilyball No.44503421
I can easily see this bug happening in Rust. At some level you need to transform your data model into text to write out, and to parse incoming text. If you want to parse linewise you might use BufRead::lines(), and then write a parser for those lines. That parser won't touch CRs at all, which means when you do the opposite and write the code that serializes your data model back to lines, it's easy to forget that you need to avoid having a trailing CR, since CR appears nowhere else in your code.
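
A minimal sketch of the failure mode described above, using only the Rust standard library. The `parse_values` and `serialize_values` helpers and the one-value-per-line format are hypothetical simplifications for illustration, not Git's actual config code. The only behavior relied on is that `BufRead::lines()` strips a trailing `\r\n` as well as `\n`:

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical parser: one value per line, read with `BufRead::lines()`.
/// `lines()` removes the trailing "\n" or "\r\n", so this code never sees
/// a CR that sits directly before a newline.
fn parse_values(text: &str) -> Vec<String> {
    Cursor::new(text.as_bytes())
        .lines()
        .map(|line| line.expect("in-memory read of valid UTF-8 should not fail"))
        .collect()
}

/// Hypothetical naive serializer: writes each value followed by "\n",
/// with no quoting or escaping of a trailing CR.
fn serialize_values(values: &[String]) -> String {
    values.iter().map(|v| format!("{}\n", v)).collect()
}
```

With these helpers, a value of `"value\r"` is written out as `value\r\n`, and `parse_values` reads it back as `value`, so the CR silently disappears on the round trip.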
replies(2): >>44503634 #>>44504622 #
1. HappMacDonald No.44503634
Well, the question then becomes "how do you identify the quoting that needs to happen on the line?", and tactics common in Rust, enabled by features available in Rust, will still lead a person away from this pattern of error.

One tool I'd probably have reached for (long before having heard of this particular corner case) would have been whitespace trimming, and CR counts as whitespace. Plus, folk outside of C are also more likely to aim a regex at a line they want to parse, and anyone who's been writing regex for more than 5 minutes gets into the habit of adding `\s*` next to the beginning-of-line and end-of-line markers (outside of capture groups), which in this case achieves the same end.

replies(2): >>44504471 #>>44504916 #
2. wizzwizz4 No.44504471
I've been writing regular expressions for at least 8 years, and I'm not sure I've ever written `\s*`.
3. lilyball No.44504916
Then you're describing a different format entirely, if you're doing generic whitespace trimming without any consideration for the definition of "whitespace". The Git config format explicitly defines ignorable whitespace as spaces and horizontal tabs, and says that these whitespace characters are trimmed from values, which means nothing else gets trimmed from values. If you try to write a parser for this using a regular expression and `\s*`, then you'd better look up what `\s` means to your regex engine, because it almost certainly includes more than just SP and HT.
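
To make the difference concrete, here is a small sketch in Rust using only the standard library. It stands in for the regex case rather than reproducing it: `str::trim` follows the Unicode `White_Space` property, while `trim_sp_ht` is a hypothetical helper that trims only SP and HT, matching the rule as described in this comment (not copied from Git's source).

```rust
/// Generic trimming: `str::trim` removes every char with the Unicode
/// White_Space property (CR, LF, form feed, NBSP, ... as well as SP and HT).
fn trim_generic(value: &str) -> &str {
    value.trim()
}

/// Hypothetical helper: trims only space (SP) and horizontal tab (HT),
/// leaving every other character, including CR, in place.
fn trim_sp_ht(value: &str) -> &str {
    value.trim_matches(|c: char| c == ' ' || c == '\t')
}
```

For an input such as `" \tvalue\r"`, the generic version returns `value` while the SP/HT-only version returns `value\r`, so the two parsers disagree about what the value is.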

I can't think of any features in Rust that will lead someone away from this pattern of error, where this pattern of error is not realizing that round-tripping the serialized output back through the deserializer can change the boundaries of line endings. It's really easy to think "if I have a bunch of single-line strings and I join them with newlines I now have multiline text, and I can split that back up into individual lines and get back what I started with". This is doubly true if you start with a parser that splits on newline characters and then change it after the fact to use BufRead::lines() in response to someone telling you it doesn't work on Windows.
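
The "join, then split, and get back what I started with" assumption can be checked with a short sketch. The function names are hypothetical, and `str::lines()` is used here as an in-memory stand-in for `BufRead::lines()`, since both treat `\r\n` as well as `\n` as a line ending:

```rust
/// Joins single-line strings into multiline text.
fn join_lines(values: &[&str]) -> String {
    values.join("\n")
}

/// Original parser: split on the newline character only.
fn split_on_newline(text: &str) -> Vec<&str> {
    text.split('\n').collect()
}

/// The "make it work on Windows" change: `str::lines()` also treats
/// "\r\n" as a line ending and strips the CR along with the LF.
fn split_with_lines(text: &str) -> Vec<&str> {
    text.lines().collect()
}
```

Joining `["a\r", "b"]` gives `a\r\nb`. Splitting on `'\n'` returns the original two strings, but `lines()` returns `a` and `b`: the serializer did not change, yet the round trip no longer preserves the first value.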