←back to thread

348 points dgl | 1 comments | | HN request time: 0s | source
Show context
HappMacDonald ◴[] No.44503204[source]
I completely disagree with author's (oft quoted here in comments) statement:

> I find this particularly interesting because this isn't fundamentally a problem of the software being written in C. These are logic errors that are possible in nearly all languages

For Christ's sake, Turing taught us that any error in one language is possible in any other language. You can even get double free in Rust if you take the time to build an entire machine emulator and then run something that uses Malloc in the ensuing VM. Rust and similar memory safe languages can emulate literally any problem C can make a mine field out of.. but logic errors being "possible" to perform are significantly different from logic errors being the first tool available to pull out of one's toolbox.

Other comments have cited that in non-C languages a person would be more likely to reach for a security-hardened library first, which I agree might be helpful.. but replies to those comments also correctly point out that this trades one problem for another with dependency hell, and I would add on top of that the issue that a widely relied upon library can also increase the surface area of attack when a novel exploit gets found in it. Libraries can be a very powerful tool but neither are they a panacea.

I would argue that the real value in a more data-safe language (be that Rust or Haskell or LISP et al) is in offering the built-in abstractions which lend themselves to more carefully modeling data than as a firehose of octets which a person then assumes they need to state-switch over like some kind of raw Turing machine.

"Parse, don't validate" is a lot easier to stick to when you're coding in a language designed with a precept like that in mind vs a language designed to be only slightly more abstract than machine code where one can merely be grateful that they aren't forced to use jump instructions for every control flow action.

replies(5): >>44503421 #>>44503690 #>>44503814 #>>44504757 #>>44507075 #
lilyball ◴[] No.44503421[source]
I can easily see this bug happening in Rust. At some level you need to transform your data model into text to write out, and to parse incoming text. If you want to parse linewise you might use BufRead::lines(), and then write a parser for those lines. That parser won't touch CRs at all, which means when you do the opposite and write the code that serializes your data model back to lines, it's easy to forget that you need to avoid having a trailing CR, since CR appears nowhere else in your code.
replies(2): >>44503634 #>>44504622 #
1. tetha ◴[] No.44504622[source]
And - having dealt with parser construction in university for a few months - the only real way to deal with this is fuzzing and round trip tests.

It sounds defeatist, but non-trivial parsers end up with a huge state space very quickly - and entirely strange error situations and problematic inputs. And "non-trivial" starts a lot sooner than one would assume. As the article shows, even "one element per line" ends up non-trivial once you support two platforms. "foo\r\n" could be tokenized/parsed in 3 or even 4 different ways or so.

It just becomes worse from there. And then Unicode happened.