
    348 points dgl | 19 comments
    1. HappMacDonald ◴[] No.44503204[source]
    I completely disagree with the author's (oft-quoted here in comments) statement:

    > I find this particularly interesting because this isn't fundamentally a problem of the software being written in C. These are logic errors that are possible in nearly all languages

    For Christ's sake, Turing taught us that any error in one language is possible in any other language. You can even get double free in Rust if you take the time to build an entire machine emulator and then run something that uses Malloc in the ensuing VM. Rust and similar memory safe languages can emulate literally any problem C can make a mine field out of.. but logic errors being "possible" to perform are significantly different from logic errors being the first tool available to pull out of one's toolbox.

    Other comments have noted that in non-C languages a person would be more likely to reach for a security-hardened library first, which I agree might be helpful. But replies to those comments also correctly point out that this trades one problem for another, namely dependency hell, and I would add that a widely relied-upon library can also increase the attack surface when a novel exploit is found in it. Libraries can be a very powerful tool, but they are not a panacea.

    I would argue that the real value in a more data-safe language (be that Rust or Haskell or LISP et al) is in offering the built-in abstractions which lend themselves to more carefully modeling data than as a firehose of octets which a person then assumes they need to state-switch over like some kind of raw Turing machine.

    "Parse, don't validate" is a lot easier to stick to when you're coding in a language designed with a precept like that in mind vs a language designed to be only slightly more abstract than machine code where one can merely be grateful that they aren't forced to use jump instructions for every control flow action.

    replies(5): >>44503421 #>>44503690 #>>44503814 #>>44504757 #>>44507075 #
    2. lilyball ◴[] No.44503421[source]
    I can easily see this bug happening in Rust. At some level you need to transform your data model into text to write out, and to parse incoming text. If you want to parse linewise you might use BufRead::lines(), and then write a parser for those lines. That parser won't touch CRs at all, which means when you do the opposite and write the code that serializes your data model back to lines, it's easy to forget that you need to avoid having a trailing CR, since CR appears nowhere else in your code.
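
    A minimal sketch of the reading side described above, assuming the input arrives as an in-memory byte slice (the helper name `read_lines` is hypothetical). `BufRead::lines()` drops the trailing `\n` or `\r\n` from each line, so the parser downstream never sees a line-ending CR:

```rust
use std::io::{BufRead, Cursor};

/// Hypothetical helper: reads `input` linewise with `BufRead::lines()`,
/// the way a simple config parser might.
fn read_lines(input: &[u8]) -> Vec<String> {
    Cursor::new(input)
        .lines()
        // `lines()` yields `io::Result<String>`; this sketch assumes valid UTF-8.
        .map(|line| line.expect("input is valid UTF-8"))
        .collect()
}
```

    Calling `read_lines(b"a = 1\r\nb = 2\n")` gives `["a = 1", "b = 2"]`: the CRLF and LF endings are indistinguishable after parsing, which is why CR is easy to forget on the writing side.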
    replies(2): >>44503634 #>>44504622 #
    3. HappMacDonald ◴[] No.44503634[source]
    Well, the question then becomes "how do you identify the quoting that needs to happen on the line?", and tactics that are common in Rust, enabled by features available in Rust, will still lead a person away from this pattern of error.

    One tool I'd have probably reached for (long before having heard of this particular corner case to avoid) would have been whitespace trimming, and CR counts as whitespace. Plus folk outside of C are also more likely to aim a regex at a line they want to parse, and anyone who's been writing regex for more than 5 minutes gets into the habit of adding `\s*` adjacent to beginning of line and end of line markers (and outside of capture groups) which in this case achieves the same end.

    replies(2): >>44504471 #>>44504916 #
    4. SpaceNugget ◴[] No.44503690[source]
    > You can even get double free in Rust if you take the time to build an entire machine emulator and then run something that uses Malloc in the ensuing VM. Rust and similar memory safe languages can emulate literally any problem C can make a mine field out of..

    That doesn't have any relevance to a discussion about memory safety in C vs Rust. An invalid memory access in the emulated machine won't be able to reach memory belonging to other processes on the host system. Two languages being Turing-complete does not make them the same language, and it definitely does not make them bug-for-bug compatible. Rust _really_ does enable you to write memory-safe programs.

    replies(2): >>44503975 #>>44511220 #
    5. markasoftware ◴[] No.44503814[source]
    As you point out, the most serious way to undermine the "safety" features in a "safe" language like Rust is to implement a VM, programming language, serdes framework, etc, because these operate outside of Rust's type system and memory safety.

    And that's exactly what the Git developers did here: they made an in-house configuration file format. If implemented in Rust, it would bypass most of Rust's safety features, particularly type safety.

    replies(1): >>44504068 #
    6. prmph ◴[] No.44503975[source]
    Sounds like you actually agree with the comment you are replying to.
    7. nixosbestos ◴[] No.44504068[source]
    It is mind-blowing the things people come up with when it comes to Rust vs C conversations. The same convoluted crap for years at this point.

    No, just no. I'm sorry, I've implemented countless custom formats in Rust and have NEVER had to sidestep safe/unsafe or otherwise sacrifice type safety. What an absurd claim.

    Maybe for some binary (de)serialization you get fancy (lol and are still likely to be far better off than with C) but goodness, I cannot imagine a single reason why a config file parser would need to be (type)-unsafe.

    replies(1): >>44504364 #
    8. sophacles ◴[] No.44504364{3}[source]
    The person you replied to didn't say that you had to bypass safe. This bug is orthogonal to type and memory safety; it's a different issue.

    The Git bug in question could be written in 100% safe Rust using as much or as little of the type system[1] as you want. It's a logic error when parsing a string.

    I dev Rust full-time, and I've spent a lot of time writing protocol parsers. It's easy to forget to check this or that byte/string for every possible edge case as you're parsing it into some Rust type, and it happens all the time in Rust, just like it did in C or Python or Go when I used those languages. This bug (if anything) is the type of thing that is solved with good tokenizer design and testing, and with more small, independently tested functions. Again, not at all related to the type system.

    [1] Although in Rust you can arrange your types so that this sort of bug is harder to implement or easier to catch than in most languages... but doing that requires an up-front understanding that logic bugs are just as possible in Rust as in other languages, as well as some experience to avoid awkwardness when setting the types up.
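
    One way the footnote's idea could look, as a rough sketch: every name here (`LineValue`, `LineValueError`, `write_entry`) is hypothetical, and the rule "reject any CR or LF" is an assumption chosen for illustration, not Git's actual quoting behaviour. The point is only that the serializer accepts a type that can't be constructed from a value containing a line break:

```rust
/// Hypothetical type: a value that is known to fit on a single line.
#[derive(Debug, PartialEq)]
struct LineValue(String);

/// Hypothetical error for values that would change the line structure.
#[derive(Debug, PartialEq)]
struct LineValueError;

impl LineValue {
    /// Parses once, up front: rejects any CR or LF (an assumed rule).
    fn parse(raw: &str) -> Result<Self, LineValueError> {
        if raw.contains('\r') || raw.contains('\n') {
            Err(LineValueError)
        } else {
            Ok(LineValue(raw.to_owned()))
        }
    }
}

/// The writer only accepts `LineValue`, so it cannot emit a stray CR.
fn write_entry(key: &str, value: &LineValue) -> String {
    format!("{} = {}\n", key, value.0)
}
```

    As the footnote says, this only helps if someone thinks of the CR case when designing `LineValue::parse`; the type system enforces the rule but does not discover it.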

    replies(1): >>44504771 #
    9. wizzwizz4 ◴[] No.44504471{3}[source]
    I've been writing regular expressions for at least 8 years, and I'm not sure I've ever written `\s*`.
    10. tetha ◴[] No.44504622[source]
    And - having dealt with parser construction in university for a few months - the only real way to deal with this is fuzzing and round trip tests.

    It sounds defeatist, but non-trivial parsers end up with a huge state space very quickly - and entirely strange error situations and problematic inputs. And "non-trivial" starts a lot sooner than one would assume. As the article shows, even "one element per line" ends up non-trivial once you support two platforms. "foo\r\n" could be tokenized/parsed in 3 or even 4 different ways or so.
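
    To make the "3 or even 4 different ways" concrete, here is a small sketch using only Rust's standard string methods (the function name `tokenizations` is hypothetical). Each entry is a plausible "split into lines" and each gives a different answer for the same input:

```rust
/// Four plausible ways to split the same text into lines.
fn tokenizations(input: &str) -> [Vec<&str>; 4] {
    [
        // Split on LF only: keeps the CR and yields a trailing empty piece.
        input.split('\n').collect(),
        // Treat LF as a terminator: keeps the CR, no trailing empty piece.
        input.split_terminator('\n').collect(),
        // Split on CRLF only: drops the CR, yields a trailing empty piece.
        input.split("\r\n").collect(),
        // `str::lines()`: accepts LF or CRLF endings, no trailing empty piece.
        input.lines().collect(),
    ]
}
```

    For `"foo\r\n"` the four results are `["foo\r", ""]`, `["foo\r"]`, `["foo", ""]` and `["foo"]`.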

    It just becomes worse from there. And then Unicode happened.

    11. umanwizard ◴[] No.44504757[source]
    > You can even get double free in Rust if you take the time to build an entire machine emulator and then run something that uses Malloc in the ensuing VM.

    No, this wouldn't be a double free in Rust, it'd be a double free in whatever language you used to write the emulated code.

    The distinction is meaningful, because the logic error he's talking about is possible in actual Rust (even without unsafe), not just theoretically in some virtual system that you can use Rust to write a simulation for.

    replies(1): >>44505341 #
    12. the8472 ◴[] No.44504771{4}[source]
    In practice I think a Rust project would have used TOML, which parses safely. The limitation there would be that TOML requires strings to be UTF-8, so it couldn't represent all possible Unix paths.
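
    A tiny sketch of that limitation, using only the standard library rather than any TOML crate (the helper name `as_utf8_string` is hypothetical): a Unix path is an arbitrary byte sequence, and not every byte sequence is a valid UTF-8 string.

```rust
/// Hypothetical helper: returns the path bytes as `&str` only when they
/// are valid UTF-8, which is what a UTF-8-only string format would need.
fn as_utf8_string(path_bytes: &[u8]) -> Option<&str> {
    std::str::from_utf8(path_bytes).ok()
}
```

    `as_utf8_string(b"src/main.rs")` is `Some("src/main.rs")`, while a Latin-1 encoded name such as `b"caf\xe9.txt"` gives `None` and would need some escaping scheme layered on top.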
    replies(1): >>44510730 #
    13. lilyball ◴[] No.44504916{3}[source]
    You're describing a different format entirely then if you're doing generic whitespace trimming without any consideration for the definition of "whitespace". The Git config format explicitly defines ignorable whitespace as spaces and horizontal tabs, and says that these whitespace characters are trimmed from values, which means nothing else gets trimmed from values. If you try to write a parser for this using a regular expression and `\s*` then you'd better look up what `\s` means to your regex engine because it almost certainly includes more than just SP and HT.
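
    The same difference shows up with plain string trimming, not just regexes. A minimal sketch, assuming the SP/HT rule as stated in this comment (the helper name `trim_sp_ht` is hypothetical): Rust's `str::trim()` uses the Unicode whitespace definition, which includes CR, whereas a trim restricted to space and tab leaves the CR in place.

```rust
/// Hypothetical helper: trims only space and horizontal tab,
/// the two characters this comment names as ignorable whitespace.
fn trim_sp_ht(s: &str) -> &str {
    s.trim_matches(|c: char| c == ' ' || c == '\t')
}
```

    For the input `"  value\r"`, `trim()` returns `"value"` but `trim_sp_ht` returns `"value\r"`, so the two parsers disagree about what the value is.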

    I can't think of any features in Rust that will lead someone away from this pattern of error, where this pattern of error is not realizing that round-tripping the serialized output back through the deserializer can change the boundaries of line endings. It's really easy to think "if I have a bunch of single-line strings and I join them with newlines I now have multiline text, and I can split that back up into individual lines and get back what I started with". This is doubly true if you start with a parser that splits on newline characters and then change it after the fact to use BufRead::lines() in response to someone telling you it doesn't work on Windows.
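
    The round-trip assumption can be checked directly. This is a sketch with hypothetical names (`serialize`, `deserialize`, `round_trips`) and a deliberately naive one-value-per-line format, not Git's real config syntax:

```rust
/// Naive writer: one value per line, LF-terminated.
fn serialize(values: &[&str]) -> String {
    let mut out = String::new();
    for value in values {
        out.push_str(value);
        out.push('\n');
    }
    out
}

/// Naive reader: `str::lines()` strips a trailing LF or CRLF.
fn deserialize(text: &str) -> Vec<String> {
    text.lines().map(str::to_owned).collect()
}

/// True when writing and re-reading gives back exactly the input values.
fn round_trips(values: &[&str]) -> bool {
    deserialize(&serialize(values)) == values
}
```

    `round_trips(&["a", "b"])` holds, but `round_trips(&["plain", "ends-with-cr\r"])` does not: the written text ends in `\r\n`, and the reader treats that CR as part of the line ending rather than part of the value.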

    14. charcircuit ◴[] No.44505341[source]
    Another example would be making your own allocator in Rust.
    replies(1): >>44505513 #
    15. umanwizard ◴[] No.44505513{3}[source]
    Not possible without unsafe.
    replies(1): >>44507208 #
    16. lelanthran ◴[] No.44507075[source]
    > "Parse, don't validate" is a lot easier to stick to when you're coding in a language designed with a precept like that in mind vs a language designed to be only slightly more abstract than machine code

    "Parse, don't validate" is easily doable in plain C and almost always has been. See https://www.lelanthran.com/chap13/content.html

    17. charcircuit ◴[] No.44507208{4}[source]
    Sure it is. Have an array which you allocate from.
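
    One possible reading of this suggestion, as a sketch in safe Rust: an index-based pool over a fixed array, with hypothetical names `Pool` and `Handle`. This is an assumption about what is meant; it is not an implementation of the standard `GlobalAlloc` trait, which is an `unsafe` trait. Double free and stale handles become logic errors here rather than memory-safety violations:

```rust
/// Hypothetical handle: just an index into the pool's backing array.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Handle(usize);

/// Hypothetical pool that "allocates" slots out of a fixed array.
struct Pool {
    slots: [Option<u32>; 4],
}

impl Pool {
    fn new() -> Self {
        Pool { slots: [None; 4] }
    }

    /// Stores `value` in the first free slot, or returns `None` when full.
    fn alloc(&mut self, value: u32) -> Option<Handle> {
        let index = self.slots.iter().position(|slot| slot.is_none())?;
        self.slots[index] = Some(value);
        Some(Handle(index))
    }

    /// Frees a slot; returns `false` if it was already free (a double free).
    fn free(&mut self, handle: Handle) -> bool {
        self.slots
            .get_mut(handle.0)
            .and_then(|slot| slot.take())
            .is_some()
    }

    /// Reads whatever currently occupies the slot, even via a stale handle.
    fn get(&self, handle: Handle) -> Option<u32> {
        self.slots.get(handle.0).copied().flatten()
    }
}
```

    Freeing the same handle twice returns `false` the second time, and a stale handle can observe a later allocation that reused its slot. Both are bugs, but neither touches memory outside the array.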
    18. hnaccount_rng ◴[] No.44510730{5}[source]
    Which kind of makes it an unsuitable solution for the given problem, right? Git is not free to (or at least doesn't consider itself free to) work only on a subset of possible paths.
    19. 1718627440 ◴[] No.44511220[source]
    Invalid memory access in C also won't be able to access memory from other processes (on a modern computer, outside the OS).