
261 points | david927 | 1 comment

What are you working on? Any new ideas that you're thinking about?
maxbond | No.43156306
A parser combinator library. I'm writing a tool that will do static analysis of SQL (in a very limited fashion; it's a build tool rather than a static analyzer, but I need to understand dependency relationships between statements). I started out using `nom`, but found it imperfectly matched to my needs (underpowered in areas I needed and overpowered in areas my project didn't). `nom 8` came out with some interesting simplifications, but it happened to break my code in a way that would have been awkward to fix. So I bit the bullet and started writing my own library.

My library is specialized for parsing text. That has enabled some cool capabilities.

It comes with a `Span` primitive, which tracks where in a file a token came from, for implementing error messages. A `Span` can be the input or the output of a parser. Initially a `Span` covers an entire file, and as you slice and dice it, each piece keeps the metadata of where it came from.
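
A simplified sketch of the shape of it (field and method names here are made up, not the real API):

```rust
// A simplified sketch (made-up names): a span remembers which byte range
// of the original source it covers, so slicing never loses that information.
#[derive(Clone, Copy)]
struct Span<'a> {
    source: &'a str, // the full original input
    start: usize,    // byte offsets of this span within `source`
    end: usize,
}

impl<'a> Span<'a> {
    fn new(source: &'a str) -> Self {
        Span { source, start: 0, end: source.len() }
    }

    /// The text this span covers.
    fn as_str(&self) -> &'a str {
        &self.source[self.start..self.end]
    }

    /// Slice the span; the result still knows its position in the file.
    fn slice(&self, range: std::ops::Range<usize>) -> Span<'a> {
        Span {
            source: self.source,
            start: self.start + range.start,
            end: self.start + range.end,
        }
    }

    /// 1-based line number of the span's start, for error messages.
    fn line(&self) -> usize {
        self.source[..self.start].matches('\n').count() + 1
    }
}

fn main() {
    let file = Span::new("SELECT *\nFROM users;");
    let from = file.slice(9..13);
    assert_eq!(from.as_str(), "FROM");
    assert_eq!(from.line(), 2);
}
```

The important property is that slicing never loses the byte offsets, so an error deep in a parse can still point back at the original file.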

Along with the standard `Sequence` (combining parsers in a set order) and `Choice` (branching between many parsers) operations that parser combinators are built around, I have come up with two operations that are very handy. I suspect that others have invented them before; they are both patterns I used in `nom`. (I've also only skimmed the original paper; they could be in there and I didn't see them.)

One of them is called `Compose`. It's an alternative to `Sequence`: instead of a group of parsers consuming the input in order, the first parser consumes the input, and each subsequent parser consumes the output of the previous one. This is useful, for instance, when implementing escapable strings; the first parser grabs the entire string, and the second transforms the escape sequences. (There is a mechanism for transforming the content of a `Span` while retaining its metadata.)
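
Sketched with plain functions instead of the real combinator types (all names here are hypothetical), the escaped-string case works roughly like this:

```rust
// A rough sketch of the Compose idea with plain functions instead of the
// real combinator types (names here are made up). Stage one consumes raw
// input; stage two parses stage one's *output*.
type PResult<'a, T> = Result<(T, &'a str), String>;

/// Stage one: grab everything between double quotes, leaving the rest of
/// the input untouched.
fn quoted(input: &str) -> PResult<'_, &str> {
    let rest = input.strip_prefix('"').ok_or("expected opening quote")?;
    let mut escaped = false;
    for (i, c) in rest.char_indices() {
        if c == '"' && !escaped {
            return Ok((&rest[..i], &rest[i + 1..]));
        }
        escaped = c == '\\' && !escaped;
    }
    Err("unterminated string".into())
}

/// Stage two: runs over the *output* of stage one, expanding escapes.
fn unescape(body: &str) -> Result<String, String> {
    let mut out = String::new();
    let mut chars = body.chars();
    while let Some(c) = chars.next() {
        if c != '\\' {
            out.push(c);
            continue;
        }
        match chars.next() {
            Some('n') => out.push('\n'),
            Some('"') => out.push('"'),
            Some('\\') => out.push('\\'),
            other => return Err(format!("bad escape: {:?}", other)),
        }
    }
    Ok(out)
}

/// "Compose" the two: run `quoted` on the input, then run `unescape` on
/// the token `quoted` produced.
fn escaped_string(input: &str) -> PResult<'_, String> {
    let (body, rest) = quoted(input)?;
    Ok((unescape(body)?, rest))
}

fn main() {
    let (value, rest) = escaped_string(r#""a\"b\n" AND 1"#).unwrap();
    assert_eq!(value, "a\"b\n");
    assert_eq!(rest, " AND 1");
}
```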

The other is called `Fuse`. This is a small twist on `Sequence`: after matching the parsers in order, the results are concatenated into a single token. This is useful for a "pattern matching" primitive, where you want to find a series of tokens in order but don't want them split into separate tokens; you want them all together.
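
A rough sketch of the idea, again with made-up names rather than the real API:

```rust
// The Fuse idea in miniature (made-up names): run sub-parsers in order,
// but hand back one token covering everything they consumed, instead of
// a tuple of separate pieces.
type Parsed<'a> = Option<(&'a str, &'a str)>; // (matched token, remaining input)

/// Match a literal string at the start of the input.
fn tag<'a>(pat: &'static str) -> impl Fn(&'a str) -> Parsed<'a> {
    move |input| {
        input
            .strip_prefix(pat)
            .map(|rest| (&input[..pat.len()], rest))
    }
}

/// Both parsers must match in order; the result is the single slice
/// spanning everything the pair consumed.
fn fuse<'a>(
    a: impl Fn(&'a str) -> Parsed<'a>,
    b: impl Fn(&'a str) -> Parsed<'a>,
) -> impl Fn(&'a str) -> Parsed<'a> {
    move |input| {
        let (_, rest) = a(input)?;
        let (_, rest) = b(rest)?;
        let consumed = input.len() - rest.len();
        Some((&input[..consumed], rest))
    }
}

fn main() {
    // Matches "NOT" then " NULL", but yields the fused token "NOT NULL".
    let not_null = fuse(tag("NOT"), tag(" NULL"));
    assert_eq!(
        not_null("NOT NULL DEFAULT 0"),
        Some(("NOT NULL", " DEFAULT 0"))
    );
}
```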

It's been a wild ride; there have been a lot of thorny issues. I often think I should've just stuck with `nom 7` instead of shaving this yak. But I've learned a whole lot about writing especially abstract/DSL-y Rust by combining tuples, traits, and declarative macros. There are also other programming language projects I'd like to pursue, and it will be nice to have a tailor-fit tool for parsing text.

Special thanks to dtolnay's `paste`, the real MVP.

replies(2): >>43156533 >>43156886
1. mirekrusin | No.43156886
I did something like this some time ago [0] and I'm still using it in prod; feel free to take inspiration from it if you find it useful.

[0] https://github.com/preludejs/parser