Building a Durable Execution Engine with SQLite

(www.morling.dev)

110 points ingve | 2 comments | 20 Nov 25 13:26 UTC

roughly ◴[21 Nov 25 21:04 UTC] No.46008976[source]▶

One thing that needs to be emphasized with “durable execution” engines is that they don’t actually get you out of having to handle errors, rollbacks, etc. Take the canonical example everyone uses: you’re using a DE engine to restart a sales transaction, but the part of that transaction that failed was “charging the customer.” Did it fail before or after the charge went through? You failed while updating the inventory system: did the product get marked out or not? All of these problems are tractable, but once you’ve solved them, once you’ve built sufficient atomicity into your system to handle the actual failure cases, the benefits of taking on the complexity of a DE system are substantially lower than the marketing pitch suggests.
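To make the "before or after the charge went through" ambiguity concrete, here is a minimal sketch of one common way to make a replayed step safe: an idempotency key. Everything in it is hypothetical (`FakePaymentGateway`, `charge_with_retry`, the in-memory dict standing in for the provider's records); it is not how any particular DE engine or payment API works, just an illustration of the work the application still has to do itself.

```python
import uuid


class FakePaymentGateway:
    """Hypothetical in-memory stand-in for a payment provider."""

    def __init__(self):
        self.charges = {}            # idempotency key -> amount in cents
        self.drop_next_reply = False

    def charge(self, idempotency_key, amount_cents):
        # Record the charge only the first time this key is seen.
        if idempotency_key not in self.charges:
            self.charges[idempotency_key] = amount_cents
        if self.drop_next_reply:
            # The charge was applied, but the caller never hears about it.
            self.drop_next_reply = False
            raise ConnectionError("reply lost after charge was applied")
        return "ok"

    def total_charged(self):
        return sum(self.charges.values())


def charge_with_retry(gateway, idempotency_key, amount_cents, attempts=3):
    """Retry the step, reusing the same key so a replay cannot double charge."""
    for _ in range(attempts):
        try:
            return gateway.charge(idempotency_key, amount_cents)
        except ConnectionError:
            continue
    raise RuntimeError("charge did not complete")


gateway = FakePaymentGateway()
gateway.drop_next_reply = True   # simulate "failed after the charge went through"
order_key = str(uuid.uuid4())    # generated once per order and stored with it
result = charge_with_retry(gateway, order_key, 1999)
print(result, gateway.total_charged())
```

The retry loop is the part a DE engine can take over. The key, and a downstream system that honors it, are still the application's job, which is the point being made above.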

replies(3): >>46009362 #>>46009374 #>>46009633 #

hedgehog ◴[21 Nov 25 21:46 UTC] No.46009362[source]▶

>>46008976 #

In my one encounter with one of these systems it induced new code and tooling complexity, orders of magnitude performance overhead for most operations, and made dev and debug workflows much slower. All for... an occasional convenience far outweighed by the overall drag of using it. There are probably other environments where something like this makes sense but I can't figure out what they are.

replies(2): >>46009456 #>>46009666 #

1. jedberg ◴[21 Nov 25 22:17 UTC] No.46009666[source]▶

>>46009362 #

I'm not sure which one you used, but ideally it's so lightweight that the benefits outweigh the slight cost of developing with it. Besides the recovery benefit, there are observability and debugging benefits too.

replies(1): >>46011100 #

2. hedgehog ◴[22 Nov 25 01:12 UTC] No.46011100[source]▶

>>46009666 (TP) #

I don't want to start a debate about a specific vendor, but the cost was very high. Leaky serialization of call arguments and results, then hairpinning messages across the internet and back to get to workers. 200ms overhead for a no-op call. There was some observability benefit, but it didn't allow for debugger access and had its own special way of packaging code, so a net add of complexity there too. That's not getting into the induced complexity caused by adding a bunch of RPC boundaries to fit their execution model. All that, and using the thing effectively still requires understanding their runtime model. I understand the motivation, but not the technical approach.
