
110 points ingve | 5 comments
roughly ◴[] No.46008976[source]
One thing that needs to be emphasized with “durable execution” engines is that they don’t actually get you out of having to handle errors, rollbacks, etc. Take even the canonical example everyone uses: you’re using a DE engine to restart a sales transaction, but the part of that transaction that failed was “charging the customer”. Did it fail before or after the charge went through? Or you failed while updating the inventory system: did the product get marked out or not? All of these problems are tractable, but once you’ve solved them - once you’ve built sufficient atomicity into your system to handle the actual failure cases - the benefits of taking on the complexity of a DE system are substantially lower than the marketing pitch suggests.
replies(3): >>46009362 #>>46009374 #>>46009633 #
1. jedberg ◴[] No.46009633[source]
The key to a durable workflow is making each step idempotent. Then you don't have to worry about those things. You just run the failed step again. If it already worked the first time, it's a no-op.

For example, Stripe lets you include an idempotency key with your request. If you try to make a charge again with the same key, it returns the result of the original request instead of charging a second time. A DE framework like DBOS will automatically generate the idempotency key for you.

But you're correct, if you can't make the operation idempotent, then you have to handle that yourself.
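
To make the retry behavior concrete, here is a minimal sketch of an idempotent step. It does not use Stripe's or DBOS's actual API: `FakePaymentGateway`, `charge`, and `run_charge_step` are hypothetical names, and the gateway is an in-memory stand-in that assumes the server remembers responses by idempotency key. The simulated failure is the awkward case from the parent comment, where the charge goes through but the caller crashes before recording it.

```python
class FakePaymentGateway:
    """Hypothetical in-memory stand-in for a payment API with idempotency keys."""

    def __init__(self):
        self._responses = {}   # idempotency key -> response already returned
        self.charges_made = 0  # how many times money actually moved

    def charge(self, amount_cents, idempotency_key):
        # A repeated key replays the stored response instead of charging again.
        if idempotency_key in self._responses:
            return self._responses[idempotency_key]
        self.charges_made += 1
        response = {
            "charge_id": f"ch_{self.charges_made}",
            "amount_cents": amount_cents,
        }
        self._responses[idempotency_key] = response
        return response


def run_charge_step(gateway, workflow_id, amount_cents, lose_first_ack=True):
    """Run the charge step, retrying once if the first acknowledgement is lost."""
    # The key must be stable across retries, so derive it from the workflow
    # and step identity rather than generating a fresh random value per attempt.
    key = f"{workflow_id}:charge"
    for attempt in range(2):
        response = gateway.charge(amount_cents, key)
        if attempt == 0 and lose_first_ack:
            # Simulate a crash after the charge succeeded but before we saved it.
            continue
        return response


gateway = FakePaymentGateway()
result = run_charge_step(gateway, "order-1234", 2500)
print(gateway.charges_made, result["charge_id"])
```

The step runs twice, but the customer is charged once, because the second call carries the same key. The sketch only works because the key is derived deterministically; a key generated fresh on each attempt would give no protection.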

replies(1): >>46009942 #
2. repeekad ◴[] No.46009942[source]
Temporal plus idempotency keys probably covers the majority of the infrastructure normally needed for production systems.
replies(1): >>46010375 #
3. cyberpunk ◴[] No.46010375[source]
Except to run Temporal at scale on-prem, you’ll need 50x the infra you had before.
replies(1): >>46010599 #
4. jedberg ◴[] No.46010599{3}[source]
Indeed, one of the main selling points of DBOS. All the functionality of Temporal without any of the infrastructure.
replies(1): >>46010644 #
5. cyberpunk ◴[] No.46010644{4}[source]
Ah, I don't know if I would agree with that. Temporal does a lot of stuff; we just don't happen to need most of it, and it's really heavyweight on the database side (running a low 500 or so workflows/second of their own 'hello world' style echo benchmark translates to 100k database ops/second).

DBOS is tied to Postgres, right? That wouldn't scale anywhere near where we need either.

Sadly there aren't many shortcuts in this space, and pretending there are seems a bit hip at the moment. In the end, most everyone who can afford to solve such problems is gonna end up writing their own system for this.