
578 points abelanger | 8 comments

Hello HN, we're Gabe and Alexander from Hatchet (https://hatchet.run), and we're working on an open-source, distributed task queue. It's an alternative to tools like Celery for Python and BullMQ for Node.js, primarily focused on reliability and observability. It uses Postgres for the underlying queue.

Why build another managed queue? We wanted to build something with the benefits of full transactional enqueueing - particularly for dependent, DAG-style execution - and felt strongly that Postgres solves 99.9% of queueing use-cases better than most alternatives (Celery uses Redis or RabbitMQ as a broker, BullMQ uses Redis). Since the introduction of SKIP LOCKED and the milestones of recent PG releases (like active-active replication), it's becoming more feasible to horizontally scale Postgres across multiple regions and vertically scale to 10k TPS or more. Many queues (like BullMQ) are built on Redis, where data loss can occur on OOM if you're not careful; using PG avoids an entire class of those problems.
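
To make the transactional-enqueueing point concrete, here's a minimal sketch of the SKIP LOCKED pattern this style of queue builds on - illustrative only, using a hypothetical tasks table and psycopg rather than Hatchet's actual schema:

    import psycopg  # assumption: psycopg 3; the tasks table/columns are hypothetical

    def enqueue(conn, task_type, payload):
        # Transactional enqueueing: the task row commits (or rolls back) together
        # with whatever application writes happen in the same transaction.
        with conn.transaction():
            conn.execute(
                "INSERT INTO tasks (type, payload, status) VALUES (%s, %s, 'queued')",
                (task_type, payload),
            )

    def dequeue(conn):
        # FOR UPDATE SKIP LOCKED lets many workers poll the same table without
        # blocking on, or double-claiming, rows another worker has already locked.
        with conn.transaction():
            return conn.execute(
                """
                UPDATE tasks SET status = 'running'
                WHERE id = (
                    SELECT id FROM tasks
                    WHERE status = 'queued'
                    ORDER BY id
                    FOR UPDATE SKIP LOCKED
                    LIMIT 1
                )
                RETURNING id, type, payload
                """
            ).fetchone()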

We also wanted something that was significantly easier to use and debug for application developers. Too often, the burden of building task observability falls on the infra/platform team (for example, asking the infra team to build a Grafana view for their tasks from exported Prometheus metrics). We're building this type of observability directly into Hatchet.

What do we mean by "distributed"? You can run workers (the instances which run tasks) across multiple VMs, clusters, and regions - they're invoked remotely over a long-lived gRPC connection with the Hatchet queue. We've optimized latency to get task start times down to 25-50ms, and much more optimization is on the roadmap.

We also support a number of extra features you'd expect, like retries, timeouts, cron schedules, and dependent tasks. A few things we're currently working on: we use RabbitMQ (confusing, yes) for pub/sub between engine components and would prefer to just use Postgres, but we didn't want to spend additional time on the exchange logic until we'd built a stable underlying queue. We're also considering NATS for engine-engine and engine-worker connections.
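
To give a feel for the developer experience, here's a rough sketch of a dependent, two-step workflow with a cron trigger in the Python SDK. The decorator and parameter names (on_events, on_crons, parents, timeout, retries) are written from memory and may not match the current SDK exactly:

    from hatchet_sdk import Hatchet

    hatchet = Hatchet()

    # Assumed decorator API: a workflow triggered by an event or a cron schedule.
    @hatchet.workflow(on_events=["order:created"], on_crons=["0 * * * *"])
    class OrderWorkflow:
        # Steps form a DAG via `parents`; timeouts and retries are set per step.
        @hatchet.step(timeout="30s", retries=3)
        def fetch(self, context):
            return {"fetched": True}

        @hatchet.step(parents=["fetch"], timeout="1m")
        def process(self, context):
            return {"status": "processed"}

    # Workers are plain processes holding a long-lived gRPC connection to the
    # Hatchet engine; they can run on any VM, cluster, or region.
    worker = hatchet.worker("order-worker")
    worker.register_workflow(OrderWorkflow())
    worker.start()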

We'd greatly appreciate any feedback you have and hope you get the chance to try out Hatchet.

1. bluehadoop ◴[] No.39643927[source]
How does this compare against Temporal/Cadence/Conductor? Does hatchet also support durable execution?

https://temporal.io/ https://cadenceworkflow.io/ https://conductor-oss.org/

replies(1): >>39644550 #
2. abelanger ◴[] No.39644550[source]
It's very similar - I used Temporal at a previous company to run a couple million workflows per month. The gRPC networking with workers is the most similar component; I especially liked that I only had to worry about an HTTP/2 connection with mTLS instead of a separate broker protocol.

Temporal is a powerful system, but we were getting to the point where it took a full-time engineer to build an observability layer around Temporal. Integrating workflows with OpenTelemetry and logging in an intuitive way was surprisingly non-trivial. We wanted to build more of a Vercel-like experience for managing workflows.

We have a section in the docs on durable execution [1]; also see this comment on HN [2]. As I mention there, we still have a long way to go before users can write a full workflow in code in the same style as a Temporal workflow: today, users either define the execution path ahead of time or invoke a child workflow from an existing workflow. This also requires customization for each SDK - like Temporal's custom asyncio event loop in their Python SDK [3]. We don't want to roll this out until we can be sure it's compatible with the way most people write their functions.

[1] https://docs.hatchet.run/home/features/durable-execution

[2] https://news.ycombinator.com/item?id=39643881

[3] https://github.com/temporalio/sdk-python
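
For readers unfamiliar with the distinction: below is roughly what the Temporal-style "workflow as code" looks like in their Python SDK [3] - a sketch only, but it shows why replay-safe control flow needs per-SDK work like that custom asyncio event loop:

    from datetime import timedelta
    from temporalio import activity, workflow

    @activity.defn
    async def charge_card(order_id: str) -> str:
        return f"charged:{order_id}"

    @workflow.defn
    class OrderWorkflow:
        @workflow.run
        async def run(self, order_id: str) -> str:
            # Control flow (branches, loops, sleeps) lives in ordinary code rather
            # than a predefined DAG; the SDK replays it deterministically.
            return await workflow.execute_activity(
                charge_card, order_id, start_to_close_timeout=timedelta(seconds=30)
            )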

replies(2): >>39646064 #>>39651696 #
3. bicijay ◴[] No.39646064[source]
Well, you just got a user. I love the concept of Temporal, but I can't justify the infra overhead needed to make it work to the higher-ups... And the cloud offering is a bit expensive for small companies.
replies(1): >>39647745 #
4. mfateev ◴[] No.39647745{3}[source]
Do you know about the Temporal startup program? It gives enough credits to offset support fees for 2 years. https://temporal.io/startup
replies(2): >>39649420 #>>39658844 #
5. Aeolun ◴[] No.39649420{4}[source]
If you're expecting to still be small after 2 years, doesn't that just delay the expense until you're locked in?
6. dangoodmanUT ◴[] No.39651696[source]
> we were getting to the point where it took a full-time engineer to build an observability layer around Temporal

We did it in like 5 minutes by adding in OTel traces? And maybe another 15 to add their Grafana dashboard?
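
For reference, the basic wiring in their Python SDK is roughly the following (a minimal sketch; it assumes an OpenTelemetry exporter/collector is already configured elsewhere):

    import asyncio
    from temporalio.client import Client
    from temporalio.contrib.opentelemetry import TracingInterceptor

    async def main():
        # The tracing interceptor emits OTel spans for workflow and activity
        # calls made through this client and any workers built from it.
        client = await Client.connect(
            "localhost:7233",
            interceptors=[TracingInterceptor()],
        )
        # ... register workers / start workflows with `client` as usual

    asyncio.run(main())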

What obstacles did you experience here?

replies(1): >>39653865 #
7. abelanger ◴[] No.39653865{3}[source]
Well, for one - most OTel backends (like Honeycomb) are designed around aggregate views, and engineers found it difficult to track down the failure of a specific workflow. We were already using Sentry, had started adding Prometheus + Grafana to our stack, and were already using Mezmo for logging. So to debug a workflow, we'd see an alert come in through Sentry, grab the workflow ID and activity ID, search in the Temporal console, track down the failed activity (of which there could be anywhere from 1 to 100), and associate that with our logs in Mezmo (which meant yet another query syntax). That's a lot of raw data to parse before you can figure out what's going wrong. And then we wanted to build out a view of worker health, which meant another set of dashboards and alerts, separate from our error alerting in Sentry.

Yes, this sounded broken to us too - we were aware of the promised consolidation of an OpenTelemetry + Grafana stack, but we couldn't make that transition happen cleanly, and when you're already relying on certain tools for the rest of your API, the transition gets harder. There's also upskilling involved in getting engineers on the team to adjust to OTel when they're used to more intuitive tools like Sentry and Mezmo.

A good set of default metrics, better search, and views for worker performance and pools - that would have gone a long way. The extent of the Temporal UI is a basic recent-workflows list, an expanded workflow view with stack traces for thrown errors, a schedules page, and a settings page.

8. bicijay ◴[] No.39658844{4}[source]
I know it's gonna sound entitled, but even though we are a small company, we still process a lot of events from third parties. Temporal Cloud pricing is based on the number of actions, and 2400 bucks would only cover a few months in our case.