(github.com)

578 points abelanger | 3 comments | 08 Mar 24 17:07 UTC

Hello HN, we're Gabe and Alexander from Hatchet (https://hatchet.run), we're working on an open-source, distributed task queue. It's an alternative to tools like Celery for Python and BullMQ for Node.js, primarily focused on reliability and observability. It uses Postgres for the underlying queue.

Why build another managed queue? We wanted to build something with the benefits of full transactional enqueueing - particularly for dependent, DAG-style execution - and felt strongly that Postgres solves for 99.9% of queueing use-cases better than most alternatives (Celery uses Redis or RabbitMQ as a broker, BullMQ uses Redis). Since the introduction of SKIP LOCKED and the milestones of recent PG releases (like active-active replication), it's becoming more feasible to horizontally scale Postgres across multiple regions and vertically scale to 10k TPS or more. Many queues (like BullMQ) are built on Redis and data loss can occur when suffering OOM if you're not careful, and using PG helps avoid an entire class of problems.
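For readers who haven't seen the SKIP LOCKED pattern mentioned above, here is a minimal sketch of how a Postgres-backed queue can let a worker claim a task. It is an illustration only, not Hatchet's actual schema or code: the `tasks` table, its columns, and the `claim_next_task` helper are hypothetical, and `conn` is assumed to be any DB-API 2.0 connection to Postgres (for example one from psycopg).

```python
# Hypothetical schema: tasks(id, payload, status, created_at).
# This is an illustrative sketch, not Hatchet's real tables or queries.
CLAIM_SQL = """
UPDATE tasks
SET status = 'running'
WHERE id = (
    SELECT id
    FROM tasks
    WHERE status = 'queued'
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def claim_next_task(conn):
    """Claim one queued task and return its row, or None if none is available.

    `conn` is assumed to be a DB-API 2.0 connection to Postgres.
    Rows already locked by another worker are skipped instead of waited on,
    so concurrent workers do not block each other on the same row.
    """
    cur = conn.cursor()
    try:
        cur.execute(CLAIM_SQL)
        row = cur.fetchone()
        conn.commit()  # the status change and the claim commit together
        return row
    except Exception:
        conn.rollback()
        raise
    finally:
        cur.close()
```

Because the claim is an ordinary transaction, it could also be combined with other writes (for example enqueueing a dependent task) in the same commit. A real queue would additionally need timeouts or heartbeats so that a task left in `running` by a crashed worker gets picked up again.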

We also wanted something that was significantly easier to use and debug for application developers. A lot of times the burden of building task observability falls on the infra/platform team (for example, asking the infra team to build a Grafana view for their tasks based on exported prom metrics). We're building this type of observability directly into Hatchet.

What do we mean by "distributed"? You can run workers (the instances which run tasks) across multiple VMs, clusters and regions - they are remotely invoked via a long-lived gRPC connection with the Hatchet queue. We've attempted to optimize our latency to get our task start times down to 25-50ms and much more optimization is on the roadmap.

We also support a number of extra features that you'd expect, like retries, timeouts, cron schedules, dependent tasks. A few things we're currently working on - we use RabbitMQ (confusing, yes) for pub/sub between engine components and would prefer to just use Postgres, but didn't want to spend additional time on the exchange logic until we built a stable underlying queue. We are also considering the use of NATS for engine-engine and engine-worker connections.

We'd greatly appreciate any feedback you have and hope you get the chance to try out Hatchet.


kcorbitt ◴[08 Mar 24 18:13 UTC] No.39643991[source]▶

>>39643136 (OP) #

I love your vision and am excited to see the execution! I've been looking for exactly this product (postgres-backed task queue with workers in multiple languages and decent built-in observability) for like... 3 years. Every 6 months I'll check in and see if someone has built it yet, evaluate the alternatives, and come away disappointed.

One important feature request that probably would block our adoption: one reason why I prefer a Postgres-backed queue over e.g. Redis is just to simplify our infra by having fewer servers and technologies in the stack. Adding in RabbitMQ is definitely an extra dependency I'd really like to avoid.

(Currently we've settled on graphile-worker which is fine for what it does, but leaves a lot of boxes unchecked.)

replies(9): >>39644137 #>>39645512 #>>39646111 #>>39647059 #>>39647179 #>>39650750 #>>39651174 #>>39652574 #>>39652765 #

ako ◴[08 Mar 24 22:03 UTC] No.39647059[source]▶

>>39643991 #

Funny how this is a vision now. I started my career 29 years ago at a company that built exactly this, but based on Oracle. The agents would run on Solaris, AIX, VAX/VMS, HP-UX, Windows NT, IRIX, etc. It was also used to create an automated CI/CD pipeline to build all binaries on all these different systems.

replies(2): >>39650305 #>>39651858 #

throwawaymaths ◴[09 Mar 24 08:22 UTC] No.39650305[source]▶

>>39647059 #

This has also basically existed for years in Elixir as an open-source drop-in library called Oban (no sidecar dependencies outside of Postgres; the pro version has a web dashboard and a complex task zoo).

replies(1): >>39650905 #

1. cpursley ◴[09 Mar 24 11:02 UTC] No.39650905[source]▶

>>39650305 #

Yep, it feels like half the Show HN launches are for infrastructure tooling that already exists natively or as plug-and-play libraries for Elixir/Erlang.

I really try to suggest people skip Node and learn a proper backend language with a solid framework with a proven architecture.

replies(1): >>39651364 #

2. zepolen ◴[09 Mar 24 13:05 UTC] No.39651364[source]▶

>>39650905 (TP) #

Oban looks great, but how would one run a Python CUDA-based workload on it?

replies(1): >>39651529 #

3. hosh ◴[09 Mar 24 13:44 UTC] No.39651529[source]▶

>>39651364 #

You could shell out to execute it with Porcelain, make the Python a long-running process and use ports, or port your Python code to Nx.
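As a rough sketch of the long-running-process option, this is what the Python side of a port could look like. It assumes the Elixir side opens the port with the `:binary` and `{:packet, 4}` options, so every message in both directions is prefixed with a 4-byte big-endian length. The `handle` function is a hypothetical placeholder for the real workload, and nothing here is specific to Oban.

```python
import struct


def read_exact(stream, n):
    """Read exactly n bytes from stream, or return None at end of stream."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def handle(payload: bytes) -> bytes:
    # Hypothetical placeholder: a real worker would run the CUDA job here.
    return payload.upper()


def serve(inp, out):
    """Answer length-prefixed requests until the other side closes the pipe."""
    while True:
        header = read_exact(inp, 4)
        if header is None:
            break  # port closed by the owning process
        (length,) = struct.unpack(">I", header)  # 4-byte big-endian length
        payload = read_exact(inp, length)
        if payload is None:
            break  # stream ended mid-message
        reply = handle(payload)
        out.write(struct.pack(">I", len(reply)) + reply)
        out.flush()  # flush so the reply is not stuck in a buffer
```

In an actual worker script you would end with `serve(sys.stdin.buffer, sys.stdout.buffer)`, since the port talks to the process over its standard input and output. Keeping the process alive between jobs avoids paying interpreter and GPU initialization costs on every task.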


Show HN: Hatchet – Open-source distributed task queue