
578 points by abelanger | 3 comments

Hello HN, we're Gabe and Alexander from Hatchet (https://hatchet.run), and we're working on an open-source, distributed task queue. It's an alternative to tools like Celery for Python and BullMQ for Node.js, primarily focused on reliability and observability. It uses Postgres for the underlying queue.

Why build another managed queue? We wanted to build something with the benefits of full transactional enqueueing - particularly for dependent, DAG-style execution - and felt strongly that Postgres solves 99.9% of queueing use-cases better than most alternatives (Celery uses Redis or RabbitMQ as a broker, BullMQ uses Redis). Since the introduction of SKIP LOCKED and the milestones of recent PG releases (like active-active replication), it's becoming more feasible to horizontally scale Postgres across multiple regions and to vertically scale it to 10k TPS or more. Many queues (like BullMQ) are built on Redis, where data loss can occur under OOM conditions if you're not careful; using PG helps avoid that entire class of problems.
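
For readers who haven't used the pattern, here is a minimal sketch of transactional enqueueing plus a FOR UPDATE SKIP LOCKED dequeue in Go. The "tasks" table, its columns and the package name are hypothetical stand-ins for illustration, not Hatchet's actual schema:

    // Transactional enqueue + FOR UPDATE SKIP LOCKED dequeue (illustrative only;
    // the table and column names are not Hatchet's real schema).
    package pgqueue

    import (
        "context"
        "database/sql"

        _ "github.com/lib/pq" // Postgres driver
    )

    // Enqueue inserts a task inside the caller's transaction, so the task only
    // becomes visible if the surrounding application writes commit with it.
    func Enqueue(ctx context.Context, tx *sql.Tx, payload []byte) error {
        _, err := tx.ExecContext(ctx,
            `INSERT INTO tasks (payload, status) VALUES ($1, 'queued')`, payload)
        return err
    }

    // Dequeue claims one queued task. SKIP LOCKED lets concurrent workers grab
    // different rows without blocking on each other's row locks.
    func Dequeue(ctx context.Context, db *sql.DB) (payload []byte, err error) {
        err = db.QueryRowContext(ctx, `
            UPDATE tasks SET status = 'running'
            WHERE id = (
                SELECT id FROM tasks
                WHERE status = 'queued'
                ORDER BY id
                FOR UPDATE SKIP LOCKED
                LIMIT 1
            )
            RETURNING payload`).Scan(&payload)
        return payload, err
    }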

We also wanted something that was significantly easier to use and debug for application developers. Too often, the burden of building task observability falls on the infra/platform team (for example, asking the infra team to build a Grafana view for their tasks based on exported Prometheus metrics). We're building this type of observability directly into Hatchet.

What do we mean by "distributed"? You can run workers (the instances that run tasks) across multiple VMs, clusters and regions - they are remotely invoked via a long-lived gRPC connection with the Hatchet queue. We've optimized our latency to get task start times down to 25-50ms, and much more optimization is on the roadmap.
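
As a rough illustration of that connection model - the dispatcher service, message types and address below are placeholders, not Hatchet's actual protobuf API - a worker holds one persistent gRPC connection open and receives task assignments over it:

    // Conceptual sketch of the worker side of a long-lived gRPC connection.
    // The dispatcher service, its messages and the address are hypothetical
    // placeholders, not Hatchet's actual API.
    package main

    import (
        "log"
        "time"

        "google.golang.org/grpc"
        "google.golang.org/grpc/credentials/insecure"
        "google.golang.org/grpc/keepalive"
    )

    func main() {
        // One persistent connection per worker; keepalives keep it healthy
        // across NATs and load balancers so assignments can be pushed down
        // immediately instead of paying a new dial/handshake per task.
        conn, err := grpc.Dial("hatchet-engine.internal:7070",
            grpc.WithTransportCredentials(insecure.NewCredentials()),
            grpc.WithKeepaliveParams(keepalive.ClientParameters{
                Time:    10 * time.Second,
                Timeout: 5 * time.Second,
            }),
        )
        if err != nil {
            log.Fatal(err)
        }
        defer conn.Close()

        // client := pb.NewDispatcherClient(conn)                   // hypothetical
        // stream, _ := client.Listen(ctx, &pb.WorkerRegister{...}) // hypothetical
        // for { task, _ := stream.Recv(); go run(task) }           // receive loop
    }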

We also support a number of extra features you'd expect, like retries, timeouts, cron schedules, and dependent tasks. A few things we're currently working on: we use RabbitMQ (confusing, yes) for pub/sub between engine components and would prefer to just use Postgres, but didn't want to spend additional time on the exchange logic until we'd built a stable underlying queue. We're also considering NATS for engine-engine and engine-worker connections.
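
To make that feature list concrete, here's a purely illustrative sketch of declaring a DAG with retries, timeouts and a cron trigger. The types and fields are hypothetical, not the real SDK surface - see the Hatchet docs for that:

    // Illustrative only: hypothetical types sketching a DAG-style workflow
    // with retries, timeouts and a cron trigger.
    package main

    import "time"

    type Step struct {
        Name      string
        DependsOn []string      // dependent tasks: run only after parents succeed
        Retries   int           // re-enqueue on failure up to this many times
        Timeout   time.Duration // cancel the step if it exceeds this duration
        Run       func(input []byte) ([]byte, error)
    }

    type Workflow struct {
        Name  string
        Cron  string // e.g. "0 2 * * *" triggers nightly at 02:00
        Steps []Step // steps form a DAG via DependsOn edges
    }

    func main() {
        noop := func(in []byte) ([]byte, error) { return in, nil }
        _ = Workflow{
            Name: "nightly-report",
            Cron: "0 2 * * *",
            Steps: []Step{
                {Name: "extract", Retries: 3, Timeout: 2 * time.Minute, Run: noop},
                {Name: "transform", DependsOn: []string{"extract"}, Retries: 1, Timeout: 5 * time.Minute, Run: noop},
                {Name: "load", DependsOn: []string{"transform"}, Retries: 3, Timeout: time.Minute, Run: noop},
            },
        }
    }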

We'd greatly appreciate any feedback you have and hope you get the chance to try out Hatchet.

1. rubenfiszel
Ola, fellow YC founders. Surely you have seen Windmill, since you refer to it in the comments below. It looks like Hatchet, being a lot more recent, currently has a subset of what Windmill offers, albeit with a focus solely on the task queue and without the self-hosted enterprise focus. So it looks more like a competitor to Inngest than to Windmill. We released workflows as code last week, which had been the primary differentiator between us and other workflow engines so far: https://www.windmill.dev/docs/core_concepts/workflows_as_cod...

The license is more permissive than ours (MIT vs. AGPLv3), and you're using Go where we use Rust, but other than that the architecture looks extremely similar: also based mostly on Postgres, with the same insight as us that it's sufficient. I'm curious where you see the main differentiator long-term.

2. HoyaSaxa
No connection to either company, but for what it’s worth I’d never in a million years consider Windmill and this product to be direct competitors.

We’ve had a lot of pain with Celery and Redis over the years, and Hatchet seems to be a pretty compelling alternative. I’d want to see the codebase stabilize a bit before seriously considering it, though. And frankly I don’t see a viable path to real commercialization for them, so I’d only consider it if everything you needed really was MIT-licensed.

Windmill is super interesting but I view it as the next evolution of something like Zapier. Having a large corpus of templates and integrations is the power of that type of product. I understand that under the hood it is a similar paradigm, but the market positioning is rightfully night and day. And I also do see a path to real commercialization of the Windmill product because of the above.

3. rubenfiszel
Windmill is used by large enterprises to run critical jobs, written in code, that require a predefined amount of resources, can run for months if needed, and stream their logs - all at scale with the utmost reliability, high throughput and low overhead. The only insight borrowed from Zapier is how easy it is to develop new workflows.

I understand our positioning is not clear on our landing page (and we are working on it), but my read of Hatchet is that what they put forward is mostly a durable execution engine for arbitrary code in Python/TypeScript on a fleet of managed workers, which is exactly what Windmill is. We are profitable and probably wouldn't be if we were MIT-licensed with no enterprise features.

From reading their documentation, the implementation is extremely similar: you define workflows as code ahead of time, and then the engine makes sure they progress reliably on your fleet of workers (one of our customers has 600 workers deployed in edge environments). There are a few minor differences. We implement the workers as a generic Rust binary that pulls the workflows, so you never have to redeploy them to test and deploy new workflows, whereas they have developed SDKs for each language so you can define your own deployable workers (which is more similar to Inngest/Temporal). Also, we use polling and REST instead of gRPC for communication between workers and servers.