578 points | abelanger | 11 comments

Hello HN, we're Gabe and Alexander from Hatchet (https://hatchet.run), and we're working on an open-source, distributed task queue. It's an alternative to tools like Celery for Python and BullMQ for Node.js, primarily focused on reliability and observability. It uses Postgres for the underlying queue.

Why build another managed queue? We wanted to build something with the benefits of full transactional enqueueing - particularly for dependent, DAG-style execution - and felt strongly that Postgres solves for 99.9% of queueing use-cases better than most alternatives (Celery uses Redis or RabbitMQ as a broker, BullMQ uses Redis). Since the introduction of SKIP LOCKED and the milestones of recent PG releases (like active-active replication), it's becoming more feasible to horizontally scale Postgres across multiple regions and vertically scale to 10k TPS or more. Many queues (like BullMQ) are built on Redis, where data loss can occur on OOM if you're not careful; using PG helps avoid that entire class of problems.
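
To make the SKIP LOCKED point concrete, here is a minimal Python sketch of the general pattern. It is not Hatchet's implementation: the `tasks` table, its columns, and the function names are hypothetical, and the `%s` placeholders assume a psycopg-style DB-API cursor.

```python
# Hypothetical schema: tasks(id, payload, status, started_at).
# This illustrates the general Postgres queue pattern, not Hatchet's code.

ENQUEUE_SQL = """
INSERT INTO tasks (payload, status)
VALUES (%s, 'queued')
RETURNING id
"""

DEQUEUE_SQL = """
UPDATE tasks
SET status = 'running', started_at = now()
WHERE id IN (
    SELECT id
    FROM tasks
    WHERE status = 'queued'
    ORDER BY id
    LIMIT %s
    FOR UPDATE SKIP LOCKED
)
RETURNING id, payload
"""


def enqueue(cur, payload):
    """Insert a task on the caller's cursor, so the task commits or rolls
    back together with the rest of the caller's transaction."""
    cur.execute(ENQUEUE_SQL, (payload,))
    return cur.fetchone()[0]


def claim_tasks(cur, limit=10):
    """Claim up to `limit` queued tasks. Rows already locked by another
    worker are skipped rather than waited on."""
    cur.execute(DEQUEUE_SQL, (limit,))
    return cur.fetchall()
```

Because `enqueue` runs on the same cursor as the application's other writes, a rollback removes the task as well, which is the transactional enqueueing benefit described above.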

We also wanted something that was significantly easier to use and debug for application developers. A lot of times the burden of building task observability falls on the infra/platform team (for example, asking the infra team to build a Grafana view for their tasks based on exported prom metrics). We're building this type of observability directly into Hatchet.

What do we mean by "distributed"? You can run workers (the instances which run tasks) across multiple VMs, clusters and regions - they are remotely invoked via a long-lived gRPC connection with the Hatchet queue. We've attempted to optimize our latency to get our task start times down to 25-50ms and much more optimization is on the roadmap.

We also support a number of extra features that you'd expect, like retries, timeouts, cron schedules, and dependent tasks. A few things we're currently working on: we use RabbitMQ (confusing, yes) for pub/sub between engine components and would prefer to just use Postgres, but we didn't want to spend additional time on the exchange logic until we built a stable underlying queue. We are also considering the use of NATS for engine-engine and engine-worker connections.

We'd greatly appreciate any feedback you have and hope you get the chance to try out Hatchet.

1. leetrout ◴[] No.39645209[source]
Just pointing out even though this is a "Show HN" they are, indeed, backed by YC.

Is this going to follow the "open core" pattern or will there be a different path to revenue?

replies(3): >>39645268 #>>39646788 #>>39650345 #
2. MuffinFlavored ◴[] No.39645268[source]
> path to revenue

There have to be at least 10 different ways to run a distributed task queue across the different cloud providers (Amazon, Azure, GCP).

Or self-hosting RabbitMQ, etc.

I'm curious how they are able to convince investors that there is a sizable portion of the market that doesn't already have this solved (or already has it solved and is willing to migrate).

replies(3): >>39645357 #>>39646344 #>>39649406 #
3. Kinrany ◴[] No.39645357[source]
There will be space for improvement until every cloud has a managed offering with exactly the same interface. Like docker, postgres, S3.
4. leetrout ◴[] No.39646344[source]
I am curious to see where they differentiate themselves on observability in the long run.

Compared to RabbitMQ, it should be easier to see what is in the queue itself without mutating it, for instance.

replies(1): >>39648403 #
5. abelanger ◴[] No.39646788[source]
Yep, we're backed by YC in the W24 batch - this is evident on our landing page [1].

We're both second time CTOs and we've been on both sides of this, as consumers of and creators of OSS. I was previously a co-founder and CTO of Porter [2], which had an open-core model. There are two risks that most companies think about in the open core model:

1. Big companies using your platform without contributing back in some way or buying a license. I think this is less of a risk, because these organizations are incentivized to buy a support license to help with maintenance, upgrades, and, since we sit on a critical path, uptime.

2. Hyperscalers folding your product into their offering [3]. This is a bigger risk but is also a bit of a "champagne problem".

Note that smaller companies/individual developers are who we'd like to enable, not crowd out. If people would like to use our cloud offering because it reduces the headache for them, they should do so. If they just want to run our service and manage their own PostgreSQL, they should have the option to do that too.

Based on all of this, here's where we land on things:

1. Everything we've built so far has been 100% MIT licensed. We'd like to keep it that way and make money off of Hatchet Cloud. We'll likely roll out a separate enterprise support agreement for self hosting.

2. Our cloud version isn't going to run a different core engine or API server than our open source version. We'll write interfaces for all plugins to our servers and engines, so even if we have something super specific to how we've chosen to do things on the cloud version, we'll expose the options to write your own plugins on the engine and server.

3. We'd like to make self-hosting as easy to use as our cloud version. We don't want our self-hosted offering to be a second-class citizen.

Would love to hear everyone's thoughts on this.

[1] https://hatchet.run

[2] https://github.com/porter-dev/porter

[3] https://www.elastic.co/blog/why-license-change-aws

replies(2): >>39650406 #>>39653962 #
6. MuffinFlavored ◴[] No.39648403{3}[source]
https://www.rabbitmq.com/docs/management
replies(1): >>39648645 #
7. leetrout ◴[] No.39648645{4}[source]
Sure, but to see what is in the queue you have to operate on it, mutating it. Since this uses Postgres, we can just look in the table.
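
As a rough illustration of "just look in the table", a read-only peek could look like the Python sketch below. The `tasks` table, its columns, and `peek_queue` are hypothetical names rather than Hatchet's actual schema, and the `%s` placeholders assume a psycopg-style DB-API cursor.

```python
# Hypothetical schema: tasks(id, payload, status).
# A plain SELECT takes no row locks that block workers and changes nothing.

PEEK_SQL = """
SELECT id, status, payload
FROM tasks
WHERE status = %s
ORDER BY id
LIMIT %s
"""


def peek_queue(cur, status="queued", limit=20):
    """Return up to `limit` tasks in the given status without claiming,
    acknowledging, or otherwise modifying them."""
    cur.execute(PEEK_SQL, (status, limit))
    return cur.fetchall()
```

Calling `peek_queue(cur)` repeatedly returns the same rows until a worker actually claims them, since the query itself never updates the table.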
8. Aeolun ◴[] No.39649406[source]
> I'm curious how they are able to convince investors that there is a sizable portion of market they think doesn't already have this solved

Is there any task queue you are completely happy with?

I use Redis, but it’s only half of the solution.

9. wodenokoto ◴[] No.39650345[source]
Wasn’t the first Dropbox introduction also a show HN?

I don’t think this is out of place

replies(1): >>39651492 #
10. leetrout ◴[] No.39651492[source]
I am not saying it is out of place, but I feel that for such a long-winded explanation of what they are doing, a missing "YC W24" was surprising.
11. echelon ◴[] No.39653962[source]
I got flagged, but I want to reiterate that you need legal means of stopping AWS from simply lifting your product wholesale. Just look at all the other companies they've turned into their own thankless premium offerings.

Put in a DAU/MAU/volume/revenue clause that pertains specifically only to hyperscalers and resellers. Don't listen to the naysayers telling you not to do it. This isn't their company or their future. They don't care if you lose your business or that you put in all of that work just for a tech giant to absorb it for free and turn it against you.

Just do it. Do it now and you won't get (astroturfed?) flak for that decision later from people who don't even have skin in the game. It's not a big deal. I would buy open core products with these protections -- it's not me you're protecting yourselves against, and I'm nowhere in the blast radius. You're trying not to die in the miasma of monolithic cloud vendors.