
Zero-latency SQLite storage in every Durable Object

(simonwillison.net)

1. stavros [13 Oct 24 23:40 UTC] No.41832728

>>41832547 (OP)

This is a really interesting design, but these kinds of smart systems always inhabit an uncanny valley for me. You need them in exactly two cases:

1. You have a really high-load system that you need to figure out some clever ways to scale.

2. You're working on a toy project for fun.

If #2, fine, use whatever you want, it's great.

If this is production, or for Work(TM), you need something proven. If you don't know you need this, you don't need it, go with a boring Postgres database and a VM or something.

If you do know you need this, then you're kind of in a bind: It's not really very mature yet, as it's pretty new, and you're probably going to hit a bunch of weird edge cases, which you probably don't really want to have to debug or live with.

So, who are these systems for, in the end? They're so niche that they can't easily mature and be used by lots of serious players, and they're too complex with too many tradeoffs to be used by 99.9% of companies.

The only people I know for sure are the target market for this sort of thing are the developers who see something shiny, build a company (or, worse, build someone else's company) on it, and then regret it pretty soon and move to something else (hopefully much more boring).

Does anyone have more insight on this? I'd love to know.


2. jmtulloss [13 Oct 24 23:55 UTC] No.41832813

>>41832728 (TP)

If you're in #1, you talk to Cloudflare. They need some great customer stories, and they have some great engineers who are most likely willing to work with you on how this will work and help you with bugs in exchange for some success stories. If it gets proven out, this turns into a service relationship, but early on it's a partnership.

3. gregwebs [14 Oct 24 00:05 UTC] No.41832877

>>41832728 (TP)

There are a lot of low-traffic applications that aren’t toys but are instead internal tools; this could be a great option for those.

For higher traffic, they are asking you to figure out how to shard your data and its compute. That’s really hard to do without hitting edge cases.


4. stavros [14 Oct 24 00:09 UTC] No.41832894

>>41832877

Why would you use this for an internal, low-traffic tool over Postgres?


5. fracus [14 Oct 24 00:21 UTC] No.41832962

>>41832894

Could this be used to get a time edge in trading? I'm not an expert, just thinking out loud. I remember hearing about firms laying wire in a certain way because getting a microsecond jump on changing rates could be everything for them.


6. crabmusket [14 Oct 24 00:25 UTC] No.41832980

>>41832728 (TP)

As far as I can tell, multiplayer is the killer app for Durable Objects. If you want to build another Figma, Google Docs, etc, the programming model of Durable Objects is super handy.

This article goes into it more: https://digest.browsertech.com/archive/browsertech-digest-cl...

I think this old article is quite relevant too: http://ithare.com/scaling-stateful-objects/

Anyone who read the Figma multiplayer article and thought "that's kind of what I need" would be well served by Durable Objects, I think. https://www.figma.com/blog/rust-in-production-at-figma/

There are other approaches - I've worked in the past with CRDTs over WebRTC which felt absolutely space-age. But that's a much more complicated foundation compared to a websocket and a single class instance "somewhere" in the cloud.
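For anyone who hasn't met CRDTs: a grow-only counter is about the simplest one. This is a toy Python sketch, not tied to any particular library or to the WebRTC setup mentioned above, showing why replicas can exchange state in any order and still converge. Real collaborative editors use much richer CRDTs for text and trees.

```python
from dataclasses import dataclass, field


@dataclass
class GCounter:
    """Grow-only counter CRDT: each replica increments only its own slot."""

    replica_id: str
    counts: dict[str, int] = field(default_factory=dict)

    def increment(self, amount: int = 1) -> None:
        # A replica only ever writes to its own entry.
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def merge(self, other: "GCounter") -> None:
        # Element-wise max is commutative, associative and idempotent,
        # so replicas converge regardless of message order or duplication.
        for rid, n in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), n)

    def value(self) -> int:
        return sum(self.counts.values())


# Two peers edit concurrently, then exchange their states.
alice = GCounter("alice")
bob = GCounter("bob")
alice.increment(2)
bob.increment(3)
alice.merge(bob)
bob.merge(alice)
print(alice.value(), bob.value())  # 5 5
```

Merging the same state twice is harmless, which is what lets peers gossip without a central server.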


7. MuffinFlavored [14 Oct 24 00:27 UTC] No.41832987

>>41832728 (TP)

> If this is production, or for Work(TM), you need something proven.

I feel like part of Cloudflare's business model is to try to convince businesses at scale to solve problems in a non-traditional way using technology they are cooking up, no matter the cost.

8. alright2565 [14 Oct 24 00:30 UTC] No.41833011

>>41832894

It's so low traffic that you don't want to pay the minimum $35/mo for a PostgreSQL instance on AWS maybe. Or you're required by policy to have a single-tenant architecture, but a full always-on database server would be overkill.

9. stavros [14 Oct 24 00:35 UTC] No.41833041

>>41832980

That's a very interesting use case. Given that your "players" aren't guaranteed to be local to the DO, doesn't using DOs only make sense in high-traffic situations again? Otherwise you might as well just serve the players from a conventional server, no?

CRDTs really do sound amazing, though.


10. yen223 [14 Oct 24 00:38 UTC] No.41833057

>>41832728 (TP)

I almost have the opposite view:

When starting out you can get away with using a simple Postgres database. Postgres is fine for low-traffic projects with minimal latency constraints, and you probably want to spend your innovation tokens elsewhere.

But in very high-traffic production cases with tight latency requirements, you will start to see all kinds of weird and wacky traffic patterns that barebones Postgres won't be able to handle. It's usually in these cases that you'd need to start exploring alternatives to Postgres. It's also in these cases that you can afford to hire people to manage your special database needs.


11. danpalmer [14 Oct 24 00:45 UTC] No.41833093

>>41832728 (TP)

I'd view the split here along the axes of debuggability/introspection.

There are many services that just don't require performance tuning or deep introspection, things like internal tools. This is where I think serverless frameworks do well, because they avoid a lot of time spent on deployment. It's nice if these are fast, but that's rarely a key requirement. Usually the key requirement is that they are fast to build and low maintenance. It's possible that Cloudflare have got a good story for developer experience here that gets things working quickly, but that's not their pitch, and there are a lot of services competing to make this sort of development fast.

However where I don't think these services work well is when you have high debuggability and introspection requirements. What metrics do I get out of this? What happens if some Durable Objects are just slow, do we have the information to understand why? Can we rectify it if they are? What's the logging story, and how much does it cost?

I think these sorts of services may be a good idea for a startup on day 1 to build some clever distributed system in order to put off thinking about scaling, but I can't help but think that scale-up sized companies would be wanting to move off this onto something they can get into the details more with, and that transition would be a hard one.

12. simonw [14 Oct 24 00:46 UTC] No.41833100

>>41833057

Have you worked on any examples of projects that started on PostgreSQL and ended up needing to migrate to something specialized?


13. klabb3 [14 Oct 24 01:09 UTC] No.41833218

>>41832728 (TP)

Databases are an extremely slow-maturing area, similar to programming languages, but are all deviations from Postgres shiny and hipster?

The idea of colocating data and behavior is really a quantifiable reduction in complexity. It removes latency and bandwidth concerns, which means both operational concerns and development concerns (famously the impact of the N+1 problem is greatly reduced). You can absolutely argue that networked Postgres is better for other reasons (and you may be right) but SQLite is about as boring and predictable as you can get, with known strong advantages. This is the reason it’s getting popular on the server.

That said, I don’t like the idea of creating many small databases very much - as they suggest with Durable Objects. That gives noSQL nightmares - breaking all kinds of important invariants of relational dbs. I think it’s much preferable to use SQLite as a monolithic database like it’s done in their D1 product.


14. dumbo-octopus [14 Oct 24 01:16 UTC] No.41833255

>>41833041

In practice you’re most likely to be collaborating with other folks on your school project group, work team, close family, etc. Sure there are exceptions, but generally speaking picking a service location near your first group member ensures low latency for them (and they’re probably most engaged), and is likely to have lowish latency for everyone else.

On the flip side, picking US-East-1 gives okayish latency to folks near that, and nobody else.


15. crabmusket [14 Oct 24 01:18 UTC] No.41833274

>>41833041

Best case, the players are co-located in a city or country, and they'll benefit from data center locality.

Worst case, they're not co-located, and one participant has good latency, and the other doesn't. This is equivalent to the "deploy the backend in a single server/datacenter" approach.

Aside from the data locality, I still find the programming model (a globally-unique and addressable single-threaded class instance) to be quite nice, and would want to emulate it even without the Cloudflare edge magic.


16. crabmusket [14 Oct 24 01:21 UTC] No.41833285

>>41833218

> That gives noSQL nightmares - breaking all kinds of important invariants of relational dbs

IMO Durable Objects map well to use cases where there actually are documents. Think of Figma. There is a ton of data that lives inside the literal Figma document. It would be awful to have a relational table for like "shapes" with one row per rectangle across Figma's entire customer base. That's just not an appropriate use of a relational database.

So let's say I built Figma on MongoDB, where each Figma document is a Mongo document. That corresponds fairly straightforwardly to each Figma document being a Durable Object instance, using either the built-in noSQL storage that Durable Objects already have, or a small SQLite relational database which does have a "shapes" table, but only containing the shapes in this one document.
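A rough Python approximation of that shape, assuming nothing about Cloudflare's actual API: each document gets its own small SQLite database (in-memory here, via the standard `sqlite3` module) with a `shapes` table that only ever holds that document's shapes. `DocumentShard`, `get_shard`, and the document IDs are hypothetical names for illustration.

```python
import sqlite3


class DocumentShard:
    """Hypothetical stand-in for one Durable Object: owns one document's SQLite DB."""

    def __init__(self, doc_id: str):
        self.doc_id = doc_id
        # Each document gets its own private database; nothing is shared across documents.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE shapes (id INTEGER PRIMARY KEY, kind TEXT, x REAL, y REAL)"
        )

    def add_shape(self, kind: str, x: float, y: float) -> int:
        cur = self.db.execute(
            "INSERT INTO shapes (kind, x, y) VALUES (?, ?, ?)", (kind, x, y)
        )
        self.db.commit()
        return cur.lastrowid

    def shapes(self) -> list[tuple]:
        return self.db.execute("SELECT kind, x, y FROM shapes ORDER BY id").fetchall()


# One shard per document, looked up by document ID.
shards: dict[str, DocumentShard] = {}


def get_shard(doc_id: str) -> DocumentShard:
    if doc_id not in shards:
        shards[doc_id] = DocumentShard(doc_id)
    return shards[doc_id]


get_shard("design-a").add_shape("rect", 0, 0)
get_shard("design-a").add_shape("ellipse", 10, 5)
get_shard("design-b").add_shape("rect", 3, 4)
```

Queries inside one document stay fully relational; anything that spans documents (global search, analytics) has to fan out across shards, which is the tradeoff klabb3 is worried about.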


17. crabmusket [14 Oct 24 01:23 UTC] No.41833295

>>41833255

And the corollary to that is that often your collaborations have a naturally low scale. While your entire app/customerbase as a whole needs to handle thousands of requests per second or more, one document/shard may only need to handle a handful of people.

18. [14 Oct 24 01:24 UTC] No.41833308

>>41833218

19. crabmusket [14 Oct 24 01:24 UTC] No.41833310

>>41832962

I'm also no expert, but from reading around the subject a little (Flash Boys by Michael Lewis was pretty cool, also Jane Street's podcast has some fantastic information)... no. I doubt you'd be on a public cloud if low-latency trading is what you're doing.


20. paulgb [14 Oct 24 01:27 UTC] No.41833340

>>41833274

> Aside from the data locality, I still find the programming model (a globally-unique and addressable single-threaded class instance) to be quite nice, and would want to emulate it even without the Cloudflare edge magic.

You might be interested in Plane (https://plane.dev/ / https://github.com/jamsocket/plane), which we sometimes describe as a sort of Durable Object-like abstraction that can run anywhere containers can.

(I also wrote one of the articles you linked, thanks for the shoutout!)


21. yen223 [14 Oct 24 01:32 UTC] No.41833373

>>41833100

I did, twice.

The second time, we had a reporting system that eventually stored billions of rows per day in a Postgres database. Processing times got so bad that we decided to migrate to Clickhouse, resulting in a substantial improvement in query times. I maintain that we hadn't exhausted all available optimisations for Postgres, but I cannot deny that the migration made sense in the long run - OLTP vs OLAP and all that.

(The first time is a funny story that I'm not quite ready to share.)


22. simonw [14 Oct 24 01:41 UTC] No.41833415

>>41833373

That makes a lot of sense to me. One of my strongest hints that a non-relational data store might be a good idea is "grows by billions of rows a day".


23. crabmusket [14 Oct 24 01:59 UTC] No.41833503

>>41833340

I am interested, and I really enjoy your work on Browsertech! I haven't needed Plane above/over what Cloudflare is providing, but I've got it in the back of my mind as an option.

I've long hoped other providers might jump on the Durable Objects bandwagon and provide competing functionality so we're not locked in. Plane/Jamsocket looks like one way to go about mitigating that risk to a certain extent.

24. adhamsalama [14 Oct 24 02:35 UTC] No.41833708

>>41833415

Isn't Clickhouse relational?


25. adhamsalama [14 Oct 24 02:37 UTC] No.41833713

>>41833373

Well, this isn't specific to Postgres, is it?

If you were storing billions of rows per day in MySQL, SQL Server, or Oracle, it still wouldn't be able to handle it, would it?


26. yen223 [14 Oct 24 02:44 UTC] No.41833748

>>41833713

That's right. The key difference is using row-based vs column-based databases (i.e. OLTP vs OLAP). Any good database person should be cringing at the thought of using Postgres (or MySQL, Oracle, SQL Server, etc.) for pulling reporting data.
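A toy Python sketch of the row vs column distinction (the data is invented, and this is nothing like ClickHouse's real storage engine): for a reporting query like "sum one column over many rows", a column layout only has to touch the values the query needs, which is a big part of why column stores do well at OLAP.

```python
# Toy illustration only: real column stores add compression,
# vectorized execution, sparse indexes, and much more.
rows = [
    {"ts": 1, "country": "AU", "amount": 10.0},
    {"ts": 2, "country": "US", "amount": 25.0},
    {"ts": 3, "country": "AU", "amount": 5.0},
]

# Column-oriented layout: one list per column.
columns = {key: [r[key] for r in rows] for key in rows[0]}


def total_rowwise(rows):
    # Visits every record; on disk, a row store reads whole rows to reach one field.
    return sum(r["amount"] for r in rows)


def total_columnwise(columns):
    # Reads just the one column the aggregate needs.
    return sum(columns["amount"])


print(total_rowwise(rows), total_columnwise(columns))  # 40.0 40.0
```

The answer is identical either way; what changes is how many bytes have to be read to get it, which is what dominates once a table grows by billions of rows a day.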

That said, no regrets using Postgres there. If we had started with Clickhouse, the project might not have launched as quickly as it did, and that would have given us more problems.

27. crabmusket [14 Oct 24 02:45 UTC] No.41833752

>>41833708

It does allow you to query with SQL, but it's meant for OLAP workloads, not OLTP. Its internal architecture and storage is different to what you'd usually think of as a relational database, like Postgres. See https://clickhouse.com/docs/en/concepts/why-clickhouse-is-so...

The term "relational" is overloaded. Sometimes it means "you can use SQL" and sometimes it means "OLTP with data stored in an AoS btree".

(And sometimes, a pet peeve of mine, it means "data with relationships" which is based on misunderstanding the term "relation". If someone asks you if "your data is relational" they are suffering from this confusion.)

28. yen223 [14 Oct 24 02:46 UTC] No.41833758

>>41833708

Clickhouse is a SQL database, so I guess it is?

(Strictly speaking, since a "relation" in the original Codd-paper sense is a table, anything with tables is relational. I don't know if that's what people mean by "relational", and I don't know what would count as "non-relational" in that sense.)

29. xarope [14 Oct 24 02:58 UTC] No.41833821

>>41833373

Right, OLTP vs OLAP are very different workloads (using the car analogy, that would be like using a ferrari to tow a trailer, and an F250 to... oh wait, an F250 can do anything!).

But seriously though, even if you use Postgres, as a former DBA (DB2 and Oracle) I would have tuned the OLTP database very differently from the OLAP database, and I don't mean just indexes: even during ETL from OLTP->OLAP you might decide to de-normalize columns on the OLAP side simply to speed up queries (OLAP databases are the sort of database you were warned about, where indexes can be 10x the data size).
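A small sketch of that de-normalization step, using SQLite (via Python's standard `sqlite3` module) purely as a stand-in for both sides; in practice the OLAP side would be a separate system, and the schema here is hypothetical. The ETL job flattens the join once, so reporting queries read a single wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Normalized OLTP side (hypothetical schema).
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Acme', 'APAC'), (2, 'Globex', 'EMEA');
    INSERT INTO orders (customer_id, amount) VALUES (1, 100), (1, 50), (2, 70);

    -- ETL step: copy into a wide, denormalized reporting table so
    -- reporting queries no longer need the join.
    CREATE TABLE order_facts AS
    SELECT o.id AS order_id, c.name AS customer_name, c.region, o.amount
    FROM orders o JOIN customers c ON c.id = o.customer_id;
    """
)

by_region = conn.execute(
    "SELECT region, SUM(amount) FROM order_facts GROUP BY region ORDER BY region"
).fetchall()
print(by_region)  # [('APAC', 150.0), ('EMEA', 70.0)]
```

The cost is duplicated customer data in every fact row, which is fine for a read-mostly reporting copy and would be painful on the transactional side.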

30. simonw [14 Oct 24 02:59 UTC] No.41833828

>>41833708

Kind of? By "relational" there I meant "traditional relational databases like MySQL and PostgreSQL that are optimized for transactions and aren't designed for large scale analytics".

31. jchanimal [14 Oct 24 03:46 UTC] No.41833997

>>41833285

We are wrestling with questions like this on the new document database we’re building. A database should correspond to some administrative domain object.

Today in Fireproof a database is a unit of sharing, but we are working toward a broader model where a database corresponds to an individual application’s state. So one database is all the shared documents not just a single unit of sharing.

These small changes early on can have a big impact later. If you’re interested in these sorts of design questions, the Fireproof Discord is where we are hashing out the v0.20 API.

(I was an early contributor to Apache CouchDB. Damien Katz, creator of CouchDB, is helping with engineering and raised these questions recently, along with other team members.)

32. masterj [14 Oct 24 04:33 UTC] No.41834216

>>41833218

If you adopt a wide-column DB like Cassandra or DynamoDB, don’t you have to pick a shard for your table? The idea behind Durable Objects seems similar.


33. 8n4vidtmkvmk [14 Oct 24 05:33 UTC] No.41834497

>>41833218

N+1 problem is also reduced if you keep your one and only server next to your one and only database.

This was actually the solution we came up with at a very big global company. Well, not 1 server, but 1 data center. If your write leaders are all in one place it apparently doesn't matter that everything else is global, for certain write requests at least.

34. skybrian [14 Oct 24 05:56 UTC] No.41834616

>>41833041

Some games have regions and you only see players in the same region. For example, a “Europe” region. If you’re in the US and you connect to the Europe region, you know that you should expect some lag.

And it seems like that would work just as well with durable objects.

35. simpsond [14 Oct 24 05:58 UTC] No.41834628

>>41834216

You have a row key, which gets consistently hashed to a shard / node on the ring.
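A minimal Python consistent-hashing ring, for illustration only. Real systems use their own partitioners (Cassandra, for example, uses Murmur3-based tokens), and the MD5 hash, the virtual-node count, and the `HashRing` name here are arbitrary choices for the sketch.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    # Stable 64-bit hash (Python's built-in hash() for str is randomized per process).
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, nodes, vnodes: int = 100):
        self.vnodes = vnodes
        self._ring: list[tuple[int, str]] = []
        for node in nodes:
            self.add(node)

    def add(self, node: str) -> None:
        # Place several virtual points per node to spread load more evenly.
        for i in range(self.vnodes):
            bisect.insort(self._ring, (_hash(f"{node}#{i}"), node))

    def node_for(self, row_key: str) -> str:
        # Walk clockwise to the first point at or after the key's hash, wrapping around.
        h = _hash(row_key)
        idx = bisect.bisect_left(self._ring, (h, ""))
        return self._ring[idx % len(self._ring)][1]


ring = HashRing(["node-a", "node-b", "node-c"])
owner = ring.node_for("user:42")
```

The useful property: adding a node only moves the keys that now land on the new node's points; every other key keeps its owner.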

36. tlarkworthy [14 Oct 24 06:15 UTC] No.41834709

>>41833274

It's the actor model essentially.

You can have a DO proxy each user connection, and those proxies forward messages to the multiplayer document. The user proxy deals with ordering and buffering its connection's message state in the presence of disconnects, and the document DO handles the shared state.
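A single-process Python sketch of that split, with hypothetical class names and no Cloudflare APIs: the per-user proxy numbers and buffers outbound messages so they can be replayed after a reconnect, while the document actor just applies ops and fans them out. In a real deployment each would be its own Durable Object, and `sent` would be a WebSocket.

```python
from dataclasses import dataclass, field


@dataclass
class UserProxy:
    """Hypothetical per-connection proxy: numbers outgoing messages and
    buffers them until the client acknowledges, so they can be replayed
    after a reconnect."""

    user_id: str
    next_seq: int = 0
    unacked: dict[int, str] = field(default_factory=dict)
    connected: bool = True
    sent: list[tuple[int, str]] = field(default_factory=list)  # stands in for the socket

    def deliver(self, message: str) -> None:
        seq = self.next_seq
        self.next_seq += 1
        self.unacked[seq] = message
        if self.connected:
            self.sent.append((seq, message))

    def ack(self, up_to_seq: int) -> None:
        # Client confirms receipt of everything up to and including up_to_seq.
        for seq in [s for s in self.unacked if s <= up_to_seq]:
            del self.unacked[seq]

    def disconnect(self) -> None:
        self.connected = False

    def reconnect(self) -> None:
        # Replay everything the client hasn't acknowledged, in order.
        self.connected = True
        for seq in sorted(self.unacked):
            self.sent.append((seq, self.unacked[seq]))


class DocumentActor:
    """Hypothetical shared-document actor: applies ops and fans them out."""

    def __init__(self):
        self.ops: list[str] = []
        self.subscribers: list[UserProxy] = []

    def apply(self, op: str) -> None:
        self.ops.append(op)
        for proxy in self.subscribers:
            proxy.deliver(op)


doc = DocumentActor()
alice, bob = UserProxy("alice"), UserProxy("bob")
doc.subscribers += [alice, bob]
doc.apply("insert A")
bob.ack(0)
bob.disconnect()
doc.apply("insert B")  # bob is offline, so the op is only buffered
bob.reconnect()        # the buffered op is replayed
print(bob.sent)  # [(0, 'insert A'), (1, 'insert B')]
```

Keeping the per-connection bookkeeping out of the document actor means the document only ever deals with "apply op, broadcast".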


37. aldonius [14 Oct 24 06:25 UTC] No.41834751

>>41833310

Aren't the HFT boxes usually stock exchange colocations? Each trader gets a rack (or multiple racks depending on size) in the exchange's datacenter, every rack has the same cable length to the switch, etc.

38. crabmusket [14 Oct 24 06:33 UTC] No.41834801

>>41834709

It's actors plus a global routing system, which means all messages addressed to a unique identifier will arrive at the same actor instance. I haven't seen any other actor frameworks that provide that.
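A toy, single-process Python sketch of those two ingredients. The class names are hypothetical, and this is nothing like how Cloudflare actually implements it: a router that guarantees one live instance per ID, and an actor that drains its mailbox one message at a time so its state needs no locks.

```python
import asyncio


class Actor:
    """Processes its mailbox one message at a time, so handlers never interleave."""

    def __init__(self, actor_id: str):
        self.actor_id = actor_id
        self.mailbox: asyncio.Queue = asyncio.Queue()
        self.state: list[str] = []
        self._task = asyncio.create_task(self._run())

    async def _run(self) -> None:
        while True:
            message, reply = await self.mailbox.get()
            self.state.append(message)  # no locks needed: only this loop touches state
            reply.set_result(len(self.state))


class Router:
    """Hypothetical stand-in for the global routing layer: one live instance per ID."""

    def __init__(self):
        self._actors: dict[str, Actor] = {}

    def get(self, actor_id: str) -> Actor:
        if actor_id not in self._actors:
            self._actors[actor_id] = Actor(actor_id)
        return self._actors[actor_id]

    async def send(self, actor_id: str, message: str) -> int:
        reply = asyncio.get_running_loop().create_future()
        await self.get(actor_id).mailbox.put((message, reply))
        return await reply


async def main():
    router = Router()
    # Concurrent senders using the same ID all reach the same instance.
    results = await asyncio.gather(*(router.send("doc-42", f"op{i}") for i in range(5)))
    return router, results


router, results = asyncio.run(main())
print(sorted(results), len(router.get("doc-42").state))  # [1, 2, 3, 4, 5] 5
```

The hard part Cloudflare solves is doing the `Router.get` step across many machines worldwide, which a dict in one process obviously doesn't.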


39. camgunz [14 Oct 24 08:15 UTC] No.41835368

>>41832728 (TP)

First, this is very insightful--I think most people should go through this exact analysis before architecting a system.

As others have said, the use is multiplayer, and that's because you need everyone to see your changes ASAP for the app to feel good. But more broadly, the storage industry has been trying to build something that's consistent, low-latency, and multiuser for a long time. That's super hard; just from a physics point of view, there's generally a tradeoff between consistency and latency. So I think people are trying different models to get there, and a lot of that experimentation (not all, cf. Yugabyte or Cockroach) is happening with SQLite.

40. tlarkworthy [14 Oct 24 11:10 UTC] No.41836380

>>41834801

Akka and Erlang both support distributed routing to their actors, but this is planetary scale and fully-managed out of the box, which is very cool.

41. klabb3 [14 Oct 24 12:32 UTC] No.41836913

>>41833285

> Durable Objects map well to use cases where there actually are documents

Right. I wouldn’t dispute this. This is akin to a file format from software back in the day (like, say, Photoshop, but now with multiplayer). What this means is that you get different compatibility boundaries, and you relinquish centralized control and the ability to do transparent migrations and analysis. For all intents and purposes, the documents should be more or less opaque and self-contained. I personally like this, but I also recognize that most web engineers of our current generation are not used to thinking in this disciplined and defensive way upfront.
