
420 points rvz | 2 comments
pfraze ◴[] No.41412758[source]
Copying over my latest backend status update; figure folks would find it interesting

Servers are holding up so far! Fortunately we were overprovisioned. If we hit 4 million new signups then things should get interesting. We did see some degradation (user handles entering an invalid state, the event-stream crashing a couple of times, the algo crashing a couple of times, image servers hitting bad latencies), but we managed to avoid a full outage.

We use an event-sourcing model: a K/V database for primary storage (actually sqlite) feeds a golang event stream, which in turn feeds scylladb for computed views. There are various separate services for search, algorithms, and images, running on a hybrid of on-prem and cloud. There are ~20 of the K/V servers, 1 event-stream, and 2 scylla clusters (I believe).

The event-stream crash would cause the application to stop making progress on ingesting events, but we still got the writes, so you'd see e.g. likes failing to increment the counter but then magically taking effect 60 seconds later. Since the scylla cluster and the K/V stores stayed online, we avoided a full outage.
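
To make the failure mode above concrete, here is a minimal Python sketch of a write path that is decoupled from view ingestion. It is only an illustration under stated assumptions: `KVStore`, `LikeCountView`, the `like/` key prefix, and the in-memory list standing in for the event stream are all hypothetical names, not Bluesky's actual code (which, per the comment, uses sqlite, a golang event stream, and scylladb).

```python
from collections import defaultdict


class KVStore:
    """Hypothetical primary store: accepts writes and logs each change."""

    def __init__(self):
        self.records = {}
        self.log = []  # ordered change events, standing in for the event stream

    def put(self, key, value):
        # The write succeeds regardless of whether any consumer is running.
        self.records[key] = value
        self.log.append(("put", key, value))


class LikeCountView:
    """Hypothetical computed view that remembers its position in the log."""

    def __init__(self):
        self.counts = defaultdict(int)
        self.cursor = 0  # index of the next event to ingest

    def ingest(self, log):
        # Resume from the saved cursor, so a restart picks up where it stopped.
        for op, key, value in log[self.cursor:]:
            if op == "put" and key.startswith("like/"):
                self.counts[value["post"]] += 1
        self.cursor = len(log)


store = KVStore()
view = LikeCountView()

store.put("like/1", {"post": "post-a"})
view.ingest(store.log)
count_before_crash = view.counts["post-a"]

# Ingestion is down: writes still land in the primary store,
# but the counter in the view does not move.
store.put("like/2", {"post": "post-a"})
store.put("like/3", {"post": "post-a"})
count_during_crash = view.counts["post-a"]

# Ingestion recovers and drains the backlog; the likes "magically" appear.
view.ingest(store.log)
count_after_recovery = view.counts["post-a"]
```

The point of the sketch is the cursor: because the view tracks how far it has read, a stalled consumer delays the counter but loses nothing, which matches the "taking effect 60 seconds later" behavior described above.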

replies(9): >>41412984 #>>41413343 #>>41413506 #>>41413569 #>>41415242 #>>41415812 #>>41416225 #>>41417516 #>>41417547 #
1. louhike ◴[] No.41413343[source]
That’s interesting. Why do you use event sourcing? Is having a full history important for a website/app like Bluesky?
replies(1): >>41413404 #
2. pfraze ◴[] No.41413404[source]
Ahhh you know what, I should call it stream processing or something, because we don't store the data entirely as events. We store the data as a mutable K/V which emits an event stream of changes, which can then be ingested into different views. We chose not to store changes as events specifically because we don't want unbounded growth in the system. Initial syncs work by fetching the current state of the K/V store (the "data repo").
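
A rough Python sketch of that distinction, assuming an in-process callback in place of a real event stream. `DataRepo`, `View`, and the `seq` field are hypothetical names chosen for illustration and are not taken from atproto: the store keeps only current state, emits a change event per mutation, and a late-joining view bootstraps from a snapshot instead of replaying history.

```python
class DataRepo:
    """Hypothetical mutable K/V store that emits change events but keeps no history."""

    def __init__(self):
        self.state = {}
        self.seq = 0  # sequence number of the latest change
        self.subscribers = []

    def subscribe(self, callback):
        self.subscribers.append(callback)

    def _emit(self, event):
        for callback in self.subscribers:
            callback(event)

    def put(self, key, value):
        self.seq += 1
        self.state[key] = value  # overwrite in place, so storage stays bounded
        self._emit({"seq": self.seq, "op": "put", "key": key, "value": value})

    def delete(self, key):
        self.seq += 1
        self.state.pop(key, None)
        self._emit({"seq": self.seq, "op": "delete", "key": key})

    def snapshot(self):
        # Current state plus the sequence number it corresponds to.
        return self.seq, dict(self.state)


class View:
    """Hypothetical downstream view: initial sync from a snapshot, then live events."""

    def __init__(self, repo):
        self.seq, self.data = repo.snapshot()
        repo.subscribe(self.apply)

    def apply(self, event):
        if event["seq"] <= self.seq:
            return  # already reflected in the snapshot
        if event["op"] == "put":
            self.data[event["key"]] = event["value"]
        else:
            self.data.pop(event["key"], None)
        self.seq = event["seq"]


repo = DataRepo()
repo.put("profile", {"name": "alice"})
repo.put("profile", {"name": "alice b"})  # replaces the earlier value
repo.put("post/1", "hello")
repo.delete("post/1")

late_view = View(repo)  # joins after four changes, sees only current state
repo.put("post/2", "world")
```

Five changes were made, but the repo holds only two records, and `late_view` never needed the first four events to reach the same state.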

Bluesky is built on atprotocol (atproto.com) and can be thought of as an open distributed system. The event stream is for replicating throughout the various services.