420 points rvz | 1 comment
pfraze No.41412758
Copying over my latest backend status update; figure folks would find it interesting

Servers are holding up so far! Fortunately we were overprovisioned. If we hit 4 million new signups, things should get interesting. We did have some degradations (user handles entering an invalid state, the event stream crashing a couple of times, the algo service crashing a couple of times, image servers hitting bad latencies), but we managed to avoid a full outage.

We use an event-sourcing model: a K/V database for primary storage (actually SQLite) feeds a Go event stream, which in turn feeds ScyllaDB for computed views. Search, algorithms, and images each run as separate services. The deployment is hybrid on-prem and cloud. There are ~20 of the K/V servers, 1 event stream, and 2 Scylla clusters (I believe).

When the event stream crashed, the application stopped making progress on ingesting events, but we still got the writes. So you'd see, e.g., a like failing to increment the counter but then magically taking effect 60 seconds later. Since the Scylla cluster and the K/V stores stayed online, we avoided a full outage.
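
A toy sketch of that failure mode, in Python rather than the Go/SQLite/Scylla stack described above. Every class and name here is hypothetical, and it assumes the consumer tracks a cursor into an ordered log, which is a guess and not something stated in this comment. It only illustrates why writes survive a consumer crash while the computed view lags and then catches up.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    seq: int       # position in the append-only log
    kind: str      # e.g. "like"
    post_id: str


@dataclass
class PrimaryStore:
    """Stands in for the K/V layer: accepts writes and keeps an ordered log."""
    log: list = field(default_factory=list)

    def write(self, kind: str, post_id: str) -> Event:
        event = Event(seq=len(self.log), kind=kind, post_id=post_id)
        self.log.append(event)
        return event

    def read_from(self, cursor: int) -> list:
        return self.log[cursor:]


@dataclass
class ViewStore:
    """Stands in for the computed-view database: like counts per post."""
    like_counts: dict = field(default_factory=dict)

    def apply(self, event: Event) -> None:
        if event.kind == "like":
            current = self.like_counts.get(event.post_id, 0)
            self.like_counts[event.post_id] = current + 1


@dataclass
class EventStream:
    """Consumer that copies events from the primary store into the view."""
    source: PrimaryStore
    view: ViewStore
    cursor: int = 0        # next log position to process (assumed design)
    running: bool = True

    def poll(self) -> int:
        """Apply any unprocessed events; return how many were applied."""
        if not self.running:
            return 0
        pending = self.source.read_from(self.cursor)
        for event in pending:
            self.view.apply(event)
            self.cursor = event.seq + 1
        return len(pending)


def demo() -> list:
    """Return the like count seen while healthy, crashed, and recovered."""
    store, view = PrimaryStore(), ViewStore()
    stream = EventStream(store, view)
    observed = []

    store.write("like", "post-1")
    stream.poll()
    observed.append(view.like_counts.get("post-1", 0))  # stream healthy

    stream.running = False           # simulate the event-stream crash
    store.write("like", "post-1")    # the write is still accepted
    store.write("like", "post-1")
    stream.poll()
    observed.append(view.like_counts.get("post-1", 0))  # view is stale

    stream.running = True            # restart and resume from the cursor
    stream.poll()
    observed.append(view.like_counts.get("post-1", 0))  # view caught up
    return observed
```

Calling `demo()` shows the counter stalling while the consumer is down and jumping once it resumes, because the log in the primary store never lost the writes.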

1. johnisgood No.41416225
Hmm, this is something I would use Elixir for.