234 points by benocodes | 7 comments
1. xyst No.41838001
I wonder if an upgrade like this would be less painful if the db layer were containerized?

The migration process they described would be less painful with k8s, especially with 2100+ nodes/VMs.

replies(5): >>41838083 >>41838418 >>41838563 >>41839037 >>41839595
2. remon No.41838083
Their entire setup seems somewhat suspect. I can't think of any technical justification for needing 21k instances for their type of business.
3. zemo No.41838418
Upgrading clients and testing the application logic, changes to the queries themselves as written, the process of detecting the regression and getting MySQL patched by Percona, changes to default collation ... all of these things have nothing to do with whether the instances are in containers, or whether the containers are managed by k8s or not.
4. shakiXBT No.41838563
Running databases (or any stateful application, really) on k8s is a mess, especially at that scale.
5. meesles No.41839037
A pipe dream. Having recently interacted with a modern k8s operator for Postgres, I found it lacked support for many features that had been around for a long time. I'd be surprised if MySQL's operators are that much better. Also consider the data layer, which is going to need to be solved regardless. Of course, at Uber's scale they could write their own, I guess.

At that point, if you're reaching in and scripting your pods to do what you want, you lose a lot of the benefits of convention and reusability that k8s promotes.

replies(1): >>41841224
6. __turbobrew__ No.41839595
I can tell you that k8s starts to have issues once you get over 10k nodes in a single cluster. There has been some work in 1.31 to improve scalability, but I would say past 5k nodes things no longer “just work”: https://kubernetes.io/blog/2024/08/15/consistent-read-from-c...
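
As a rough client-go sketch of the distinction that work is about (my own illustration, not from the linked post; the "default" namespace and in-cluster config are just placeholders):

    package main

    import (
        "context"
        "fmt"

        metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
        "k8s.io/client-go/kubernetes"
        "k8s.io/client-go/rest"
    )

    func main() {
        // Assumes the program runs inside the cluster with a service account.
        cfg, err := rest.InClusterConfig()
        if err != nil {
            panic(err)
        }
        client, err := kubernetes.NewForConfig(cfg)
        if err != nil {
            panic(err)
        }
        ctx := context.Background()

        // An empty ResourceVersion requests a consistent (quorum) read, which
        // traditionally meant every large LIST went through etcd.
        consistent, err := client.CoreV1().Pods("default").List(ctx, metav1.ListOptions{})
        if err != nil {
            panic(err)
        }

        // ResourceVersion "0" permits a possibly stale read served from the
        // apiserver watch cache, which is much cheaper at scale.
        cached, err := client.CoreV1().Pods("default").List(ctx, metav1.ListOptions{ResourceVersion: "0"})
        if err != nil {
            panic(err)
        }

        fmt.Printf("consistent: %d pods, cached: %d pods\n", len(consistent.Items), len(cached.Items))
    }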

The current bottleneck appears to be etcd; boltdb is just a crappy data store. I would really like to try replacing boltdb with something like sqlite or rocksdb as the data persistence layer in etcd, but that is non-trivial.

You also start seeing issues where certain k8s operators do not scale either; for example, cilium currently cannot scale past 5k nodes. There are fundamental design issues where the cilium daemonset's memory usage scales with the number of pods/endpoints in the cluster. In large clusters the cilium daemonset can use multiple gigabytes of RAM on every node. https://docs.cilium.io/en/stable/operations/performance/scal...

Anyways, the TL;DR is that at this scale (16k nodes) it is hard to run k8s.

7. jcgl No.41841224
> it lacked support for many features that had been around for a long time

Care to elaborate at all? Were they more like missing edge cases or absent core functionality? Not to imply that missing edge cases aren’t important when it comes to DB ops.