Upgrading Uber's MySQL Fleet

(www.uber.com)

236 points benocodes | 5 comments | 14 Oct 24 12:08 UTC


whalesalad ◴[14 Oct 24 12:39 UTC] No.41836959[source]▶

So satisfying to do a huge upgrade like this and then see the actual proof in the pudding with all the reduced latencies and query times.

replies(1): >>41837062 #

hu3 ◴[14 Oct 24 12:52 UTC] No.41837062[source]▶

>>41836959 #

Yeah some numbers caught my attention like ~94% reduction in overall database lock time.

And to think they never have to worry about VACUUM. Ahh the peace.

replies(4): >>41837227 #>>41837317 #>>41837626 #>>41838255 #

anonzzzies ◴[14 Oct 24 13:14 UTC] No.41837227[source]▶

>>41837062 #

Yeah, until vacuum is gone, I'm not touching Postgres. So many bad experiences with our use cases over the decades. I guess most people don't have our use cases, but I'm thinking Uber does.

replies(2): >>41837323 #>>41837537 #

leishman ◴[14 Oct 24 13:52 UTC] No.41837537[source]▶

>>41837227 #

Postgres 17 tremendously improves vacuum performance

replies(1): >>41838004 #

1. mannyv ◴[14 Oct 24 14:42 UTC] No.41838004[source]▶

>>41837537 #

Vacuuming is a design decision that may have been valid back in the day, but is really a ball and chain today.

In a low-resource environment, deferring work makes sense. But even in a low-resource environment the vacuum process would consume huge amounts of resources to do its job, especially given any kind of scale. And the longer it's deferred, the longer the process will take. And if you actually are in a low-resource environment it'll be a challenge to have enough disk space to complete the vacuum (I'm looking at you, sunos4) - and don't even talk about downtime.

I don't understand how large pgsql users handle vacuuming in production. Maybe they just don't do it and let the disk usage grow unbounded, because disk space is cheap compared to the aggravation of vacuuming?

replies(1): >>41838234 #

2. wongarsu ◴[14 Oct 24 15:06 UTC] No.41838234[source]▶

>>41838004 (TP) #

You run VACUUM often enough that you never need a VACUUM FULL. A normal VACUUM doesn't require any exclusive locks or a lot of disk space, so usually you can just run it in the background. Normally autovacuum does that for you, but at scale you transition to running it manually at low traffic times; or if you update rows a lot you throw more CPUs at the database server and run it frequently.
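As a rough illustration of the "run it manually at low traffic times" approach, here is a minimal Python sketch that only builds the statements. The 02:00-05:00 window, the table list, and the helper names are assumptions for illustration, not anything from the comment above. Plain VACUUM cannot run inside a transaction block, so whatever client executes these strings would need autocommit enabled.

```python
from datetime import time


def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end); handles windows that wrap past midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


def vacuum_statements(tables, now, start=time(2, 0), end=time(5, 0)):
    """Return plain VACUUM statements (never VACUUM FULL) if we are in the quiet window.

    `tables` is a list of (schema, table) pairs; the default window is a
    hypothetical low-traffic period.
    """
    if not in_window(now, start, end):
        return []
    return [
        f"VACUUM (ANALYZE) {quote_ident(schema)}.{quote_ident(table)};"
        for schema, table in tables
    ]
```

Calling `vacuum_statements([("public", "orders")], time(3, 0))` yields one statement for the hypothetical `orders` table, while the same call at midday returns an empty list.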

Vacuuming indices is a bit more finicky with locks, but you can just periodically build a new index and drop the old one when it becomes an issue

replies(1): >>41838818 #

3. sgarland ◴[14 Oct 24 16:04 UTC] No.41838818[source]▶

>>41838234 #

People not realizing you can tune autovacuum on a per-table basis is the big one. Autovacuum can get a lot done if you have enough workers and enough spare RAM to throw at them.
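To make the per-table point concrete, here is a small Python sketch, assuming the stock defaults of `autovacuum_vacuum_threshold = 50` and `autovacuum_vacuum_scale_factor = 0.2`. Autovacuum considers a table once its dead tuples exceed the threshold plus the scale factor times the row count, so big tables wait a long time under the defaults. The table name `orders` and the override values are hypothetical.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def autovacuum_trigger(reltuples: float, threshold: int = 50,
                       scale_factor: float = 0.2) -> float:
    """Dead-tuple count above which autovacuum will consider vacuuming a table."""
    return threshold + scale_factor * reltuples


def per_table_autovacuum_sql(table: str, scale_factor: float, threshold: int) -> str:
    """Build an ALTER TABLE that overrides the autovacuum settings for one table."""
    return (
        f"ALTER TABLE {quote_ident(table)} SET ("
        f"autovacuum_vacuum_scale_factor = {scale_factor}, "
        f"autovacuum_vacuum_threshold = {threshold});"
    )


# With defaults, a billion-row table accumulates roughly 200 million dead
# tuples before autovacuum looks at it; a 1% scale factor cuts that to ~10 million.
default_trigger = autovacuum_trigger(1_000_000_000)
tuned_trigger = autovacuum_trigger(1_000_000_000, threshold=1000, scale_factor=0.01)
statement = per_table_autovacuum_sql("orders", 0.01, 1000)
```

The right values depend on update rate and table size, so the numbers here are placeholders rather than recommendations.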

For indices, as you mentioned, doing either a REINDEX CONCURRENTLY (requires >= PG12), or a CREATE INDEX CONCURRENTLY / DROP INDEX CONCURRENTLY (and a rename if you’d like) is the way to go.
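The two index-rebuild routes can be sketched as statement builders in Python. The index, table, and column names, the `_new` suffix, and the helper names are all hypothetical. The version check uses `server_version_num`, which is 120000 for PG12. None of the CONCURRENTLY variants can run inside a transaction block, and this sketch ignores indexes that back constraints, which DROP INDEX CONCURRENTLY will refuse to drop.

```python
def quote_ident(name: str) -> str:
    """Quote a SQL identifier, doubling any embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def reindex_sql(index: str) -> list:
    """One-step rebuild, available from PostgreSQL 12 onward."""
    return [f"REINDEX INDEX CONCURRENTLY {quote_ident(index)};"]


def swap_index_sql(table: str, index: str, columns: list) -> list:
    """Build-new, drop-old, rename: the manual route for older servers."""
    new_index = index + "_new"  # hypothetical temporary name
    cols = ", ".join(quote_ident(c) for c in columns)
    return [
        f"CREATE INDEX CONCURRENTLY {quote_ident(new_index)} "
        f"ON {quote_ident(table)} ({cols});",
        f"DROP INDEX CONCURRENTLY {quote_ident(index)};",
        f"ALTER INDEX {quote_ident(new_index)} RENAME TO {quote_ident(index)};",
    ]


def rebuild_plan(server_version_num: int, table: str, index: str, columns: list) -> list:
    """Pick the one-step rebuild when the server supports it, else the manual swap."""
    if server_version_num >= 120000:
        return reindex_sql(index)
    return swap_index_sql(table, index, columns)
```

On a PG11 server, `rebuild_plan(110000, "orders", "orders_created_idx", ["created_at"])` returns the three-statement swap; on PG12 or later it returns the single REINDEX.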

In general, there is a lot more manual maintenance needed to keep Postgres running well at scale compared to MySQL, which is why I’m forever upset that Postgres is touted as the default to people who haven’t the slightest clue nor the inclination to do DB maintenance. RDS doesn’t help you here, nor Aurora – maintenance is still on you.

replies(1): >>41840662 #

4. anonzzzies ◴[14 Oct 24 18:56 UTC] No.41840662{3}[source]▶

>>41838818 #

We make good money 'saving' people from Aurora; you can throw traffic at it and pay more. We often migrate companies who then end up paying a fraction of the price.

replies(1): >>41842752 #

5. sgarland ◴[14 Oct 24 22:24 UTC] No.41842752{4}[source]▶

>>41840662 #

I’m convinced that Aurora’s team consists mostly of sales. There are certainly some talented engineers working on it – I’ve talked to a few – but by and large, all of my interactions with AWS about DB stuff have been them telling me how much better it is than other options.

I’ve tested Aurora Postgres and MySQL against both RDS and native (on my own, extremely old hardware), and Aurora has never won in performance. I’ve been told that “it’s better in high concurrency,” but IMO, that’s what connection poolers are for.
