
234 points | benocodes | 1 comment
whalesalad No.41836959
So satisfying to do a huge upgrade like this and then see the actual proof in the pudding with all the reduced latencies and query times.
replies(1): >>41837062 #
hu3 No.41837062
Yeah, some numbers caught my attention, like the ~94% reduction in overall database lock time.

And to think they never have to worry about VACUUM. Ahh the peace.

replies(4): >>41837227 #>>41837317 #>>41837626 #>>41838255 #
InsideOutSanta No.41837317
As somebody who has always used MySQL but has always been told I should be using Postgres, I'd love to understand: what are the issues with VACUUM, and what should I be aware of when potentially switching databases?
replies(5): >>41837435 #>>41838160 #>>41838537 #>>41839012 #>>41845775 #
mjr00 No.41838160
Worth reading up on Postgres' MVCC model for concurrency.[0]

Short version is that VACUUM is needed to clean up dead tuples and reclaim disk space. For most cases with smaller amounts of data, auto-vacuum works totally fine. But I've had issues with tables with 100M+ rows that are frequently updated, where auto-vacuum falls behind and stops working completely. That necessitated a full data dump + restore (because we didn't want to double our storage capacity to do a VACUUM FULL). We fixed this by sharding the table and tweaking auto-vacuum to run more frequently, but this isn't stuff you have to worry about in MySQL.
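
To make "tweaking auto-vacuum to run more frequently" concrete, the usual knob is the per-table scale factor. A rough sketch in Python/psycopg2 follows; the table name and thresholds are placeholders rather than whatever settings were actually used here:

    # Sketch only: per-table autovacuum tuning of the kind described above.
    # Table name and thresholds are placeholders, not actual settings.
    import psycopg2

    conn = psycopg2.connect("dbname=app user=admin")  # hypothetical DSN
    conn.autocommit = True
    with conn.cursor() as cur:
        # Trigger autovacuum once ~1% of the table is dead tuples
        # instead of the default 20%, plus a flat 10k-tuple threshold.
        cur.execute("""
            ALTER TABLE big_events SET (
                autovacuum_vacuum_scale_factor = 0.01,
                autovacuum_vacuum_threshold    = 10000
            )
        """)
    conn.close()

A lower scale factor makes autovacuum fire far more often on a big table; whether it can then keep up also depends on the cost-based throttling mentioned further down the thread.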

Honestly if you're a small shop without database/postgres experts and MySQL performance is adequate for you, I wouldn't switch. Newer versions of MySQL have fixed the egregious issues, like silent data truncation on INSERT by default, and it's easier to maintain, in my experience.

[0] https://www.postgresql.org/docs/current/mvcc-intro.html

replies(2): >>41838279 #>>41838746 #
williamdclt No.41838746
As much as I have gripes with the autovac, I’m surprised at the idea of getting to such a broken state. 100M rows is not small, but it's not huge either; how frequent is “frequent updates”? And how long ago was that (there have been a lot of changes in autovac since v9)?

“Stops working completely” should not be a thing. It could be vacuuming more slowly than the update rate (although that’d be surprising), but I don’t know of any reason it would just stop.

That being said, I’ve also had issues with autovac (on Aurora, to be fair; I couldn’t say if it was Aurora-specific), like it running constantly without vacuuming anything, as if there were an old transaction idling (there wasn’t).
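
For what it's worth, when autovac seems busy but dead tuples keep growing, two quick checks are whether any old transaction is pinning the cleanup horizon and whether the dead-tuple counts are actually moving. A sketch (connection details are placeholders, and this isn't tied to the incident above):

    # Sketch: look for long-lived transactions that block tuple cleanup,
    # and for tables where dead tuples keep piling up.
    import psycopg2

    conn = psycopg2.connect("dbname=app user=admin")  # hypothetical DSN
    with conn.cursor() as cur:
        # Oldest open transactions; a long-lived one keeps VACUUM from
        # reclaiming any tuple version its snapshot might still need.
        cur.execute("""
            SELECT pid, state, xact_start, left(query, 60)
            FROM pg_stat_activity
            WHERE xact_start IS NOT NULL
            ORDER BY xact_start
            LIMIT 5
        """)
        for row in cur.fetchall():
            print(row)

        # Dead-tuple counts and when autovacuum last touched each table.
        cur.execute("""
            SELECT relname, n_live_tup, n_dead_tup, last_autovacuum
            FROM pg_stat_user_tables
            ORDER BY n_dead_tup DESC
            LIMIT 10
        """)
        for row in cur.fetchall():
            print(row)
    conn.close()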

replies(2): >>41839294 #>>41839304 #
sgarland No.41839304
On decently-sized tables (100,000,000 is, as you say, not small but not huge), if you haven’t tuned cost limiting and/or various parameters for controlling autovacuum workers, it’s entirely possible for it to effectively do nothing, especially if you’re in the cloud with backing disks that have limited IOPS / throughput.
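
The cost settings in question are the cost-based delay knobs: at the defaults, each autovacuum worker sleeps after accumulating a small amount of I/O "cost", which on throttled cloud disks can mean it barely makes progress. A rough sketch of loosening them (values are illustrative only; on RDS/Aurora these live in the parameter group rather than ALTER SYSTEM):

    # Sketch: relax autovacuum's cost-based throttling. Values are
    # illustrative; on managed services, set these in the parameter
    # group instead of using ALTER SYSTEM.
    import psycopg2

    conn = psycopg2.connect("dbname=app user=admin")  # hypothetical DSN
    conn.autocommit = True  # ALTER SYSTEM cannot run inside a transaction block
    with conn.cursor() as cur:
        # Let each worker do more work per round before it sleeps...
        cur.execute("ALTER SYSTEM SET autovacuum_vacuum_cost_limit = 2000")
        # ...and sleep for less time between rounds.
        cur.execute("ALTER SYSTEM SET autovacuum_vacuum_cost_delay = '1ms'")
        # Pick up the new values without a restart.
        cur.execute("SELECT pg_reload_conf()")
    conn.close()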

It continues to baffle me why AWS picks some truly terrible defaults for parameter groups. I understand most of them come from the RDBMS defaults, but AWS has the luxury of knowing precisely how many CPUs and how much RAM any given instance has. On any decently-sized instance, it should allocate far more memory for maintenance_work_mem, for example.
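
As an example of fixing that yourself (a sketch: the parameter group name and value are made up, and I'm assuming RDS expresses this setting in kB), you can override maintenance_work_mem in a custom parameter group:

    # Sketch: bump maintenance_work_mem through an RDS parameter group.
    # Group name and value are placeholders; the parameter's unit is kB.
    import boto3

    rds = boto3.client("rds")
    rds.modify_db_parameter_group(
        DBParameterGroupName="my-postgres-params",   # hypothetical group
        Parameters=[{
            "ParameterName": "maintenance_work_mem",
            "ParameterValue": str(2 * 1024 * 1024),  # ~2 GB, expressed in kB
            "ApplyMethod": "immediate",              # dynamic; no reboot needed
        }],
    )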