234 points benocodes | 37 comments
1. whalesalad ◴[] No.41836959[source]
So satisfying to do a huge upgrade like this and then see the actual proof in the pudding with all the reduced latencies and query times.
replies(1): >>41837062 #
2. hu3 ◴[] No.41837062[source]
Yeah, some numbers caught my attention, like the ~94% reduction in overall database lock time.

And to think they never have to worry about VACUUM. Ahh the peace.

replies(4): >>41837227 #>>41837317 #>>41837626 #>>41838255 #
3. anonzzzies ◴[] No.41837227[source]
Yeah, until vacuum is gone, I'm not touching Postgres. So many bad experiences with our use cases over the decades. I guess most people don't have our uses, but I'm thinking Uber does.
replies(2): >>41837323 #>>41837537 #
4. InsideOutSanta ◴[] No.41837317[source]
As somebody who has always used MySQL but has always been told that I should be using Postgres, I'd love to understand: what are the issues with VACUUM, and what should I be aware of when potentially switching databases?
replies(5): >>41837435 #>>41838160 #>>41838537 #>>41839012 #>>41845775 #
5. RedShift1 ◴[] No.41837323{3}[source]
Maybe just vacuum much more aggressively? Also, there have been a lot of changes to the vacuuming and autovacuuming process these last few years, so you can pretty much forget about it.
replies(1): >>41837671 #
6. djbusby ◴[] No.41837435{3}[source]
VACUUM and VACUUM FULL (and/or with ANALYZE) can lock tables for a very long time, especially when the table is large. The incantation may also require 2x the space of the table being operated on. In short: it's slow.
replies(4): >>41837649 #>>41837883 #>>41838660 #>>41838834 #
7. leishman ◴[] No.41837537{3}[source]
Postgres 17 tremendously improves vacuum performance
replies(1): >>41838004 #
8. brightball ◴[] No.41837626[source]
There are always tradeoffs.
9. sgarland ◴[] No.41837649{4}[source]
`VACUUM` (with or without `ANALYZE`) on its own neither locks tables nor requires additional disk space. This is what the autovacuum daemon is doing. `VACUUM FULL` does both, as it's doing a tuple-by-tuple rewrite of the entire table.
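For reference, a minimal sketch of the variants (table name is made up):

  -- plain VACUUM: no exclusive lock, no extra disk space;
  -- this is what autovacuum runs for you in the background
  VACUUM my_table;
  VACUUM (ANALYZE) my_table;

  -- VACUUM FULL: rewrites the whole table, takes an
  -- ACCESS EXCLUSIVE lock and needs roughly 2x the disk space
  VACUUM FULL my_table;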
10. anonzzzies ◴[] No.41837671{4}[source]
Not in our experience; for our cases it is still a resource hog. We discussed it less than a year ago with core devs and with a large Postgres consultancy; they said Postgres doesn't fit our use case, which was already our conclusion, no matter how much we want it to. MySQL is smooth as butter. I have nothing to gain from picking MySQL other than that it works; I'd rather use Postgres for the features / not being Oracle, but...

Edit: also, as can be seen in the responses here and elsewhere on the web when this is discussed, the fans say it's no problem, while many less religious users feel it's a massive design flaw (perfectly logical at the time, not so logical now) that sometimes stops people from using it, which is a shame.

replies(1): >>41837887 #
11. gomoboo ◴[] No.41837883{4}[source]
pg_repack gets rid of the need to lock tables for the duration of the vacuum: https://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/Appen...

It is an extension, though, so the downside is that it's not included in most Postgres installs. I've used it at work and it felt like a superpower, getting the benefits of a VACUUM FULL without all the usual drama.
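For reference, a rough sketch of how it's used (table and database names are made up, and the client's exact flags may vary by version):

  -- one-time setup inside the target database
  CREATE EXTENSION pg_repack;

  -- then from the shell, something like:
  --   pg_repack --table=public.events --jobs=2 mydb
  -- which rebuilds the table online while holding only brief locks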

replies(1): >>41838592 #
12. yeswecatan ◴[] No.41837887{5}[source]
What is your use case?
replies(1): >>41838128 #
13. mannyv ◴[] No.41838004{4}[source]
Vacuuming is a design decision that may have been valid back in the day, but is really a ball and chain today.

In a low-resource environment deferring work makes sense. But even in a low-resource environment the vacuum process would consume huge amounts of resources to do its job, especially given any kind of scale. And the longer it's deferred, the longer the process will take. And if you actually are in a low-resource environment, it'll be a challenge to have enough disk space to complete the vacuum (I'm looking at you, SunOS 4) - and don't even talk about downtime.

I don't understand how large pgsql users handle vacuuming in production. Maybe they just don't do it and let the disk usage grow unbounded, because disk space is cheap compared to the aggravation of vacuuming?

replies(1): >>41838234 #
14. anonzzzies ◴[] No.41838128{6}[source]
We have 100,000s of tables per database (and 1,000s of those databases) (think sensor/IoT data with some magic sauce that none of our competitors offer) that are heavy on changes. And yes, maybe it's the wrong tool for the job (is it, though, if it works without hiccups?), and migrating would be severe, so we would only attempt it if we were 100% sure it would work and the end result would be cheaper; remember, we are talking decades here, not a startup. But MySQL has been taking this without any issues for decades with us (including the rapid growth of the past decade), while far smaller setups with Postgres have been really painful, all because of vacuum. We were on Postgres in 1999, when we ran many millions of records through it, but that was when we could do a full vacuum at night without anyone noticing. The internet grew a little bit, so that's not possible anymore. Vacuum improved too, like everyone says here, and I'm not spreading the gospel or whatever; it's just that fans (...what other word is there?) blindly stating it can handle loads 'now' that they never considered is, well, weird.
replies(1): >>41838887 #
15. mjr00 ◴[] No.41838160{3}[source]
Worth reading up on Postgres' MVCC model for concurrency.[0]

Short version is that VACUUM is needed to clean up dead tuples and reclaim disk space. For most cases with smaller amounts of data, auto-vacuum works totally fine. But I've had issues with tables with 100m+ rows that are frequently updated where auto-vacuum falls behind and stops working completely. These necessitated a full data dump + restore (because we didn't want to double our storage capacity to do a full vacuum). We fixed this by sharding the table and tweaking auto-vacuum to run more frequently, but this isn't stuff you have to worry about in MySQL.
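A quick way to see whether autovacuum is keeping up is the standard stats view (a sketch; what counts as "falling behind" is up to you):

  -- tables with the most dead tuples, and when autovacuum
  -- last finished on each of them
  SELECT relname, n_live_tup, n_dead_tup,
         last_autovacuum, last_autoanalyze
  FROM pg_stat_user_tables
  ORDER BY n_dead_tup DESC
  LIMIT 20;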

Honestly if you're a small shop without database/postgres experts and MySQL performance is adequate for you, I wouldn't switch. Newer versions of MySQL have fixed the egregious issues, like silent data truncation on INSERT by default, and it's easier to maintain, in my experience.

[0] https://www.postgresql.org/docs/current/mvcc-intro.html

replies(2): >>41838279 #>>41838746 #
16. wongarsu ◴[] No.41838234{5}[source]
You run VACUUM often enough that you never need a VACUUM FULL. A normal VACUUM doesn't require any exclusive locks or a lot of disk space, so usually you can just run it in the background. Normally autovacuum does that for you, but at scale you transition to running it manually at low traffic times; or if you update rows a lot you throw more CPUs at the database server and run it frequently.

Vacuuming indices is a bit more finicky with locks, but you can just periodically build a new index and drop the old one when it becomes an issue.
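The index-swap pattern looks roughly like this (index and table names are made up):

  -- build a fresh index without blocking writes,
  -- then swap it in for the bloated one
  CREATE INDEX CONCURRENTLY idx_orders_user_new ON orders (user_id);
  DROP INDEX CONCURRENTLY idx_orders_user;
  ALTER INDEX idx_orders_user_new RENAME TO idx_orders_user;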

replies(1): >>41838818 #
17. tomnipotent ◴[] No.41838255[source]
MySQL indexes can contain references to rows in the undo log, and MySQL has a periodic VACUUM-like process to remove those references, though it's nowhere near as impactful.
18. InsideOutSanta ◴[] No.41838279{4}[source]
Thanks for that, that's valuable information.
19. tomnipotent ◴[] No.41838537{3}[source]
MySQL stores table data in a B+ tree where updates modify the data directly in place as transactions are committed, and overwritten data is moved to a secondary undo log to support consistent reads. MySQL secondary indexes store primary keys, and queries rely on tree traversal to find the row in the B+ tree, but they can also contain references to rows in the undo log.

PostgreSQL tables are known as heaps, which consist of slotted pages where new data is written to the first page with sufficient free space. Since a heap isn't a b-tree, you can't resolve a row with just a primary key without a table scan; instead, Postgres uses the physical location of the row, called a tuple ID (TID, or item pointer), which contains the page and position (slot) of the row within that page. So the TID (10, 3) tells Postgres the row is in block 10, slot 3, which can be fetched directly from the buffer cache or disk without having to do a tree traversal.

When PostgreSQL updates a row, it doesn’t modify the original data directly. Instead, it:

  1) Writes a new version of the row to a new page
  2) Marks the old row as outdated by updating its tuple header and relevant page metadata
  3) Updates the visibility map to indicate that the page contains outdated rows
  4) Adjusts indexes to point to the new TID of the updated row
This means that every index on the table needs to be updated even if the indexed column values didn't change.

Old rows continue to accumulate in the heap until the VACUUM process permanently deletes them, but this process can impact normal operations and cause issues.

Overall this means Postgres does more disk I/O for the same work as MySQL. The upside is Postgres doesn't have to worry about page splits, so things like bulk inserts can be much more efficient.
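You can watch the TID bookkeeping happen via the ctid system column; a tiny demo (table name made up):

  CREATE TABLE demo (id int PRIMARY KEY, val text);
  INSERT INTO demo VALUES (1, 'a');
  SELECT ctid, * FROM demo;                -- e.g. (0,1)
  UPDATE demo SET val = 'b' WHERE id = 1;
  SELECT ctid, * FROM demo;                -- now e.g. (0,2); the old
                                           -- tuple is dead until VACUUM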

replies(2): >>41838736 #>>41839407 #
20. take-five ◴[] No.41838592{5}[source]
pg_repack can generate a lot of WAL, so much that standby servers can fall too far behind and never recover.

We've been using https://github.com/dataegret/pgcompacttable to clean up bloat without impacting stability/performance as much as pg_repack does.

21. cooljacob204 ◴[] No.41838660{4}[source]
This is sorta mitigated by partitioning or sharding though right?

Too bad it's sorta annoying to do on plain old pg.

22. InsideOutSanta ◴[] No.41838736{4}[source]
That's a perfect explanation, thank you very much!
23. williamdclt ◴[] No.41838746{4}[source]
As much as I have gripes with the autovac, I’m surprised at the idea of getting to such a broken state. 100M rows is not small but not huge, how frequent is “frequent updates”? How long ago was that (there’s been a lot of changes in autovac since v9)?

“Stops working completely” should not be a thing, it could be vacuuming slower than the update frequency (although that’d be surprising) but I don’t know of any reason it’d just stop?

That being said, I've also had issues with autovac (on Aurora, to be fair; couldn't say if it was Aurora-specific), like it running constantly without vacuuming anything, as if there was an old transaction idling (there wasn't).

replies(2): >>41839294 #>>41839304 #
24. sgarland ◴[] No.41838818{6}[source]
People not realizing you can tune autovacuum on a per-table basis is the big one. Autovacuum can get a lot done if you have enough workers and enough spare RAM to throw at them.

For indices, as you mentioned, doing either a REINDEX CONCURRENTLY (requires >= PG12), or a CREATE INDEX CONCURRENTLY / DROP INDEX CONCURRENTLY (and a rename if you’d like), is the way to go.
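Roughly what both look like in practice (table, index, and values are arbitrary examples, not recommendations):

  -- make autovacuum kick in earlier and work harder on one hot table
  ALTER TABLE events SET (
      autovacuum_vacuum_scale_factor = 0.01,
      autovacuum_vacuum_cost_limit   = 2000
  );

  -- rebuild a bloated index without an exclusive lock (PG12+)
  REINDEX INDEX CONCURRENTLY idx_events_created_at;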

In general, there is a lot more manual maintenance needed to keep Postgres running well at scale compared to MySQL, which is why I’m forever upset that Postgres is touted as the default to people who haven’t the slightest clue nor the inclination to do DB maintenance. RDS doesn’t help you here, nor Aurora – maintenance is still on you.

replies(1): >>41840662 #
25. williamdclt ◴[] No.41838834{4}[source]
Only FULL takes a serious lock (normal vacuum only takes a weak lock preventing things like other vacuums or table alterations iirc).

Aside: I wish Postgres forced you to make the lock you're taking explicit. Make me write “TAKE LOCK ACCESS EXCLUSIVE VACUUM FULL my_table”, and fail if the lock I take is too weak. Implicit locks are a massive footgun that has caused countless incidents across the world; it's just bad design.

replies(1): >>41839370 #
26. dhoe ◴[] No.41838887{7}[source]
I'd generally call this number of tables an antipattern - doing this basically implies that there's information stored in the table names that should be in rows instead, like IDs etc. -- But I'll admit that sensor-related use cases have a tendency to stress the system in unusual ways, which may have forced this design.
replies(1): >>41840639 #
27. evanelias ◴[] No.41839012{3}[source]
For an in-depth read on the differences in MVCC implementations, this post is pure gold: https://www.cs.cmu.edu/~pavlo/blog/2023/04/the-part-of-postg...
28. mjr00 ◴[] No.41839294{5}[source]
It's been a while, but IIRC it was on pg12. "Stopped working completely" I'm basing on the vacuum statistics saying the last auto-vacuum started weeks ago for these tables and never actually finished. Frequent updates means regularly rewriting 10 million rows (at various places) throughout the table. I also should mention that there were 100+ materialized views built off this table which I'm sure had an impact.

In any case, this got resolved but caused a huge operational headache, and isn't something that would have been a problem with MySQL. I feel like that's the main reason VACUUM gets hated on; all of the problems with it are solvable, but you only find those problems by running into them, and when you run into them on your production database it ends up somewhere between "pain in the ass" and "total nightmare" to resolve.

29. sgarland ◴[] No.41839304{5}[source]
On decently-sized tables (100,000,000 is, as you say, not small but not huge), if you haven’t tuned cost limiting and/or various parameters for controlling autovacuum workers, it’s entirely possible for it to effectively do nothing, especially if you’re in the cloud with backing disks that have limited IOPS / throughput.

It continues to baffle me why AWS picks some truly terrible defaults for parameter groups. I understand most of them come from the RDBMS defaults, but AWS has the luxury of knowing precisely how many CPUs and RAM any given instance has. On any decently-sized instance, it should allocate far more memory for maintenance_work_mem, for example.
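On a self-managed instance that tuning is just a few settings (values here are illustrative, not recommendations; on RDS/Aurora the equivalents go in the parameter group):

  -- give maintenance and autovacuum workers more memory,
  -- and allow more workers to run at once
  ALTER SYSTEM SET maintenance_work_mem = '1GB';
  ALTER SYSTEM SET autovacuum_work_mem = '512MB';
  ALTER SYSTEM SET autovacuum_max_workers = 6;
  SELECT pg_reload_conf();  -- note: autovacuum_max_workers only takes
                            -- effect after a server restart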

30. luhn ◴[] No.41839370{5}[source]
`TAKE LOCK ACCESS EXCLUSIVE VACUUM FULL` is just an incantation that will be blindly copy-pasted. I don't see how it would stop anyone from shooting themselves in the foot.
replies(1): >>41840122 #
31. sgarland ◴[] No.41839407{4}[source]
> The upside is Postgres doesn't have to worry about page splits, so things like bulk inserts can be much more efficient.

Not in the heap, but if you have any index on the table (I know, don't do that for bulk loads, but many don't drop them / it isn't feasible sometimes), then you're still dealing with a B+ tree (probably).

Also, MySQL still gets the nod for pure bulk-load speed via MySQL Shell's Parallel Import Utility [0]. You can of course replicate this in Postgres by manually splitting the input file and running multiple \COPY commands, but having a tool do it all in one is lovely.

[0]: https://dev.mysql.com/doc/mysql-shell/8.0/en/mysql-shell-uti...
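The manual Postgres version is something like this, after splitting the input file into chunks (file and table names are made up; each \copy runs in its own psql session, in parallel):

  \copy orders FROM 'orders_part_00.csv' WITH (FORMAT csv)
  \copy orders FROM 'orders_part_01.csv' WITH (FORMAT csv)
  -- ...one per chunk, each on its own connection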

replies(1): >>41839818 #
32. tomnipotent ◴[] No.41839818{5}[source]
> then you’re still dealing with a B+tree

Absolutely, though they're generally orders of magnitude smaller than the table file unless you're INCLUDE'ing lots of columns.

There's pg_bulkload which supports parallel writers as well as deferred index updates until the loading process is complete. Not sure how it compares to what MySQL offers out of the box, but I definitely agree that the MySQL tooling ecosystem in general has a leg up.

33. immibis ◴[] No.41840122{6}[source]
Imagine two footguns. One shoots your foot off when you open the window. The second requires you to point a gun at your foot and pull the trigger before the window unlocks. Far fewer people will suffer accidental foot injuries from the latter.
34. anonzzzies ◴[] No.41840639{8}[source]
Especially back when we started. Now we would've done it differently, but I still think Postgres wouldn't really work. Guess we will never know, as even far smaller data sets don't work the way we need them to.
35. anonzzzies ◴[] No.41840662{7}[source]
We make good money 'saving' people from Aurora; you can throw traffic at it and just pay more. We often migrate companies off it, and they end up paying a fraction of the price.
replies(1): >>41842752 #
36. sgarland ◴[] No.41842752{8}[source]
I’m convinced that Aurora’s team consists mostly of sales. There are certainly some talented engineers working on it – I’ve talked to a few – but by and large, all of my interactions with AWS about DB stuff have been them telling me how much better it is than the other options.

I’ve tested Aurora Postgres and MySQL against both RDS and native (on my own, extremely old hardware), and Aurora has never won in performance. I’ve been told that “it’s better in high concurrency,” but IMO, that’s what connection poolers are for.

37. vbezhenar ◴[] No.41845775{3}[source]
When Postgres deletes a row, it marks that disk space as unused. If you delete a lot of rows, there will be plenty of unused space, but that space will not be released to the OS. To release it to the OS, you need to run VACUUM FULL or perform a backup/truncate/restore.

In normal circumstances that's not needed. The unused space is reused for further inserts, and if your database maintains a steady rate of inserts/deletes, this won't be an issue.

However, I experienced a situation where the database started to grow uncontrollably and kept doing so for weeks (while the actual workload didn't change). I don't know what caused that behaviour. The solution was to run `vacuum full`.
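A sketch of how you can see the difference on disk (table name made up):

  SELECT pg_size_pretty(pg_total_relation_size('measurements'));
  DELETE FROM measurements WHERE recorded_at < now() - interval '1 year';
  VACUUM measurements;       -- space becomes reusable, but is not
                             -- returned to the OS; size stays the same
  VACUUM FULL measurements;  -- rewrites the table and returns the
                             -- space to the OS; size shrinks
  SELECT pg_size_pretty(pg_total_relation_size('measurements'));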