264 points by davidgomes | 13 comments

noirscape ◴[] No.41877487[source]
Because the actual process of upgrading Postgres is terrible.

I say this as someone who absolutely loves using it, but the actual process of upgrading Postgres is something that takes significant downtime, is error-prone, and you're often better off just dumping all the databases and reimporting them into a new data directory. (A good idea in general, since it'll vacuum and compact indexes as well if you do it, combining a few maintenance routines in your upgrade path.)

It requires having the previous version of Postgres installed, something which can mess with a number of distro policies (not to mention docker, which is the most popular way to deploy software that will typically rely on Postgres), and unlike most software with that issue, Postgres is software you want to be managed by your distro.

Therefore, most people only upgrade by necessity - when their distro forces the upgrade or the version they're using reaches EOL.

replies(11): >>41877841 #>>41877921 #>>41877992 #>>41878101 #>>41878462 #>>41878670 #>>41879013 #>>41879161 #>>41879191 #>>41879259 #>>41879567 #
MichaelZuo ◴[] No.41877841[source]
So the real question is, why is the upgrade process so incompetently designed, and why has no one fixed this?
replies(5): >>41877898 #>>41877902 #>>41877926 #>>41878252 #>>41878442 #
phil21 ◴[] No.41877902[source]
My opinion is Postgres was designed by software developers for software developers. The split on “which relational database to use” in my career has almost always been perfectly split between SWE vehemently demanding pgsql for the feature set, and the sysadmins having to support maintenance and production availability preferring MySQL.

One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level. Apparently us lowly sysadmins may have had a point after all.

This is a bit tongue in cheek, but really not far from my lived reality. When the focus is on features and “correctness” at the near-total expense of sane systems tooling, folks can develop some myopia on the subject. I had so many arguments with devs on my teams who were utterly horrified to find we were running MySQL for a given service.

Open source projects tend to fix the pain points their contributors experience, and I assume there were not too many contributors wanting to deal with the boring work of making administration an easy task - it’s thankless “sideways” work that won’t result in many accolades or personal satisfaction for most SWEs.

The end users are almost always developers, whose production experience is usually either something at docker-container scale, or being handed a connection string while the rest is a black box to them. Under those contexts I’d personally prefer Postgres as well, and it wouldn’t even be close. When you get into backups, clustering, upgrades, and high availability under extreme load? IMO the story falls apart real fast.

replies(5): >>41878214 #>>41878540 #>>41878590 #>>41878653 #>>41881690 #
sgarland ◴[] No.41878653[source]
> The split on “which relational database to use” in my career has almost always been perfectly split between SWE vehemently demanding pgsql for the feature set

I’ve seen this as well, but when pressed, none of them could articulate what part of its feature set they actually needed to use.

> One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level.

Are you (or your devs, rather) actually running your own DBs? If so, respect. My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.

replies(3): >>41878860 #>>41879564 #>>41879577 #
1. dalyons ◴[] No.41878860[source]
> My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.

Yep that’s exactly what I’ve seen too :). I still overall prefer this distributed database model - yes you spend more and people make mistakes (and learn). But if you can afford it you get higher velocity and more incentive aligned ownership than the old central gate keeping DBA team model.

replies(2): >>41878987 #>>41883630 #
2. sgarland ◴[] No.41878987[source]
I don’t mind the model IFF the team has an interest in learning how to do it correctly. My biggest complaint as both an SRE and now a DBRE has been that dev-managed infrastructure inevitably means that during an incident I had nothing to do with, I’ll be paged to fix it anyway. Actually, that’s not even the problem; the problem is that later, when I explain precisely how and why it broke and how to avoid it in the future, there’s rarely any interest in doing so.

“You have an unpartitioned table with a few billion rows and a UUIDv4 PK. I’m amazed it’s been working as long as it has. You need to change your schema.”

“Or we can just buy a bigger instance.”

“…”

Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.

Re: incentives, yes, also a problem. Dev teams are generally incentivized by Product, who doesn’t care at all whether or not something is optimal, only that new features are being shipped. I despise this mentality, but it’s not usually the devs’ fault.

replies(2): >>41879210 #>>41879447 #
3. necovek ◴[] No.41879210[source]
UUID (version does not matter for storage, only for generation and distribution) is basically a 128-bit unsigned int, so a double "word" on 64-bit platforms, and it's natively supported by Postgres since at least 8.3 (earliest version with docs up).

While most versions ensure it's random, there are plenty of indexing algorithms that make searching through that quick and close to O(1), so that should not be the schema problem.

Unless you used a string field, but there is a quick workaround for that with Postgres too (make an index on `UUID(field)`, and look it up by `UUID(value) = UUID(field)`).
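
For concreteness, a minimal sketch of that workaround in Postgres syntax (table and column names are made up; the `field::uuid` cast is one way to express the `UUID(field)` idea):

    -- hypothetical table storing UUIDs in a text column
    CREATE INDEX idx_orders_ext_id_uuid ON orders ((ext_id::uuid));

    -- the query has to use the same expression for the index to be considered
    SELECT * FROM orders
    WHERE ext_id::uuid = 'a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a11'::uuid;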

That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases: you can easily handle some small mistakes in schemas too while you do the full migration in parallel.

replies(1): >>41879419 #
4. sgarland ◴[] No.41879419{3}[source]
Who said this was Postgres? MySQL (with the default InnoDB engine) and MSSQL both use clustered indexes; they store tuples around the PK. For a UUIDv4 PK – or anything else non-k-sortable, for that matter – this results in a massive amount of B+tree bloat from the random inserts.

But sure, let’s talk about Postgres. After all, it stores tuples in a heap, and so is immune to this behavior.

Except that its MVCC implementation means that it has to maintain a Visibility Map [0] to keep track of which pages contain only tuples which are visible to every active transaction. This is nominally used for vacuum decisions, but is also cleverly exploited for use with index-only scans. This poses a problem when referencing non-k-sortable entries, because while the tuples are in a heap, the indices are generally a B+tree. So now we’re back to the problem of massive IO amplification. Even if the VM and index are entirely cached, reads are reads, and they add up.

Then there’s the issue of WAL bloat due to full page writes. tl;dr Postgres pages are nominally 8 KiB, and storage device pages are nominally 4 KiB. To guarantee an atomic write, Postgres writes the entire page for a given tuple for the first write after checkpoint, regardless of how many bytes were altered. Again, non-k-sortable: if your tuples are spread randomly across pages, you now have far more pages to write. This can and does matter for either write-heavy workloads, instances with limited network bandwidth (oh hi small RDS / Aurora), or the worst, both combined.
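
A rough way to see the index side of this on a scratch instance (illustrative table names; `gen_random_uuid()` is in core since Postgres 13; exact numbers vary by version and fillfactor):

    -- sequential (k-sortable) key vs. random UUIDv4 key
    CREATE TABLE seq_pk  (id bigint GENERATED ALWAYS AS IDENTITY PRIMARY KEY, payload text);
    CREATE TABLE uuid_pk (id uuid DEFAULT gen_random_uuid() PRIMARY KEY, payload text);

    INSERT INTO seq_pk  (payload) SELECT md5(g::text) FROM generate_series(1, 1000000) g;
    INSERT INTO uuid_pk (payload) SELECT md5(g::text) FROM generate_series(1, 1000000) g;

    -- compare PK index sizes: the randomly-inserted B+tree ends up larger and less dense
    SELECT relname, pg_size_pretty(pg_relation_size(oid))
    FROM pg_class
    WHERE relname IN ('seq_pk_pkey', 'uuid_pk_pkey');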

Re: search complexity, I’m pretty sure B+trees (and B-trees) have O(log n) time complexity, not O(1). Whether or not that’s “close” depends on n, I suppose, but in this scenario I specifically said “billions of rows.”

> That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases

I’m a DBRE, and like both MySQL and Postgres. They both have strengths and weaknesses, but you need to deeply understand those – and properly design your schema and query patterns around them – in order to make an informed decision.

[0]: https://www.postgresql.org/docs/current/storage-vm.html

replies(1): >>41883681 #
5. mschuster91 ◴[] No.41879447[source]
> Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.

Easy enough: almost no one writes SQL queries by hand these days, neither for querying the database nor for doing schema upgrades. It's all done by tools - Doctrine in the PHP world, for example. And pretty much no one but actual CS graduates knows anything deeper about databases.

Result is, devs are happy enough if they found something that works, and don't want to risk being the one who broke prod because they applied some schema change suggested by their DBA who doesn't know some random thing about the application.

replies(2): >>41880131 #>>41883719 #
6. Tostino ◴[] No.41880131{3}[source]
You would be very surprised to see the workflow of DB-heavy development teams. Some Oracle devs have entire backends coded in PL/SQL.

My last company had an absolute ton of PL/pgSQL written to support hundreds of ELT pipelines; migrations were all hand-written and managed with Liquibase.

There are more of them than you'd think out there. Just generally supporting some boring b2b or backend software.
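
For anyone who hasn't seen that style, a minimal sketch of what such hand-written PL/pgSQL tends to look like (function and table names are invented):

    CREATE OR REPLACE FUNCTION load_daily_orders(p_day date)
    RETURNS bigint
    LANGUAGE plpgsql AS $$
    DECLARE
      v_rows bigint;
    BEGIN
      -- one step of a hypothetical ELT pipeline: move a day's worth of raw rows into a cleaned table
      INSERT INTO orders_clean (id, total, day)
      SELECT id, total::numeric, p_day
      FROM   orders_raw
      WHERE  day = p_day
      ON CONFLICT (id) DO NOTHING;

      GET DIAGNOSTICS v_rows = ROW_COUNT;
      RETURN v_rows;
    END;
    $$;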

7. ttfkam ◴[] No.41883630[source]
> the old central gate keeping DBA team model

I have mixed feelings about this. On the one hand I agree that ownership should be shared. On the other, app developers really don't consider their data structures as carefully in SQL as they do in-memory. It's odd. The right data structure matters more than a good algorithm, since algorithms are easier to change. Once you settle on a list vs a set vs a queue, you're stuck after code is built around it.

The same is doubly true for the database schema. Lack of planning and knowledge of expected access patterns can turn an otherwise fast database to mud in no time flat. Once your data is in there, changing the schema is exponentially harder.

"I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships." – Linus Torvalds (2006)

What is your database but a bunch of data structures and relationships? I get why the gatekeeping occurred. I don't agree with it, but I understand it. Far too many folks consider expertise in data stores to be optional as developers.

replies(2): >>41887467 #>>41888291 #
8. ttfkam ◴[] No.41883681{4}[source]
> For a UUIDv4 PK

To be fair, this blows out any db that supports clustered indexes as well. Non-k-sortable primary keys are just a bad idea all around.

With UUIDv7, the WAL write amplification problem goes away just as the clustered index issues do.
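
A sketch of what that looks like in practice, assuming a `uuidv7()` function is available (recent Postgres versions ship one; older ones need an extension; table name is illustrative):

    CREATE TABLE orders (
      id         uuid PRIMARY KEY DEFAULT uuidv7(),  -- time-ordered, so B+tree inserts stay near the right edge
      created_at timestamptz NOT NULL DEFAULT now()
    );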

replies(1): >>41887450 #
9. ttfkam ◴[] No.41883719{3}[source]
> Easy enough: almost no one writes SQL queries by hand these days, not for querying the database nor for doing schema upgrades. It's all done by tools

Your experience does not match mine. Tools like ORMs make, in my opinion, horrible schemas that cater to the lowest common denominator of SQL engine functionality. This means leaving a lot of performance and scalability on the floor. In order to make the ORMs generate decent schema definitions, you need to know the underlying engine and therefore SQL. At that point, you might as well use SQL.

Ever try changing a column's data type from a table with hundreds of millions of rows with an ORM definition file? Hope you like downtime.
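
The usual hand-rolled alternative is an online migration along these lines (a sketch with made-up names; each step is either quick or batched so the table stays available):

    -- 1. add the new column without rewriting the table
    ALTER TABLE events ADD COLUMN amount_v2 numeric;

    -- 2. backfill in small batches to keep locks and WAL volume bounded (repeat until done;
    --    in a real migration a trigger or application dual-write keeps new rows in sync)
    UPDATE events SET amount_v2 = amount::numeric
    WHERE ctid IN (SELECT ctid FROM events WHERE amount_v2 IS NULL LIMIT 10000);

    -- 3. swap names in a short transaction once the backfill has caught up
    BEGIN;
    ALTER TABLE events RENAME COLUMN amount TO amount_old;
    ALTER TABLE events RENAME COLUMN amount_v2 TO amount;
    COMMIT;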

10. sgarland ◴[] No.41887450{5}[source]
Agreed. My preferred PKs in descending order are: natural key if it makes sense and would speed up queries, integer of an appropriate size, UUIDv7. I only rank UUIDv7 below integers because, at best, it’s 16 bytes, while even a BIGINT is only 8 bytes.
replies(1): >>41905388 #
11. sgarland ◴[] No.41887467[source]
100% on all points. I’m a fan of gatekeeping things that are hard to get right, and hard to undo. If you can prove that you know what you’re doing, by all means, own your stuff. But until then, yes, I’d like to watch over your shoulder, and hopefully teach you how to do it right.

The fact that DBs are data structures but are ignored has always irritated me. If I pushed a PR using lists for everything regardless of their practicality, I’d be rightly told to try again. But push a PR with a schema using similar suboptimal choices, and no one blinks an eye.

12. dalyons ◴[] No.41888291[source]
I agree, but I should be clearer about the key advantage of the distributed many-DBs model… it’s that I don’t have to care whether people are good at any of the things you mentioned :). If teams want to do dumb things with their schema, fine, that’s on them and they’ll have to deal with the consequences. If it matters, they’ll learn and get better. If they want to buy their way out of it with their latency and infra budget instead, that’s honestly fine too.

With many smaller databases owned separately, the blast radius of bad db decisions is small/local.

13. ttfkam ◴[] No.41905388{6}[source]
Contrary to intuition, bigint ends up only 25% smaller on disk than UUIDv7, with no improvement in speed. Honestly, in 2024, 32-bit ints are generally a poor fit: either the table is small enough to be considered a lookup table (16-bit int) or it could grow to unknown bounds given enough time (64-bit int). Sequential ids are guessable (which could be a security issue), cannot be generated on the client, and can hinder sharding now that Postgres supports bidirectional replication (multiple writers).

https://ardentperf.com/2024/02/03/uuid-benchmark-war/

My PK preference in descending order: natural key if it makes sense and would speed up queries, 16-bit for unambiguous lookup tables with a hard finite limit of values, UUIDv7, 64-bit sequence, UUIDv4, and then finally 32-bit sequence.

UUIDv7 for performance, reasonable non-guessability, client-generation, and potential for sharding. 64-bit for cases that really need a sequence. UUIDv4 when randomness is required for security. 32-bit when there's a hard finite limit that can be guaranteed to stay below 2 billion even given the most optimistic of projections.

If the database is small, it doesn't matter. If it's large, it is 99.999% unlikely the primary key will be anything more than a rounding error in storage, access time, or cost. No one ever said, "We would have made payroll if only we had used 32-bit primary keys."

To really save space, extract the timestamp from the UUIDv7 and use an expression index on it as the creation date, instead of a separate created_at column. Then you're not "wasting" any storage space. Personally I don't think the storage optimization is necessary. Premature optimization being evil and all that.
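
A rough sketch of that idea, using an assumed helper function rather than anything built in (the first 48 bits of a UUIDv7 are the Unix timestamp in milliseconds per RFC 9562; verify the bit layout before relying on this):

    CREATE FUNCTION uuid_v7_to_timestamptz(id uuid)
    RETURNS timestamptz
    LANGUAGE sql IMMUTABLE AS $$
      -- take the first 12 hex digits (48 bits), convert to a bigint of milliseconds
      SELECT to_timestamp(
        ('x' || lpad(substr(replace(id::text, '-', ''), 1, 12), 16, '0'))::bit(64)::bigint / 1000.0
      )
    $$;

    -- expression index doubling as a "created_at" index, with no separate column
    CREATE INDEX idx_events_created ON events (uuid_v7_to_timestamptz(id));

    -- queries must use the same expression to hit the index
    SELECT count(*) FROM events
    WHERE uuid_v7_to_timestamptz(id) >= now() - interval '1 day';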