Newer PostgreSQL versions are better. Yet still not quite as robust or easy as MySQL.
At a certain scale even MySQL upgrades can be painful. At least when you cannot spare more than a few minutes of downtime.
postgres 1958 0.0 0.0 247616 26040 ? S Jul21 3:03 /usr/lib/postgresql/11/bin/postgres
postgres 1085195 0.0 0.0 249804 24740 ? Ss Aug19 2:01 /usr/lib/postgresql/13/bin/postgres
postgres 1085196 0.0 0.0 223240 27900 ? Ss Aug19 1:59 /usr/lib/postgresql/15/bin/postgres
Postgres is the only thing on my Debian that doesn't seamlessly automatically upgrade across dist-upgrades, but instead leaves old versions around for me to deal with manually... which I seem to never get around to. That way you can have multiple versions of the same package.
It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.
No. It was released in September 2024. That's not "quite a bit".
Now as to why people aren't all on 17 and not even on 16 yet, here's an acronym for you: LTS [1]
Debian 11 Bullseye is the current LTS. It came out in 2021.
You can check this article about Uber migrating its MySQL fleet from v5 to v8, posted here 3 days ago [1]. Among other things, they observed a "~94% reduction in overall database lock time." The before/after graph is pretty impressive. It also gave them window functions and better JSON support, which are two very big features.
[1] https://www.uber.com/en-JO/blog/upgrading-ubers-mysql-fleet
How software "feels" is subjective. Can you be more specific?
My timid management forbade an upgrade from Oracle 7.3.4 until 2013. It was agony to remain on that museum piece for as long as we did.
I am upgrade-minded, but my management is not. I always lose.
I am retiring in two years. I will not miss their problems, not at all.
Edit: Oracle 10g was the last release that (for us) brought must-have features. Sure, upgrading to 19 or 23 would be great, but it doesn't bring anything that I really want.
Significant security vulnerability? Upgrade
Feature you need? Upgrade
All other reasons: Don't upgrade.
Upgrading takes effort and it is risky. The benefits must be worth the risks.
The classic example is advertising a new improved fishing reel in a fishing magazine. People buy the magazine (well, 20 years ago they did) because they want to know about things like new improved fishing reels.
It's a world away from the overwhelming avalanche of bullshit that is modern advertising/spam. There's nothing at all weird about hating advertising in general but being ok with permission marketing.
If you follow this idea further you'll find that very few people, even the most vocal, genuinely hate advertising. We all want to know about useful products and services. We just don't want to see a million ads a day for Apple, Coke, Pepsi, Nike, erectile dysfunction, fake single women in your area, Nigerian princes...
Because when it reaches a certain scale, and when too many psychological tricks are being played, and everything is always, BRIGHT, BIG, hyper-sexualized, when you can't walk down any street, watch anything, read anything, without seeing people richer, smarter, younger, sexier, happier than you, it goes far beyond just advertising. It's brainwashing. It has to stop because it's extremely unhealthy for our societies, our mental health, our children.
In no particular order, my preference for postgres is driven by:
* Date / time functions that don't suck
* UTF-8 is really UTF-8
* 99% of a backup can be done live with nothing more than rsyncing the data directory and the WAL files (see the sketch after this list)
* Really comprehensive documentation
* LTREE and fuzzy string match extensions
* Familiarity from using it for years
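For illustration, here's a minimal sketch of that kind of live backup using the bundled pg_basebackup tool, which automates the data-directory copy plus the WAL capture described above (host, user, and paths are assumptions):

# copy the live data directory, streaming the WAL needed to make it consistent
pg_basebackup -h localhost -U backup_user \
  -D /backups/pg/$(date +%F) \
  -X stream --checkpoint=fast --progress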
MySQL/Maria I'm sure is fine, but it's one of those things where it's just different enough, and I haven't encountered a compelling use case for changing my preference. The consequence is that things in database-land tend to move slower than other types of software. This I think is the major reason why we still use SQL.
Thank you Amazon!
I don't think this is done automatically when you simply install a new postgres version, but I'm not certain of that.
Today I use CLIs like usql to interact with MySQL and SQLite so I can continue to use those commands.
At first glance they may be less obvious, but they are significantly more discoverable. \? just shows you all of them. In MySQL it always feels like I need to Google it.
That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.
For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).
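For example, a sketch of the prefix-index workaround (table and column names are hypothetical): indexing only the first 191 characters keeps the key at 191 * 4 = 764 bytes, comfortably under the limit.

mysql mydb <<'SQL'
CREATE TABLE pages (
  url VARCHAR(2048) NOT NULL,
  KEY idx_url_prefix (url(191))  -- index only the first 191 characters
) CHARACTER SET utf8mb4;
SQL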
Debian has Stable. That's it.
Aside from DDL, the only other major ones are manipulating users/grants, manipulating replication, a small number of other administrative commands, and LOCK TABLES.
This is all documented very clearly on https://dev.mysql.com/doc/refman/8.4/en/implicit-commit.html. Hardly an "obstacle course".
There's no option where the nodes all accept writes.
This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.
This was literally the reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.
But when I'm watching a YouTube video, having the video cut mid-sentence to some hyper-annoying and unrelated noisy ad simply angers me and makes me look for an ad-blocker.
I’ve been there with products that were still internal at the time. I can only imagine how much fun that is with a public product. But then I do have a pretty vivid imagination. We changed to periodic upgrades after that to avoid the obvious problem staring us in the face.
I think it boils down to this. We used to constantly be running the obsolete pg version until it became an emergency, mostly because upgrading with the tooling available at the time was very painful. Today however, we stay relatively up to date. Once you figure out the data replication, you can do blue-green deployments on databases with almost no downtime.
The RDBMS has not fundamentally changed much since the 80's. It's the same basic design inside and out, with a shitload of tweaks and optimizations. Don't get me wrong - you can get very far with tweaks and optimizations. But the foundations root you to a particular model, and some things in that model will always be painful.
The important question to me isn't why don't people upgrade. It's why do people run Postgres?
Is there no other kind of database? No, there are hundreds of different databases. Is it because Postgres does things fundamentally different and better than anything else? No, lots of things work somewhat like Postgres, with pros and cons. Is it because you can't do things without Postgres? No, there are always alternatives that you can make work. Is it because it's impossible for it to keep working without upgrading? No, any database with long-term support could continue working without upgrading.
So why use Postgres at all? Answer: it's the incumbency, stupid.
When a big fucking honking giant "thing" sucks up all the air in the room, provides for most of the use cases, and is accepted by all the users, then it's extremely hard to justify not using it. Incumbents usually win, even if they're old, disreputable, buggy, and annoying. Even if they're missing some obvious features other things have. Even if people loudly complain that they want change. It's just more annoying not to use them.
We're used to them. We've adapted. Trying to do something else is going to be annoying and hard. So we stick to what we know. But that doesn't mean we have to stroke the balls while working the shaft. Upgrading isn't exactly fun. It doesn't even buy us anything, other than the promise of "support", or "compatibility" with whatever else will eventually require it. So we upgrade, eventually, when we must.
But the constant mind-numbing march away from entropy isn't a fantastic reason to keep using the thing. When you have to convince yourself to stick with it, it's likely you're in a toxic relationship. If you're honest with yourself, you'll agree that it's time for a change. You deserve better.
But switching is full of unknowns. Ominous. Time-consuming. Hard. You know you want to, maybe even need to. But there's too much to lose. So you stick to the familiar, to what you can handle. Maybe something better will come down the pike soon. You even tell yourself you're lucky to be here. But deep down you know you're just comforting yourself. You wish you had something else. Something that brings you joy. Something better. Something... right. But that something isn't here right now. So until Mr. Right gets here, you'll stick with Mr. Right Now.
Time to get ready for the next upgrade..... sigh...
Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.
To be clear, I like both. Postgres has a lot more features, and is far more extensible. But there’s no getting around the fact that its MVCC implementation means that at scale, you have to worry about things that simply do not exist for MySQL: vacuuming, txid wraparound, etc.
* comprehensive transaction model using different modes
* PostGIS and lots of other great extensions
* supports most of the current SQL standard and is clear on interpretation of edge-cases in the documentation
* support for writing stored procedures in any major programming language
* many useful functions regarding dates, sets, ranges, json, xml, ...
* custom datatypes
* extremely thought-out and standardized approach to development: if a feature is included it generally works well in interaction with everything else
* syntax, semantics and performance are all very predictable
* great documentation
Regarding MySQL / MariaDB: MySQL optimized for performance first. Until 2010 the standard storage engine, MyISAM, didn't even support transactions.
PostgreSQL always focused on correctness and stability and then made sure everything performed.
Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated by page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.
A dump/reload of the database or use of the pg_upgrade application is required for major upgrades.
Now, seems one can mostly use pg_upgrade, which only rewrites the system tables[2] so is fairly quick. But if on-disk format has changed it seems you're forced to dump and reload.
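For reference, a sketch of a typical pg_upgrade invocation on a Debian-style layout (version numbers and paths are assumptions; running it with --check first produces a dry-run compatibility report):

pg_upgrade \
  -b /usr/lib/postgresql/13/bin  -B /usr/lib/postgresql/16/bin \
  -d /var/lib/postgresql/13/main -D /var/lib/postgresql/16/main \
  --link  # hard-link data files into the new cluster instead of copying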
At work we've mainly been using SQLAnywhere, which would just disable new functionality for databases using old on-disk format. So upgrading major versions has usually been fairly painless and quick.
https://web.archive.org/web/20230922210124/https://grimoire....
In MySQL either `?` or `help` or `\?` will show you the help...
[0] Probably including those passwords you didn't hash, and those credit card numbers you shouldn't be storing in the first place because, what the heck, it meets your needs.
VARBINARY is typically ok for that I'd think? Then you can utilize the full 3072 byte limit for the key, since there's no character set applied.
> even just being able to sort user-submitted strings up to a kilobyte
As a software engineer, I completely agree. But as a DBA, I am obligated to make a "tsk" sound and scowl disapprovingly!
* I'm running a lot of MySQL stuff and such a topic might be of interest to me
Database engines (every single one) are notorious for incompatibilities between major versions, upgrading mission critical stuff means updating and re-testing entire applications, which in some cases can be a multi-million dollar process, before going into production.
Even if you deeply know/think that there's no problem upgrading, if something does fail in production after an upgrade and it's mission critical...
That was true in 2012; dunno if it still applies though.
If you need to keep a million long URL's in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
Creating short numerical primary keys for long strings is a common database technique.
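A sketch of that technique with hypothetical names; numbering with gaps leaves room to slot new rows in between without immediately renumbering everything:

mysql mydb <<'SQL'
ALTER TABLE urls ADD COLUMN sort_key BIGINT, ADD KEY idx_sort (sort_key);
-- backfill: number existing rows in string order, leaving gaps
SET @n := 0;
UPDATE urls SET sort_key = (@n := @n + 1000) ORDER BY url;
SQL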
It's been a long time since I worked with v9.x in anger, so I could well be forgetting things though
https://www.postgresql.org/docs/current/ddl-partitioning.htm...
Additionally, actions done frequently are less risky than actions done rarely, since you develop skills in performing that action as an organization - see high deployment frequency as a strategy of managing deployment risk.
This adds up to continuous upgrading being the least risky option in aggregate.
The fact that you still can't use DDL in transactions makes life exceedingly painful, but it's technically safe if you write your migration code carefully enough.
Postgres is definitely more difficult to administer.
I hope you have a backup somewhere else, not on the Mac Mini. One option would be to restore it on another machine on Postgres 14, follow the upgrade path to 17 (idk if you can jump directly but a few Google searches will cover this), verify the data, back up the new db, somehow upgrade Postgres on your Mac, and restore the new backup. Done.
Hell, I have a hard time even telling the version of some of the system's built-in binaries.
A few months ago, I had trouble unzipping a file which turned out to be AES-encrypted. Some answers on SO [1] said I should update my `unzip` to a newer version, but I can't find any updates for my distro, and I have no idea (still don't, so feel free to teach me) how to update it manually to make my `unzip` support AES. And all the versions, the good and the bad, say they're "version 6.0.0" despite obviously behaving differently.
[1] https://stackoverflow.com/questions/60674080/how-to-open-win...
Pretty sure that even in 2012 MySQL had very easy to use replication, which Postgres didn't have well into the late 2010s (does it today? It's been a while since I've ran any databases).
Possibly I got it wrong and switched around which was easier on the devs and which was easier on the sysads?
In my defence, ISTR, when talking to sysads about MySQL vs PostgreSQL, they preferred the latter due to having less to worry about once deployed (MySQL would apparently magically lose data sometimes).
1. Most psql deployments are not exposed to the interwebz, they are typically only accessible to the applications that need them by virtue of network setup (firewalls etc). This limits the attack vector to whatever the application does. Good.
2. Distro vendors (RHEL et al) often stick to a major psql release for the lifecycle of the OS version. If the OS lives longer than the psql major version, they take on the responsibility of backporting critical security issues.
3. While upgrades aren't hard, they're not easy either.
4. Psql is pretty much feature complete for many workloads, and pretty stable in general. For many people, there is little need to chase the latest major version.
No, it can't. URLs don't have any length limit, regardless of the fact that different software will impose different limits.
If you haven't installed them via your programming language's package manager, you either installed them manually or via the OS package manager. The first one you'd know how to upgrade, and for the second you can ask it what version it is and what version is available to upgrade to (for compatibility reasons it might not be the latest, or latest major, unless you use the software vendor's own package manager repositories).
It's actually much easier than in Windows, because you have a piece of software (package manager) that is your one stop shop to knowing what is installed, update it, check versions available, etc. unless you've manually installed stuff.
In Windows you... google and download random .exes? Cool. As good as the worst possible option on Linux.
> As good as the worst possible option on Linux.
I understand this is not a fair comparison, but in practice, they're not as easy. When using Windows, I usually use it with a proper GUI, so popping up a browser and downloading the newest installer for a piece of software from its official website takes me less than a minute.
Doing similar for my Linux VPS with only a terminal is much more complicated.
Just another anecdote: MySQL lost data for me (2004). I spent some time evaluating the projects and Postgres’ development process seemed much more mature — methodical, careful, and focused on correctness. Boring, which I loved.
I didn’t need whatever perf advantage MySQL had so I switched to Postgres and never looked back. And then the Oracle drama and Monty’s behavior around it — not saying he was wrong or right, but it was the opposite of boring — just reinforced my decision.
I like to play with new tech in various spots of the stack, but for filesystems and databases I go boring all the way.
MySQL was immortalized as the database in every LAMP stack. And just like PHP it improved considerably since then.
The replication and high availability that mariadb offers is rock solid, and much more advanced than what postgresql has to offer. It works out of the box, no plugins or third party shenanigans needed, and there is "one obvious way" to do it. Not a dozen options like in the postgresql ecosystem, each with its own quirks and drawbacks.
Also, upgrades are a dream. I did 4 major long term stable release upgrades so far and everything was fully automatic and smooth sailing all around.
All of that with about 10 seconds downtime only for each upgrade, despite being a chunky 6TB database, thanks to the user friendly replication process.
I respect postgresql a lot, but mariadb allows me to sleep much better at night thanks to replication, smooth upgrades and no VACUUM to worry about.
I guess it depends on scale? I was surprised how easy it was on Ubuntu. There was an automatic migration script, and it worked. Took less than 5 minutes to upgrade.
Sure, there was downtime, but I think most applications out there can live with scheduled downtime of a few minutes.
If you can't have downtime, then nothing is easy.
https://jepsen.io/analyses/mysql-8.0.34 ( MySQL 8.0.34 2023-12-19 )
HN: https://news.ycombinator.com/item?id=38695750
"MySQL is a popular relational database. We revisit Kleppmann’s 2014 Hermitage and confirm that MySQL’s Repeatable Read still allows G2-item, G-single, and lost update. Using our transaction consistency checker Elle, we show that MySQL Repeatable Read also violates internal consistency. Furthermore, it violates Monotonic Atomic View: transactions can observe some of another transaction’s effects, then later fail to observe other effects of that same transaction. We demonstrate violations of ANSI SQL’s requirements for Repeatable Read. We believe MySQL Repeatable Read is somewhat stronger than Read Committed. As a lagniappe, we show that AWS RDS MySQL clusters routinely violate Serializability."
---------------
https://jepsen.io/analyses/postgresql-12.3 ( PostgreSQL 12.3 2020-06-12 )
HN: https://news.ycombinator.com/item?id=23498781
"PostgreSQL is a widely-known relational database system. We evaluated PostgreSQL using Jepsen’s new transactional isolation checker Elle, and found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable. Under normal operation, transactions could occasionally exhibit G2-item: an anomaly involving a set of transactions which (roughly speaking) mutually fail to observe each other’s writes. In addition, we found frequent instances of G2-item under PostgreSQL “repeatable read”, which is explicitly proscribed by commonly-cited formalizations of repeatable read. As previously reported by Martin Kleppmann, this is due to the fact that PostgreSQL “repeatable read” is actually snapshot isolation. This behavior is allowable due to long-discussed ambiguities in the ANSI SQL standard, but could be surprising for users familiar with the literature. A patch for the bug we found in serializability is scheduled for the next minor release, on August 13th, and the presence of G2-item under repeatable read could be readily addressed through documentation."
Except that search doesn't work, because all text is in latin1 encoding.
And then there's UTF8 not actually being UTF8, which can result in total lockup of a table if someone inputs a character that does not fit in UCS-2 and now you need to recover the database from backup and preferably convert all instances of utf8 to utf8mb4, because fuck you that's why.
And in my experience, apart from ease of use there's also a major trust issue here. If you're upgrading your app server framework/language, it's easy enough to do a rollback. With databases, people are worried that they might not notice errors right away and then you have to merge the data accumulated since the upgrade with the last backup in case of a rollback.
Not saying that this is entirely rational...
Also, new features on the SQL level are hard to sell if all you're doing is lowest common denominator ORM ("New window functions and faster lateral joins? But we're doing all that in our code!").
To drive this a little further, "latest and greatest" doesn't always apply. I've chosen software - even databases - for greenfield deployments one or two releases behind for their known characteristics.
Stability doesn't imply perfection, but rather, predictability.
Good in theory. But last time I checked the main libs to connect to pgsql, everything you get back from the database is strings. So you need something in your app to convert those strings to the equivalent data structures.
I say this as someone who absolutely loves using it, but the actual process of upgrading Postgres is something that takes significant downtime, is error-prone and you're often better off just dumping all the databases and reimporting them in a new folder. (A good idea in general since it'll vacuum and compact indexes as well if you do it, combining a few maintenance routines in your upgrade path.)
It requires having the previous version of Postgres installed, something which can mess with a number of distro policies (not to mention docker, which is the most popular way to deploy software that will typically rely on Postgres), and unlike most software with that issue, Postgres is software you want to be managed by your distro.
Therefore, most people only upgrade by necessity - when their distro forces the upgrade or the version they're using reaches EOL.
To others reading, be mindful: database upgrade times depend greatly on the data stored within them/where/who is using them. Your development environment doesn't represent production. If the distinction even exists!
A five minute upgrade can become indefinite with a single lock [row/table, depending on the storage engine/etc]
Let me check what I'm on... brb... Postgres 14.
Because it's not automatic I leave it, I leave it until it's so unsupported that I must upgrade the whole system, then I build a new system with a new Postgres and I migrate the old to the new.
I want, so badly, for Postgres to just automatically update itself, that a new binary just works with the data directory of an old version, and that if required it does an in-place upgrade to those data files when it can (i.e. if it can detect the last version was the same major as the current version, upgrade the files transparently to the admin).
My databases are all backed up each night, and these are single server Postgres with no replication or other trickery, an automatic upgrade for a single-server Postgres should be possible.
As it's not done... I assume (incorrectly?) that there be dragons and risks, and I mitigate that by never upgrading, just waiting and migrating. Migrating puts all of the risk on me, human error, and I am definitely fallible, so I can best handle this risk by just not doing it until I must.
Last migration I performed was from Postgres 7 > Postgres 14 in October 2021... I guess I have quite a few years of Postgres 14 ahead of me still. I would take downtime, my systems aren't zero downtime critical, just give me a super simple one-command no-questions upgrade as I really dislike migrations.
Suppose you join a project and see that the PostgreSQL version used is pretty old.
Do you:
A) convince people to migrate it for some nebulous benefits (features that aren't currently used, performance improvements that will be difficult to measure in lieu of tooling and aggregated statistics) while also taking on the risks of breaking everything or even worse, data loss (since you won't always be able to provision multiple instances and backup restore might take a while and you could still mess that up)
B) or just leave it as it is and focus on anything else
Many will prefer to not get burned even if there shouldn't be that many risks with upgrading your average PostgreSQL install, which is why you'll get minor/patch releases at best, alongside whatever the runtime environment is getting upgraded. Containers and bind mounts make all of this way easier, but then again, many places don't use containers.
If I run my application/code v1 right now, I generate data. I expect that if I move to application/code v2, I can leave my data in place and it will automatically apply changes to my data.
I do not get that with postgres. If I am on postgres 16, and I want to upgrade to postgres 17, I want to leave my data folder untouched. When I then start postgres 17, it should just work (tm).
It should also work with code that assumes postgres 16, so I can upgrade my database separate from my application. I can not wait 10 days for a large database to be migrated from 16 to 17 without being able to run it. However, I can wait 10 days before updating my code to support features in 17.
The current upgrade process does not give me such confidence in restoring data and uptime. So I don't upgrade until I really have to.
OPTIMIZE TABLE runs almost exclusively as an online DDL operation. There's only a brief table lock held during table metadata operations, but I haven't found that to be a problem in practice. (https://dev.mysql.com/doc/refman/8.4/en/optimize-table.html#...)
By analogy: should a database cluster optimize consistency or availability? Answer: there’s no way its author can guess which is more important to how you want to use it.
Debian/Ubuntu:
sudo apt update
sudo apt upgrade

Fedora/RHEL:
sudo dnf update

Arch:
sudo pacman -Syu

Alpine Linux:
apk update
apk add --upgrade apk-tools
apk upgrade --available
Of course, if the package you need isn't available in the standard repos, then you'll need to look elsewhere (e.g. PPAs or third party repos). There's also options like Flatpak and AppImage if you want something that's a bit closer to how you'd choose to install new releases on Windows. If I wanted to update all of the installed software I have on my Windows install, there'd basically be no way for me to do this, outside of shady update manager software.
At the same time, I get the appeal of being able to just download a new release and install it, both AppImage on Linux and the way you install software on macOS (just drag the file into Applications) seem similarly pleasant to me in that regard.
To expand on the latter (the response got deleted), you can very much do something like https://peazip.github.io/peazip-linux.html (I wish 7-Zip was available on Linux natively, but as far as GUI software goes, PeaZip is pretty nice) however that's not the most common approach. You should generally prefer using the package manager when you can.
For this reason, I chose to upgrade to version 11 because it was only a couple of versions apart and still had repositories available at the time.
So the first thing I do is stop the VM and take a snapshot. Then I start it back up and go check for database corruption before I dump them... wait there's no utility to check for corruption...? Yep that's right! You basically have to YOLO the whole thing and hope it works. OK...
So I dump the databases and back up the directory. I shut down the old version and then install the new version from the repo. I start to import the databases and notice in the scrolling logs that there's some incompatibility... Oh F*$&. I google the error and spend a good hour trying to figure it out. Apparently there are external plugins for postgres that were installed in the old version. I search for the plugins online and they are long discontinued. OK, so let's just copy them over from the old version. I stop postgres, copy them over, and start it back up. It starts up ok. I reimport the databases and no more errors. Yay! I start the applications and pray to the SQL gods. So far so good, everything seems to work.
Thankfully the applications tested well and worked post upgrade.
All of this was done for a critical multi-million dollar healthcare platform. There were no official guides. Nothing. I had to find some random admin's blog for guidance. MySQL on the other hand has full documentation on just about every aspect of everything. The whole process was super hackish and not having any way to check database integrity would have been a show stopper for me had I designed this configuration.
I meant to say that when you can't find the software you want in the package manager, it's easier to download it manually and install it on Windows than on (again, unfair comparison) a terminal-only Linux server.
On average, the experience of upgrading/managing packages is obviously much better than Windows.
I meant to say that in certain cases (like the `unzip` example I mentioned above), when the system's built-in package manager fails, I seem to not be able to find alternatives like what I did on Windows (just find the piece of binary I want and manually install it). I to this day still can't find a way to update `unzip` to a version that supports AES on my Debian VPS.
Old software is not necessarily broken, but it is always a tech debt. And you can't live in debt forever, our IT does not work this way.
My homelab projects mostly center around a "everything is an MQTT message" idea. Zigbee sensors, Tasmota power readings, OwnTracks locations, surveillance camera events, motion sensors for light switches, currently active app on my PC, status of my 3D printer, whatever my vacuum robots are up to and so on. It all gets recorded into a Postgres db. From there I can use it for data mining experiments, but mostly as a source for Grafana. I tried counting the rows but that query didn't even complete while I was writing this comment.
I like trying out all kinds of dockerized oss services, and I keep them updated using watchtower. I run a gitlab instance which is usually the most annoying service to update because there's an upgrade path and post-start migrations. With my Postgres instance, which is isolated from the internet, I'll have to figure out what the fastest way is to move all that data around, not leave a huge gap in the record and so on. Sounds like at least a day of work - and since it's technically all for "fun", it'll have to wait until it actually is that.
One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level. Apparently us lowly sysadmins may have had a point after all.
This is a bit tongue in cheek but really not far from my lived reality. When the focus is on features and “correctness” at the near total expense of sane systems tooling folks can develop some myopia on the subject. So many arguments with devs on my teams over this subject that were utterly horrified to find we were running MySQL for a given service.
Open source projects tend to fix the pain points their contributors experience, and I assume there were not too many contributors wanting to deal with the boring work of making administration an easy task - it’s thankless “sideways” work that won’t result in many accolades or personal satisfaction for most SWEs.
The end users are almost always developers, most of whose experiences in production entail either the equivalent of a docker container level scale system, or are simply given a connection string and the rest is a black box to them. Under those contexts I’d personally prefer Postgres as well and it wouldn’t even be close. When you get into backups, clustering, upgrades, and high availability under extreme load? IMO the story falls apart real fast.
I recommend installing PG12 on a temporary VM, duplicating the existing database, and test the upgrade in isolation.
https://www.postgresql.org/docs/current/pgupgrade.html
A more complicated approach uses replication, and upgrades the standby server before promoting it to the primary server.
I have suspected for a long time that most people who criticize MySQL have never actually worked with it, or have done so a couple of decades ago. It's often the default choice if you don't need some of the PostgreSQL features (like PostGIS) and can work with either one.
Why?
For production systems I generally try to avoid using anything not built internally. When it comes to PostgreSQL, for example, if you want to submit benchmarks, or you want to report bugs, an important or sometimes a mandatory component is to provide compilation flags. Also, seeing how every kind of storage can act in a different way, tuning of higher-level storage programs seems inevitable. Even though PostgreSQL has a lot of configurations that can be done to the compiled program, some (eg. block size) are compile-time that can only be changed by compiling the program.
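For instance, the block size can only be set at build time (a source-build sketch; the install prefix is an assumption):

./configure --prefix=/opt/pgsql --with-blocksize=16  # in kB; the default is 8
make && make install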
It requires a lot of work, planned downtime, or some way to smear updates across the estate.
The cost of any failure is very high. The benefit of any major upgrade is also vanishingly small. Unless you need a specific feature, its just not worth it.
But sure, it was easy to get a proof of concept working. But when you tried to break it by turning off the network and/or machines, then shit broke down in very broken ways that were not recoverable. I'm guessing most that set up MySQL replication didn't actually verify that it worked well when SHTF.
Amongst other reasons:
- Performing regular upgrades is an easy and cheap way to maintain a healthy knowledge base about the system. It's always easier to fix a system that is well understood than a black box which nobody has touched in 20 years. Upgrading regularly also leads to people being more comfortable with the upgrade process and it is likely to become a "regular thing" with checklists and other process improvements that make maintenance safer and more reliable.
- Doing preemptive maintenance means YOU get to choose when the system is unavailable and then you can plan accordingly, perhaps by planning it during a period with low usage or even arranging for a redundant system to take up the load. The alternative is leaving it up to chance when a system will fail, and that's frequently at high-load moments when you can least afford it.
- As a corollary to the previous point: a team which is in control of its systems can have more predictable output of feature work, since there will be fewer "unexpected" events.
Not maintaining your systems is only cheaper in the long run if you don't count the engineering time required to fix things when they break or need to be replaced. Ounce of prevention vs pound of cure and all that.
But obviously this is not suitable for all use cases. I don't know why pg doesn't invest in this sector.
That doesn't mean "upgrade recklessly," but it does mean you should know _why_ you're either upgrading or _NOT_ upgrading. That's your job, much more than the act of upgrading itself.
Unpublished vulnerabilities in old software are not hypothetical. And very old packages are usually broken, just coped with at significant opportunity cost - or because the failure is a combination of rare and impactful, which means once it happens everyone is out of a job anyway.
Seriously, I have yet to encounter a sysadmin who used that old, silly adage at me and didn't later have to admit I was right.
Edit: so no, you don't stay on an ancient version of the database because "it's not broken." You're staying on it because _the upgrade process itself_ is so broken you're terrified of it.
I mean I think it's because maintenance is so unglamorous. So when it happens and everything doesn't collapse, nobody remembers and starts asking why we even do it (and then sysadmins and middle management suddenly aren't a thing, and companies and IT become exactly as fragile as anyone with _actual_ understanding of management would expect).
Meanwhile when regular maintenance fails in progress, it often ends up in the news with a body count attached.
One of my favourite podcasts has a running joke that you should never do maintenance (because so many industrial disasters happen during it). Of course the bias is the point of the joke - and usually the reason things went bad is because either the maintenance was neglected until that point, or the engineering picked Postgresql, I mean, didn't consider the necessity of maintenance.
So even if it's available these days, the amount of people still subjected to upgrades from older version still leaves an impression that it's really bad.
In that case you would just copy the download link and paste it into your terminal session. It's rarely needed though as most software is available through your distribution's software repositories.
Maybe because there is none? I quickly googled and found this bug: https://bugs.launchpad.net/ubuntu/+source/unzip/+bug/220654
For archives encrypted with AES-256, p7zip-full can be used.
This is not a Linux only issue though, the native Windows unzip tool also doesn't seem to support aes-256 (yet): https://answers.microsoft.com/en-us/windows/forum/all/how-do...
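If it helps, a quick sketch of that workaround on Debian/Ubuntu (the archive name is a placeholder):

sudo apt install p7zip-full
7z x encrypted-archive.zip  # prompts for the password; handles AES-256 zips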
The author in this answer clearly has a version of unzip that can detect "AES_WG". Unfortunately they only vaguely said (in one of the comments) "Since then the main Linux distros have added patches to fix various issues" and didn't specify which distro.
So does using the vendor repos:
* https://apt.postgresql.org/ / https://yum.postgresql.org
* https://dev.mysql.com/downloads/repo/
This way when you upgrade your OS you don't have to worry about suddenly getting a new version (which, in the case of MySQL, may take a long time to convert its database files to the new version format).
The PostgreSQL team simply does not have the resources to do this. At least not without significantly slowing down development of everything else which there is no political will for. Maybe someone will come up with a genius idea which solves this but I am doubtful. Usually there is no free lunch.
Maybe some core dev will correct me, I am quite familiar with the PostgreSQL project but not one of the core devs. :)
This is the one thing in My/MariaDB that I miss in Pg: Galera. So handy for in-house stuff that needs some kind of HA: either a three-server setup, or a two-server+arbitrator.
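For the curious, a minimal sketch of the Galera settings involved (config file path, node addresses, and provider path are all assumptions):

cat >> /etc/mysql/mariadb.conf.d/60-galera.cnf <<'EOF'
[galera]
wsrep_on = ON
wsrep_provider = /usr/lib/galera/libgalera_smm.so
wsrep_cluster_address = gcomm://10.0.0.1,10.0.0.2,10.0.0.3
binlog_format = ROW
EOF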
PostgreSQL has prioritized correctness and stability, which has made certain features clunkier to use or slower to arrive, but when you get paged in the middle of the night you get the time you spent back. PostgreSQL has also spent a lot of time on improving the DBA experience, so mixed DBA/sysadmins also usually prefer PG.
Looking at https://mariadb.com/kb/en/standard-replication/, is the one obvious way binlog replication or Galera?
That way you can update the database version when you want, while still installing the base system updates quite frequently. Plus, I think it’s really nice to separate the runtime from the persistent data, which such setups make obvious.
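A sketch of that separation with Docker (container name, volume, and password are placeholders): the data lives in a named volume, and the major version is just whichever tag you pin.

docker run -d --name pg \
  -v pgdata:/var/lib/postgresql/data \
  -e POSTGRES_PASSWORD=secret \
  postgres:16  # bump the tag (and run pg_upgrade) on your own schedule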
- whatever made it past the postgres 10 hump is on the most recent version and keeps getting migrated
- whatever is still on 9.x is scheduled to remain that way until the product using it will be EOL'd
I’ve seen this as well, but when pressed, none of them could articulate what part of its feature set they actually needed to use.
> One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level.
Are you (or your devs, rather) actually running your own DBs? If so, respect. My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.
If you don't want to use pg_upgrade, you can dump from one cluster and pipe directly into another, with no need for a temporary file. It couldn't be easier.
Can't say the same for Oracle. Even applying a patch can ruin several weekends.
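For illustration, the pipe approach mentioned above, assuming the old cluster listens on port 5432 and the new one on 5433:

pg_dumpall -p 5432 | psql -p 5433 -d postgres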
EDIT: my mistake, OPTIMIZE TABLE is an online DDL. I’ve been burned in the past from foreign key constraint metadata locks essentially turning it into a blocking operation.
My only conclusion is that the majority of devs don’t actually read documentation, and rely purely on the last blog post they read to influence their infrastructure decisions.
Yep that’s exactly what I’ve seen too :). I still overall prefer this distributed database model - yes you spend more and people make mistakes (and learn). But if you can afford it you get higher velocity and more incentive aligned ownership than the old central gate keeping DBA team model.
[0]: https://dev.mysql.com/doc/refman/8.0/en/timestamp-initializa...
(I joined MySQL shortly after that mess, before the Sun acquisition)
If you have a trivial database, yes.
If you have a large, complicated database that takes hours/days to restore and has an HA SLA, very hell no.
“You have an unpartitioned table with a few billion rows and a UUIDv4 PK. I’m amazed it’s been working as long as it has. You need to change your schema.”
“Or we can just buy a bigger instance.”
“…”
Rinse and repeat six months later. I’m aware this is an organizational problem, but from what I’ve seen, it’s endemic.
Re: incentives, yes, also a problem. Dev teams are generally incentivized by Product, who doesn’t care at all whether or not something is optimal, only that new features are being shipped. I despise this mentality, but it’s not usually the devs fault.
* Upgrades
* Database Protocol
MS SQL Server does better on both of these fronts: an upgrade, when running MS SQL Server in a container, is just rebooting the database with the new database engine version. The DBMS sees the old files, upgrades them in place and starts the DB completely seamlessly 100% of the time.
The MS SQL Server protocol, TDS, supports native named parameters, executes the T-SQL PL inline with SQL. TDS also supports inline query cancellations because the protocol is framed, and both T-SQL and TDS supports returning multiple result sets from a single query.
But yes, whenever I see PG docs saying, major versions require an export and re-load, I shudder.
No matter how easy it is, it takes more effort and thought to do an upgrade than it does to not do an upgrade at all, and for most users the upside is basically invisible if it exists at all.
Any needed updates to databases are done by the new exe, automatically. Obviously backup everything before you do the upgrade. I've never had one fail.
(Granted, we have rather small DBs and simple applications where ON UPDATE SKIP LOCKED is about the most fancy feature we use.)
While most versions ensure it's random, there are plenty of indexing algorithms that make searching through that quick and close to O(1), so that should not be the schema problem.
Unless you used a string field, but there is a quick workaround for that with Postgres too (make an index on `UUID(field)`, and look it up by `UUID(value) = UUID(field)`).
That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases: you can easily handle some small mistakes in schemas too while you do the full migration in parallel.
At least on Ubuntu (and likely Debian), your existing Postgres version is never dropped until you are ready to manually upgrade yourself.
1. The new version of PostgreSQL would need to able to read all old catalog table formats and migrate them.
2. The new version of PostgreSQL would need to support all old versions of the parse tree to migrate views.
3. Likely a bunch of more things that I do not know of. I for example doubt it is trivial to just read an old catalog without having a fully up and running cluster which supports almost everything in that old cluster. The catalog has TOAST tables and indexes for example.
Right now 1 and 2 are implemented in pg_dump plus by having pg_dump call functions in a running old version of PostgreSQL.
Every bit of the experience before getting it working feels like it was designed by an MBA who owns a Postgres hosting service.
Other than that, it's amazing.
But sure, let’s talk about Postgres. After all, it stores tuples in a heap, and so is immune to this behavior.
Except that its MVCC implementation means that it has to maintain a Visibility Map [0] to keep track of which pages contain only tuples which are visible to every active transaction. This is nominally used for vacuum decisions, but is also cleverly exploited for use with index-only scans. This poses a problem when referencing non-k-sortable entries, because while the tuples are in a heap, the indices are generally a B+tree. So now we’re back to the problem of massive IO amplification. Even if the VM and index are entirely cached, reads are reads, and they add up.
Then there’s the issue of WAL bloat due to full page writes. tl;dr Postgres pages are nominally 8 KiB, and storage device pages are nominally 4 KiB. To guarantee an atomic write, Postgres writes the entire page for a given tuple for the first write after checkpoint, regardless of how many bytes were altered. Again, non-k-sortable: if your tuples are spread randomly across pages, you now have far more pages to write. This can and does matter for either write-heavy workloads, instances with limited network bandwidth (oh hi small RDS / Aurora), or the worst, both combined.
Re: search complexity, I’m pretty sure B+trees (and B-trees) have O(log n) time complexity, not O(1). Whether or not that’s “close” depends on n, I suppose, but in this scenario I specifically said “billions of rows.”
> That's why both "devops" (DBAs?) and "devs" prefer Postgres over many other databases
I’m a DBRE, and like both MySQL and Postgres. They both have strengths and weaknesses, but you need to deeply understand those – and properly design your schema and query patterns around them – in order to make an informed decision.
[0]: https://www.postgresql.org/docs/current/storage-vm.html
Most of the software on my machines are "old" because they are part of a Linux distribution that (aside from security issues) was frozen in time a year or two ago so that it could be tested, released, and maintained. I am quite happy to have a system that I know is not going to break (either itself, or my workflow) when I apply security updates.
People who MUST HAVE the latest version of everything I feel either have some deeper FOMO issues to work out, suffer from boredom, or look at their computers as hobbies themselves rather than tools. (Which is fine, just be honest about what it is.)
That said, much of my career has been spent working at companies who got so busy shipping features that upgrading infrastructure never makes it above the fold. You can tell the managers that working around old software adds costs that scale with the age of the infrastructure, but they don't always listen. I currently work at a company that still has loads of CentOS 7 hosts still in production, and only fairly recently began upgrading them to RHEL 8. (Not 9!)
Easy enough: almost no one writes SQL queries by hand these days, not for querying the database nor for doing schema upgrades. It's all done by tools - Doctrine in the PHP world for example. And pretty much no one but actual CS graduates knows anything deeper about databases.
Result is, devs are happy enough if they found something that works, and don't want to risk being the one who broke prod because they applied some schema change suggested by their DBA who doesn't know some random thing about the application.
We are constantly making changes to our schemas (adding tables, columns, etc). It's never an issue on physical standbys, it just gets created, but with logical replication, we have to manually run the changes on the subscriber.
We have lots of instances where we create a new table for a feature coming, and alter another to add a column.
If those get missed on the logical subscriber, you don't even know until someone tries to write data to that new table or new column.
I know logical is supposed to be flexible, but I wish there was a setting to have a replica via logical, so I can handle upgrades easier.
“Show me the incentive, I’ll show you the outcome”
Transactional DDL: migration errors never leave the database in an intermediate/inconsistent state.
Range types + exclusion constraint: just no way to do this in MySQL without introducing a race condition.
Writeable CTEs: creating insert/update/delete pipelines over multiple tables deterministically. Seriously though, the RETURNING clause is something I use all the time both in and out of CTEs.
Filtered aggregates and grouping sets: cleanly get multiple data points for a dashboard in one shot.
Unnest: converting arrays into a set of rows. Inverse of array_agg(...).
Types: arrays, booleans, IP/subnets, UUIDs (without binary(16) hacks), etc.
Materialized views: seriously, how does MySQL not have this yet?
Statement-level triggers: another option from per-row.
Row-level security: setting data visibility based on configurable policies.
I can cite specific use cases I've deployed to production for each of these and more.
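As one illustration of the writeable-CTE point, a sketch (with hypothetical tables) that moves stale rows from one table to another in a single atomic statement:

psql mydb <<'SQL'
WITH moved AS (
  DELETE FROM task_queue
  WHERE claimed_at < now() - interval '1 hour'
  RETURNING id, payload
)
INSERT INTO dead_letter (task_id, payload)
SELECT id, payload FROM moved;
SQL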
1. Leave your old database running
2. Start your new database with new version
3. Run logical replication and let it catch up
4. Decide you want to switch over.
5. Using a script: Stop your apps writing to the database (or switch to read only mode), let logical replication catch up one final time, stop logical replication, run a procedure to fix your sequences, and switch your apps over to the new database.
You are now done. Sure, it requires a one-time investment to work out the exact steps needed for your situation, but it's easy to do with minimal downtime.
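A rough sketch of steps 1-3 using built-in logical replication (ports and names are assumptions; the schema must first be copied over, e.g. with pg_dump --schema-only, and sequence values are not replicated, hence the fix-up in step 5):

# on the old cluster (publisher)
psql -p 5432 mydb -c "CREATE PUBLICATION upgrade_pub FOR ALL TABLES;"
# on the new cluster (subscriber), after loading the schema
psql -p 5433 mydb -c "CREATE SUBSCRIPTION upgrade_sub
  CONNECTION 'host=localhost port=5432 dbname=mydb'
  PUBLICATION upgrade_pub;"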
Of course if you don't care about downtime just shut down your database server, make a backup(!), and then do pg_upgrade.
That's a true shame considering how easy it is to make a read replica of any size and then fail over to it as the new primary. Definite skill issues.
I keep an eye out for vulnerabilities, and that's about it.
It works with basically no downtime but I agree this kind of thing should definitely be easier, even turnkey.
…or you're on AWS RDS, which will automatically bump your db cluster if it goes EOL and you ignore the notices for more than a year.
Or do you mean the new Pg server should transparently do the upgrade automatically? And while online?
Not chasing shiny is important but generally when tech debt builds up this high, life is generally hell in terms of outages, unable to accomplish basic tasks and dealing with a bunch of people who have NIH syndrome.
1. Set up logical replication to a new database server. We used https://github.com/2ndQuadrant/pglogical, but maybe you don't need that any more with newer versions of postgres?
2. Flip a feature flag that pauses all database queries and wait for the queue of queries to complete.
3. Wait for the query queue to drain and for replication to catch up.
4. Flip a feature flag that switches the connection from the old db to the new db.
5. Flip the flag to resume queries.
It helped that we were written in OCaml. We had to write our own connection pooling, which meant that we had full control over the query queue. Not sure how you would do it with e.g. Java's Hikari, where the query queue and the connection settings are complected.
We also had no long-running queries, with a default timeout of 30 seconds.
It helped to over-provision servers during the migration, because any requests that came in while the migration was ongoing would have to wait for the migration to complete.
Yes, more drivers and libraries should support the more expansive data type list, but even just within the database itself there are multiple advantages.
JSON can often be used in place of arrays, and JSON_TABLE in both MySQL and MariaDB converts JSON into tabular data. MySQL supports multi-valued indexes over JSON, where each row can have multiple index entries (or no entries, e.g. partial index).
MariaDB has built-in convenience types for ipv4, ipv6, and uuid. Or in MySQL you can just use virtual columns to add human-readable conversions of binary columns, although that is admittedly slightly annoying.
MariaDB supports RETURNING.
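For example, a sketch of a MySQL 8 multi-valued index over a JSON array (hypothetical schema):

mysql mydb <<'SQL'
CREATE TABLE docs (
  j JSON,
  KEY tags_idx ((CAST(j->'$.tags' AS UNSIGNED ARRAY)))
);
-- the optimizer can use tags_idx for this membership test
SELECT * FROM docs WHERE 42 MEMBER OF (j->'$.tags');
SQL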
[1] Edit to add: I do mean that honestly, it's an accurate and insightful list of nice Postgres features, most of which aren't in MySQL or MariaDB. Honestly baffled as to why I'm being downvoted.
Absolutely...not.
Slow does not mean stable. Slow means the floor is rotting out from under you constantly.
Being prudent about when and where to upgrade is a very active, intentional process that the typical company simply doesn't have the stomach or skill for.
Once the database runs you are inclined to let it keep on running until you must do an upgrade (bugfixes, EOS). Upgrading for new features/improved performance is nice but can be a pain and is prone to cause downtime. PostgreSQL's upgrade tooling story is also a bit painful, but it has proven to be an awesome way to semi-regularly test my backup infrastructure.
I also really miss their query optimizer. It was just so damn smart.
Whether you go from v14 to v17 or v14 to v20, I doubt it'll make a difference in migration strategy. You've still got a fair amount of time before EOL. No need to stress or FOMO.
My last company had an absolute ton of pl/pgsql written to support hundreds of ELT pipelines, migrations were all hand written and managed with liquibase.
There are more of them than you'd think out there. Just generally supporting some boring b2b or backend software.
In that post we walk through all the steps we took to go from Postgres 11.9 to 15.3.
However, I agree with OP that SQL Server does have a solid upgrade story and it's generally pain free.
They are just two aspects I really wish Postgres could do better on, and I miss working with those parts of SQL Server.
My experience has been that most people (including devs) hardly think about their database at all. Everyone just takes the database for granted until a dev runs a bad query that locks a table. Devs let their ORM manage everything for them and don't take the time to think for themselves.
Also I rarely see teams with a culture that prioritizes maintenance, even for their own software. It's common for teams to become wary of upgrading because they do it infrequently. Like most things, the more often you perform an upgrade the easier it is. Smaller changes are easier to revert. The more often you do something, the better at it you get. Delaying maintenance often just increases the work later.
I think this is the right answer for 99% of companies. Yes, everyone wants zero downtime, but I see incredibly few cases where it is actually worth the cost and effort.
https://en.wikipedia.org/wiki/G._K._Chesterton#Chesterton's_...
sometimes, repairing broken data is easier than, say, upgrading a god damn hot DB.
MVCC is overrated. Not every row in a busy MySQL table is your transactional wallet balance. But to upgrade a DB you have to deal with every field, every row, every table, while the data keeps changing, which is a real headache.
Fixing a range of broken data, however, can be done by a junior developer. If you rely on rdbms for a single source of truth you are probably fucked anyway.
btw I do hate DDL changes in MySQL.
On all open-source distros, you can look into your distro's source and check their compile flags.
I like pointing it at an S3 bucket so I have another full backup repo (with its own retention strategy) away from my homelab, as well as to my local NAS.
This is objectively false. If this was true, indexes wouldn't serve range queries. You couldn't index on dates. You couldn't sort numbers.
> But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
This fails when you need to insert new values into the table. Then you not only need to figure out the new integer value (how, if you can't efficiently compare sorted string values???), you need to update all the integers to make room.
Did a restore to a stage environment, worked on my upgrade scripts until I was happy (deployed to VMs with Ansible, so manual work to write the upgrade process for me), restored again and ran the upgrade process fresh, and then tested my application, backup scripts, restores, etc. Had everything working entirely smoothly multiple times before pulling the trigger in production.
No stress at all when we did it in prod.
The lack of transactional DDL is a totally valid complaint, but the non-DDL stuff is just a total head-scratcher to me. Aside from DDL, implicit commits have literally never impacted me in my 21 years of using MySQL.
This is like storing UUIDs as text. You lose type information and validation. It's like storing your array as a comma-delimited string. It can work in a pinch, but it takes up more storage space and is far more error prone.
> convenience types for ipv4, ipv6, and uuid.
That's nice to see. A shame you have to decide ahead of time whether you're storing v6 or v4, and I don't see support for network ranges, but a definite improvement.
> MariaDB supports RETURNING.
That's honestly wonderful to see. Can these be used inside of CTEs as well for correlated INSERTs?
The app or service continues to run without disruption, and the backup is a coherent database image. Perhaps you have other state in addition to the database (for example cache files), or multiple databases. This covers them too, if they are all in the same snapshot.
There are many ways to take a snapshot: ZFS and btrfs offer a filesystem command, and any Linux filesystem, such as ext4, can be snapshotted with LVM or LVM-thin. Well-known cloud providers like AWS, GCP, and Azure also provide ways to snapshot block devices through their APIs. However, to ensure a coherent image, it may be necessary to use the Linux `fsfreeze` command around the API calls.
The database backup files can have incomplete transactions, but if the files are restored it will be as if the OS was abruptly stopped at the moment of the snapshot. SQLite and other good databases are designed to recover well from this sort of abrupt stop, without corrupting the database. They clean up incomplete transactions on recovery.
It doesn't yet automatically upgrade people's PG extensions, but that's on the ToDo list and has initial code in a PR. So that'll likely start happening in a few weeks too. :)
> This fails when you need to insert new values into the table.
Yes, that's part of the extra code you need to keep the values accurately sorted. There are a lot of different particular code solutions that might work -- whether allowing for collisions and re-ordering every night with a cron job, or putting large gaps between numbers, or using floats.
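A toy sketch of the gap variant (hypothetical schema; real code would renumber, e.g. nightly, once a gap is exhausted):

CREATE TABLE sorted_url (
  sort_key BIGINT NOT NULL UNIQUE,   -- spaced out in large increments, e.g. 1000 apart
  url      TEXT   NOT NULL
);
-- to insert between the rows keyed 1000 and 2000, take the midpoint
INSERT INTO sorted_url (sort_key, url) VALUES (1500, 'https://example.com/new');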
But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URLs out of the box. Indexes aren't meant for that and they won't work, and this is by design. You're going to have to roll your own code for that.
Fortunately I've never come across a case in the wild where maintaining a globally sorted list of long items was required (though I'm not saying they never exist). E.g. if you're building a spider that needs to match against URLs, you'd index a short hash of the URL as a non-unique index. Or if you wanted to display sorted URLs for a site, you'd index by domain name only, and then sort the remainder of the URL at query time.
Granted, using validated JSON is more hoops than having an array type directly. But in a pinch it's totally doable.
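One way to get that validation, assuming MySQL 8.0.17+ and a hypothetical table (a sketch, not the only approach):

CREATE TABLE tag_list (
  id   BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  tags JSON NOT NULL,
  -- reject anything that isn't an array of integers
  CONSTRAINT tags_are_int_array CHECK (
    JSON_SCHEMA_VALID('{"type":"array","items":{"type":"integer"}}', tags)
  )
);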
MySQL also stores JSON values using a binary representation, it's not a comma-separated string.
Alternatively, in some cases it may also be fine to pack an array of multi-byte ints into a VARBINARY. Or for an array of floats, MySQL 9 now has a VECTOR type.
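A sketch of the latter, assuming MySQL 9's VECTOR type and its STRING_TO_VECTOR() helper:

CREATE TABLE embedding (
  id  BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  vec VECTOR(4)                                    -- fixed-length list of 4 floats
);
INSERT INTO embedding (vec) VALUES (STRING_TO_VECTOR('[0.1, 0.2, 0.3, 0.4]'));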
Regarding ipv6 addresses: MariaDB's inet6 type can also store ipv4 values as well, although it can be inefficient in terms of storage. (inet6 values take up a fixed 16 bytes, regardless of whether the value is an ipv4 or ipv6 address.)
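For instance (MariaDB 10.5+; a sketch):

CREATE TABLE host (addr INET6);
-- a native v6 address, and an ipv4 value in its v4-mapped form (both occupy 16 bytes)
INSERT INTO host VALUES ('2001:db8::1'), ('::ffff:192.0.2.1');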
As for using RETURNING inside a writable CTE in MariaDB: not sure, I'd assume probably not. I must admit I'm not familiar with the multi-table pipeline write pattern that you're describing.
For more complex deployments, cases of backwards-incompatibilities in the data structure, or where I had reason to rebuild many large tables, you set up a replication secondary, do pre-upgrade migrations (e.g. in the case of deprecated functionality), upgrade the secondary, do post-upgrade migrations (e.g. in the case of new functionality), and wait for replication to catch up 100%. Then, after sending traffic to it for a bit to ensure it works correctly, you swap the primary and secondary and do it again.
I've seen a few rare cases where an update to the client library is required for improvements to the protocol, e.g. when MySQL switched to the new password/authentication format, but that was also backwards-compatible: the old format was used until an account's password was changed, password changes could be told to use the legacy format instead, and you could set the legacy format by default. The only issue that occurred (and it did occur) was old client libraries + new server password hashes, but, again, in this case it didn't affect existing installations/users/passwords.
1. Sandstorm apps need to always come back cleanly from a raw SIGKILL at any moment.
2. Sandstorm apps must be able to always upgrade ANY previous version of stored data, with no intervention, automatically, when newer versions are deployed.
These are tough technical constraints and have huge implications on the whole design. For example, the client/protocol layer needs to possibly be multi-version aware for clean rollouts, client connection failover needs to be built in, etc. But the reality is if you do not have these two constraints, your software will never really work in a turnkey, forget-about-it way where someone just runs an instance and then actually forgets about it, and it can be safe and secure. This kind of stuff is imperative to every actual computer user who isn't a programming nerd.
This is why slapping Docker on a lot of existing pieces of software like databases doesn't really work in the grand scheme. Yes, it gets you started faster, which people value to a high degree. But it often lacks or has the same problems as other solutions on day 2, day 3, ... etc. So, you need to solve all the same problems anyway just in a different form (or externalize them onto some other solution.)
Realistically, the solutions have to be designed in from the ground up. That's very difficult to do but necessary.
Someone else mentioned in here that SQL Server always 100% works when upgraded in place from an old version. That's also my (limited, homelab-ish) experience and a good reason to like it.
Compare that with Postgres, which seemed very "old school", going so far as to assume it was the only thing a given server was doing. Connecting to postgres authenticated as your own user; creating a user was done with the `createuser` command (or similar, I don't remember what it was actually called), and not some namespaced `pg_createuser` command that would make it clear what it did.
I also remember setting up MySQL replication with almost no effort whatsoever, and then in the same year trying to set up Postgres replication - which it didn't have. I was told by other postgres admins to "just set up a script to rsync the database over and over to the other server; then if your first server dies just start the second server up and it'll recover". This seemed like a wildly cavalier attitude towards uptime and reliability, not to mention generating a ridiculous amount of I/O and network traffic for minimal benefit.
Ironically, Sandstorm itself is still stuck running on Mongo version 2.6 (a decade old!) because Mongo can't necessarily be updated in an unattended way. Of course, we could never ask every Sandstorm user to do some manual process to upgrade it.
Meanwhile, newer versions of the Mongo Node.js client library don't support such old Mongo, which means Sandstorm cannot actually update that package nor any of the packages that depend on it anymore. And this is why Sandstorm is now stuck in time and no longer receiving updates. :(
It was clearly a big mistake for Sandstorm to use Mongo. We chose it because we built the UI on Meteor, which was really only designed to work with Mongo. In retrospect I wish we'd used SQLite, whose backwards-compatibility guarantee has now reached twenty years.
and
> Not maintaining your systems is only cheaper in the long run if you don't count the engineering time required to fix things when they break or need to be replaced. Ounce of prevention vs pound of cure and all that.
Those are pithy assertions, but I don't think they're universally factually correct. This applies at some scales, but not at others. One size does not fit all, nor does one maintenance strategy fit all. I do believe I specifically called out the dogmatism in this discussion, and yet here we go piling on more of the same. I don't think that's terribly helpful. I assume that everyone is well aware of all the points you raised, as I suspect that most people running PostgreSQL are not in fact incompetent.
Not everyone has a team which "is in control of its systems", nor can everyone afford such a team. It may well be cheaper for some people to run something old until it burns only to have The Database Consultant come in one day to raise it from the dead.
Most devs I’ve worked with don’t know about aggregations beyond COUNT and GROUP BY, and do everything in the app. I’ve pointed these out before, and am always told, “we don’t want to have additional logic in the DB.” So you want a dumb bit box then, got it – why are you using an RDBMS?
> Transactional DDL
I know this is a thing, and I’ve used it in Postgres, but I’ve also never found myself in MySQL being upset that I didn’t have it. Everything should be thoroughly tested in staging before prod.
> RLS
Yes, amazing feature if you use it correctly.
I will give MySQL a couple of nods in its favor: well, three.
1. Clustering index. If you design your schema around this fact, range queries can be WAY faster. Dropbox does this, IIRC.
2. Generated Virtual Columns. Not sure why Postgres still doesn’t support this.
3. ON UPDATE CURRENT_TIMESTAMP. It’s so nice to have the DB automatically track update times for you, IMO.
I once was examining some queries being generated via Prisma, and found it was using LIMIT/OFFSET for pagination. I pointed this out to the devs, who replied that the query times were acceptable for their SLOs. I guess if you don’t know that a simple SELECT can and should often be in the sub-msec range, you might not be concerned that it’s taking 100 msec.
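For contrast, one common fix is keyset (seek) pagination, which stays fast at any page depth. A sketch with a hypothetical table, where :last_seen_id is carried over from the previous page:

SELECT id, title
FROM article
WHERE id > :last_seen_id   -- seek directly past the last row already shown
ORDER BY id
LIMIT 25;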
The other is just the normalization of cloud expenditure. Every large org has some kind of pricing agreement with their provider, and so are rarely incentivized to cut costs, since they have a minimum spend.
I get why it wasn't, but logical replication has been production ready for a while now, so it really should have at least a little sugar.
In a competitive market where people make very long term engineering decisions based on stability and reliability you can’t fuck up this badly and survive.
When all you know is an ORM, you tend to treat SQL databases like dumb bit bucket add-ons to your app server. It's amazing how much potential performance and scalability are left on the floor because app developers can't shift their mindset when needed. Objects/structs cannot be assumed to map 1:1 with relations. What a world we'd live in if devs spent even 1/10 the effort examining their relational schema design that they spend arguing over whether a set, a list, or a queue is better for a given situation. It's like thoughts on Big-O stop cold at the database driver interface.
1. Yep, I definitely miss clustering indexes in Postgres sometimes. I can sometimes fake it with covering indexes when all I want are an extra column or two along with the primary key or similar without seeking to the main table, but you're right about that MySQL/MariaDB win here.
2. The dynamic computed column is an easy workaround with immutable functions that take the record as a param.
CREATE TABLE foo ( a int, b int, c int );
CREATE FUNCTION d(entry foo) RETURNS int LANGUAGE sql IMMUTABLE AS $$
SELECT entry.a + entry.b + entry.c;
$$;
SELECT a, b, c, d(foo) FROM foo;
It's not part of the table schema when doing a SELECT *, but it is just as efficient as a computed column in MySQL/MariaDB and only slightly more verbose.
3. ON UPDATE CURRENT_TIMESTAMP works in Postgres with a trigger function, which you can reuse if all your tables use the same name for your "last_modified" column (probably a good idea anyway). Not as convenient as the declarative syntax, but it's a fairly trivial workaround.
CREATE OR REPLACE FUNCTION update_last_modified() RETURNS TRIGGER AS $$
BEGIN
NEW.last_modified = now();
RETURN NEW;
END;
$$ language 'plpgsql';
CREATE TRIGGER foo_last_modified BEFORE UPDATE ON foo
FOR EACH ROW EXECUTE PROCEDURE update_last_modified();
CREATE TRIGGER bar_last_modified BEFORE UPDATE ON bar
FOR EACH ROW EXECUTE PROCEDURE update_last_modified();
One function, many triggers. You also get to choose between "when transaction started" (now() or CURRENT_TIMESTAMP), "when statement started" (statement_timestamp()), or "right now" (clock_timestamp()).
I don't mind workarounds so much as functionality that simply cannot be replicated. For example I miss real temporal table support in Postgres like what you can find in MariaDB or MS SQL Server. The painful kludges needed because Postgres lacks PIVOT support (again, see MS SQL Server) are another one.
You never know how much you need deferred foreign key constraints until you don't have them anymore. Or a materialized view.
WITH new_order AS (
INSERT INTO "order" (po_number, bill_to, ship_to)
VALUES ('ABCD1234', 42, 64)
RETURNING order_id
)
INSERT INTO order_item (order_id, product_id, quantity)
SELECT new_order.order_id, vals.product_id, vals.quantity
FROM (VALUES (10, 1), (11, 5), (12, 3)) AS vals(product_id, quantity)
CROSS JOIN new_order
;
Not super pretty, but it illustrates the point. A single statement that creates an order, gets its autogenerated id (bigint, uuid, whatever), and applies that id to the order items that follow. No network round trip necessary to get the order id before you add the items, which translates into a shorter duration for the transaction to remain open.
I have mixed feelings about this. On the one hand I agree that ownership should be shared. On the other, app developers really don't consider their data structures as carefully in SQL as they do in-memory. It's odd. The right data structure matters more than a good algorithm, since algorithms are easier to change. Once you settle on a list vs a set vs a queue, you're stuck once code is built around it.
The same is doubly true for the database schema. Lack of planning and knowledge of expected access patterns can turn an otherwise fast database to mud in no time flat. Once your data is in there, changing the schema is exponentially harder.
"I’m a huge proponent of designing your code around the data, rather than the other way around, and I think it’s one of the reasons git has been fairly successful… I will, in fact, claim that the difference between a bad programmer and a good one is whether he considers his code or his data structures more important. Bad programmers worry about the code. Good programmers worry about data structures and their relationships." – Linus Torvalds (2006)
What is your database but a bunch of data structures and relationships? I get why the gatekeeping occurred. I don't agree with it, but I understand it. Far too many folks consider expertise in data stores to be optional as developers.
To be fair, this blows out any db that supports clustered indexes as well. Non-k-sortable primary keys are just a bad idea all around.
With UUIDv7, the WAL write amplification problem goes away just as the clustered index issues do.
Your experience does not match mine. Tools like ORMs make horrible schemas in my opinion that cater to the lowest common denominator of SQL engine functionality. This means leaving a lot of performance and scalability on the floor. In order to make the ORMs generate decent schema definitions, you need to know the underlying engine and therefore SQL. At that point, you might as well use SQL.
Ever try changing a column's data type from a table with hundreds of millions of rows with an ORM definition file? Hope you like downtime.
You keep saying that, but Postgres does a great job with no issues without any extra work. MySQL is alone in being suboptimal. "It's not designed for that" isn't a good answer, if it works great. Show me how the underlying data structures fail or perform poorly if it's really not something you should do.
In this specific situation, the most common MySQL/MariaDB pattern would be to use LAST_INSERT_ID() in the second INSERT, assuming the order IDs are auto-increments. Or with UUIDs, simply generating the ID prior to the first INSERT, either on the application side or in a database-side session variable.
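A sketch combining both of those patterns, reusing the order example from above (MySQL/MariaDB; note `order` needs quoting since it's a reserved word):

INSERT INTO `order` (po_number, bill_to, ship_to) VALUES ('ABCD1234', 42, 64);
SET @order_id = LAST_INSERT_ID();   -- capture the auto-generated id once, in a session variable
INSERT INTO order_item (order_id, product_id, quantity)
VALUES (@order_id, 10, 1), (@order_id, 11, 5), (@order_id, 12, 3);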
To avoid extra network calls, this could be wrapped in a stored proc, although a fair complaint is that MySQL doesn't support a ton of different programming languages for procs/funcs like Postgres does.
In terms of dev flow, this is only a problem with imperative migration systems. Declarative schema management tools solve it by being able to transition any live database state into the desired state, which is expressed by a repo of CREATE statements.
If something fails, you fix the bad CREATE and run the tool again, and it effectively picks up where it left off. And well-designed declarative tools catch many problems pre-flight anyway by running a suite of linters, running the emitted SQL statements in a sandbox first, etc.
If the tool's diff returns clean, you know your schema is in the right state on the DB.
Ironically, lack of transactional DDL actually makes declarative schema management more straightforward in MySQL/MariaDB: you can't mix DDL and DML there anyway, so it's more natural to handle schema changes vs data migrations using different tools/pipelines.
The difference between MySQL and Postgres here is negligible. It doesn't matter exactly where you define the limit of a short field, except that it should probably be able to hold a maximum-length filename (255 characters), plus some room to spare. Both MySQL and Postgres handle this fine.
It's only suboptimal if you choose the wrong column type for the task at hand. For storing URLs, you almost certainly don't want collation behaviors, such as accent insensitivity or case insensitivity. So VARBINARY is a better choice here anyway.
And as several other commenters have mentioned, at large scale, indexing a bunch of long URLs in b-trees is indeed a bad practice performance-wise in any relational database. You won't be able to fit many entries per page, so read performance will be slow, especially for range scans.
In that situation it's almost always better to use a non-unique index over a prefix (if you need sorting and range scans) or a hash (if you don't), and disambiguate collisions by having the full value in an unindexed column. And/or split the URL up between the domain name and path in separate columns. If needed, normalize the domain names into a separate table so that the URL table can refer to them by numeric ID. etc. All depends on the specific use-case.
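A sketch of the prefix and hash variants together (hypothetical schema):

CREATE TABLE url (
  id       BIGINT UNSIGNED AUTO_INCREMENT PRIMARY KEY,
  url      VARBINARY(2048) NOT NULL,
  url_hash BINARY(32) AS (UNHEX(SHA2(url, 256))) STORED,  -- for exact-match lookups
  INDEX url_prefix (url(64)),                             -- sorting/range scans on the first 64 bytes
  INDEX url_hash_idx (url_hash)
);
-- exact match: filter by hash, then disambiguate any collision on the full value
SELECT id FROM url
WHERE url_hash = UNHEX(SHA2('https://example.com/x', 256))
  AND url = 'https://example.com/x';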
https://devcenter.heroku.com/articles/heroku-postgres-mainte...
There are plenty of needs to store URLs which will never go through a browser.
You can only claim that "some URL use cases" can be stored in 2048 characters.
Because now it was easy to use hah.
We ended up rolling it back because the hate was so consistent.
(And no, this wasn’t us guessing or making this up - a number of them were quite explicit that we were threatening their roles, and to stop it.)
To do a cleanup, InnoDB uses the records it kept to delete old data, while Postgres must do a scan. So InnoDB pays a record-keeping price as part of the update that makes it easier to clear data, while Postgres decides to pay this price of occasional scanning.
Upgrading the database server is as simple as shutting down the service, installing new executables and restarting the service. Downtime is measured in minutes.
When upgrading major versions, one might not get full functionality of the new features unless one upgrades the on-disk format (unload-reload), but you can run the new server version with older on-disk format just fine.
The fact that DBs are data structures but are ignored has always irritated me. If I pushed a PR using lists for everything regardless of their practicality, I’d be rightly told to try again. But push a PR with a schema using similar suboptimal choices, and no one blinks an eye.
https://en.wiktionary.org/wiki/Reconstruction:Proto-Slavic/s...
With many smaller databases owned separately, the blast radius of bad db decisions is small/local.
Trying to setup a logical replica of a much larger, high write volume database is an entirely different story with its own set of challenges. In some cases it’s not even possible to do even with tricks like dropping and restoring indexes.
Logical still struggles to keep up with high write loads. When something like a vacuum freeze kicks off, it's not uncommon to see logical replication lag for significant periods.
Then there are things like lack of DDL replication. While this can be mostly worked around - it adds complexity. And remember DB user management is DDL - so if you change a DB password on the primary it won’t replicate to the logical replica.
When you have CDC systems using logical decoding from the systems you are upgrading, you have to deal with resync conditions: the logical replication slot will not be replicated, so the new replica loses its place in the replication stream.
Most non-trivial instances have multiple streaming replicas for read offloads which need to be coordinated at cutover. While not difficult it increases the complexity.
In addition, there are no guarantees of application compatibility. While this is rarely an issue in my experience, PG 14 (IIRC) changed the function signatures of a bunch of array functions, which was particularly painful.
That said - Postgres is improving the story here. PG 17’s ability to convert a streaming replica to a logical one and be able to upgrade it will be a game changer for standing up the logical replica. If they can get DDL replication into logical it will improve things even more.
That said - it’s still way behind how modern databases like CockroachDB and Yugabyte handle things here.
But to the point, people often use this point to claim that MySQL is a toy database, not usable for real world production use. I use Oracle as a counterpoint, which also has a lot of warts but is pretty much an archetype of an enterprise-grade DB engine.
So, by "ancient", this author means possibly not even two years old, as Postgres 15 is barely that old (https://www.postgresql.org/about/news/postgresql-15-released...).
Meanwhile, I've had to make posts this year to tell people no, seriously, stop using Python 2.7.
And meanwhile in my package repo:
$ apt show postgresql
Package: postgresql
Version: 14+238
Priority: optional
Section: database
Source: postgresql-common (238)
Origin: Ubuntu
Maintainer: Ubuntu Developers <ubuntu-devel-discuss@lists.ubuntu.com>
This is Mint 21.3 btw.
The author's expectations are incredibly unrealistic.
UUIDs are fixed-length. Blobs are not the appropriate type for that.
Take for example the aforementioned block size. Say you have storage with 4x the block size of the one used by PostgreSQL. If you run a benchmark over such storage, you'll see insane write amplification. Nobody will consider the results of such a "benchmark", because that's simply a misconfiguration of the program you are trying to measure.
More generally, Linux distros tend to compile distributed binaries with "safe" defaults so that they run on most hardware users might have, which means optimizing for the lowest common denominator. Looping back to PostgreSQL, the default block size was for a long time 4k, and iirc today it's 8k. This is fine if we are talking about a plain SSD / HDD, but with enterprise SDS these are "rookie numbers"; even the maximum supported by PostgreSQL (32k) is still a "rookie number", but it's still four times better than the default!
Don't forget to do this on Friday afternoon! :D
Upgrading the OS in a production environment... this reminds me of a joke about a police academy intake test:
The participants were offered a board with a round and a square hole and two pegs of similar shape. After the test, the new recruits were sorted into two groups: very smart and very strong.
So, yeah, you ought to be either very smart or very strong to upgrade the OS in a production environment, and then also discover changes to your database as you go along.
That's not to say that upgrades don't happen at all... but you'd be upgrading in a testing environment ten times before you try that "for real", and at that point you would have probably solved the question of whether you need to (re)install the database and how :)
It says it works "without the data dump/restore" and...
> Major PostgreSQL releases regularly add new features that often change the layout of the system tables, but the internal data storage format rarely changes. pg_upgrade uses this fact to perform rapid upgrades by creating new system tables and simply reusing the old user data files
Regardless, I suppose it is that reliance on the unchanging of the internal data format which is limiting what refactors can do.
https://ardentperf.com/2024/02/03/uuid-benchmark-war/
My PK preference in descending order: natural key if it makes sense and would speed up queries, 16-bit for unambiguous lookup tables with a hard finite limit of values, UUIDv7, 64-bit sequence, UUIDv4, and then finally 32-bit sequence.
UUIDv7 for performance, reasonable non-guessability, client-generation, and potential for sharding. 64-bit for cases that really need a sequence. UUIDv4 when randomness is required for security. 32-bit when there's a hard finite limit that can guaranteed to be below 2 billion even given the most optimistic of projections.
If the database is small, it doesn't matter. If it's large, it is 99.999% unlikely the primary key will be anything more than a rounding error in storage, access time, or cost. No one ever said, "We would have made payroll if only we had used 32-bit primary keys."
To really save space, extract the timestamp in the UUIDv7 and use an expression index to use as the creation date. Then you're not "wasting" any storage space. Personally I don't think the storage optimization is necessary. Premature optimization being evil and all that.
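A sketch of that expression-index trick, assuming PostgreSQL 17's uuid_extract_timestamp() (on older versions you'd write your own immutable extraction function):

CREATE TABLE event (
  id uuid PRIMARY KEY   -- UUIDv7, e.g. generated client-side
);
CREATE INDEX event_created_idx ON event (uuid_extract_timestamp(id));
-- the creation time comes straight out of the key; no created_at column needed
SELECT id, uuid_extract_timestamp(id) AS created_at FROM event;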