Newer PostgreSQL versions are better. Yet still not quite as robust or easy as MySQL.
At a certain scale even MySQL upgrades can be painful. At least when you cannot spare more than a few minutes of downtime.
Newer PostgreSQL versions are better. Yet still not quite as robust or easy as MySQL.
At a certain scale even MySQL upgrades can be painful. At least when you cannot spare more than a few minutes of downtime.
It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.
How software "feels" is subjective. Can you be more specific?
In no particular order, my preference for postgres is driven by:
* Date / time functions that don't suck
* UTF-8 is really UTF-8
* 99% of a backup can be done live with nothing more than rsyncing the data directory and the WAL files
* Really comprehensive documentation
* LTREE and fuzzy string match extensions
* Familiarity from using it for years
MySQL/Maria I'm sure is fine, but it's one of hose things where it's just different enough and I haven't encountered a compelling use case for changing my preference.Today I use CLIs like usql to interact with MySQL and SQLite so I can continue to use those commands.
At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.
That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.
For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).
Aside from DDL, the only other major ones are manipulating users/grants, manipulating replication, a small number of other administrative commands, and LOCK TABLES.
This is all documented very clearly on https://dev.mysql.com/doc/refman/8.4/en/implicit-commit.html. Hardly an "obstical course".
This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.
This was literally reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.
I think it boils down to this. We used to constantly be running the obsolete pg version until it became an emergency mostly because upgrading with the tooling available at the time was very painful. Today however, we stay relatively up to date. Once you figure out the data replication, you can almost do blue green deployments on databases with almost no down time.
Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.
To be clear, I like both. Postgres has a lot more features, and is far more extensible. But there’s no getting around the fact that its MVCC implementation means that at scale, you have to worry about things that simply do not exist for MySQL: vacuuming, txid wraparound, etc.
* comprehensive transaction model using different modes
* PostGIS and lots of other great extensions
* supports most of the current SQL standard and is clear on interpretation of edge-cases in the documentation
* support for writing stored procedures in any major programming language
* many useful functions regarding dates, sets, ranges, json, xml, ...
* custom datatypes
* extremely thought-out and standardized approach to development: if a feature is included it generally works well in interaction with everything else
* syntax, semantics and performance are all very predictable
* great documentation
Regarding MySQL / MariaDB: MySQL optimized for performance first. Until 2010 the standard-storage-engine MyISAM didn't even support transactions.
PostgreSQL always focused on correctness and stability and then made sure everything performed.
Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated from page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.
https://web.archive.org/web/20230922210124/https://grimoire....
In MySQL either `?` or `help` or `\?` will show you the help...
VARBINARY is typically ok for that I'd think? Then you can utilize the full 3072 byte limit for the key, since there's no character set applied.
> even just being able to sort user-submitted strings up to a kilobyte
As a software engineer, I completely agree. But as a DBA, I am obligated to make a "tsk" sound and scowl disapprovingly!
* I'm running a lot of MySQL stuff and such a topic might be of interest to me
That was true in 2012; dunno if it still applies though.
If you need to keep a million long URL's in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
Creating short numerical primary keys for long strings is a common database technique.
The fact that you still can't use DDL in transactions makes life exceedingly painful, but it's technically safe if you write your migration code carefully enough.
Postgres is definitely more difficult to administer.
Pretty sure that even in 2012 MySQL had very easy to use replication, which Postgres didn't have well into the late 2010s (does it today? It's been a while since I've ran any databases).
Possibly I got it wrong and switched around which was easier on the devs and which was easier on the sysads?
In my defence, ISTR, when talking to sysads about MySQL vs PostgreSQL, they preferred the latter due to having less to worry about once deployed (MySQL would apparently magically lose data sometimes).
No, it can't. URL doesn't have any length limit, regardless of the fact that different software will impose different limits.
Just another anecdote: MySQL lost data for me (2004). I spent some time evaluating the projects and Postgres’ development process seemed much more mature — methodical, careful, and focused on correctness. Boring, which I loved.
I didn’t need whatever perf advantage MySQL had so I switched to Postgres and never looked back. And then the Oracle drama and Monty’s behavior around it — not saying he was wrong or right, but it was the opposite of boring — just reinforced my decision.
I like to play with new tech in various spots of the stack, but for filesystems and databases I go boring all the way.
MySQL was immortalized as the database in every LAMP stack. And just like PHP it improved considerably since then.
https://jepsen.io/analyses/mysql-8.0.34 ( MySQL 8.0.34 2023-12-19 )
HN: https://news.ycombinator.com/item?id=38695750
"MySQL is a popular relational database. We revisit Kleppmann’s 2014 Hermitage and confirm that MySQL’s Repeatable Read still allows G2-item, G-single, and lost update. Using our transaction consistency checker Elle, we show that MySQL Repeatable Read also violates internal consistency. Furthermore, it violates Monotonic Atomic View: transactions can observe some of another transaction’s effects, then later fail to observe other effects of that same transaction. We demonstrate violations of ANSI SQL’s requirements for Repeatable Read. We believe MySQL Repeatable Read is somewhat stronger than Read Committed. As a lagniappe, we show that AWS RDS MySQL clusters routinely violate Serializability."
---------------
https://jepsen.io/analyses/postgresql-12.3 ( PostgreSQL 12.3 2020-06-12 )
HN: https://news.ycombinator.com/item?id=23498781
"PostgreSQL is a widely-known relational database system. We evaluated PostgreSQL using Jepsen’s new transactional isolation checker Elle, and found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable. Under normal operation, transactions could occasionally exhibit G2-item: an anomaly involving a set of transactions which (roughly speaking) mutually fail to observe each other’s writes. In addition, we found frequent instances of G2-item under PostgreSQL “repeatable read”, which is explicitly proscribed by commonly-cited formalizations of repeatable read. As previously reported by Martin Kleppmann, this is due to the fact that PostgreSQL “repeatable read” is actually snapshot isolation. This behavior is allowable due to long-discussed ambiguities in the ANSI SQL standard, but could be surprising for users familiar with the literature. A patch for the bug we found in serializability is scheduled for the next minor release, on August 13th, and the presence of G2-item under repeatable read could be readily addressed through documentation."
Except that search doesn't work, because all text is in latin1 encoding.
And then there's UTF8 not actually being UTF8, which can result in total lockup of a table if someone inputs a character that does not fit in UCS-2 and now you need to recover the database from backup and preferably convert all instances of utf8 to utf8mb4, because fuck you that's why.
Good in theory. But last time I checked the main libs to connect to pgsql, everything you get back from the database are strings. So you need something in your app to convert those strings to the equivalent data structures.
The OPTIMIZE works almost exclusively with online DDL statements. There's only a brief table lock held during table metadata operations, but I haven't found that to be a problem in practice. (https://dev.mysql.com/doc/refman/8.4/en/optimize-table.html#...)
But sure, it was easy to get a proof of concept working. But when you tried to break it by turning off network and/or machines, then shit broke down in very broken ways that was not recoverable. I'm guessing most that set up MySQL replication didn't actually verify that it worked well when SHTF.
EDIT: my mistake, OPTIMIZE TABLE is an online DDL. I’ve been burned in the past from foreign key constraint metadata locks essentially turning it into a blocking operation.
My only conclusion is that the majority of devs don’t actually read documentation, and rely purely on the last blog post they read to influence their infrastructure decisions.
[0]: https://dev.mysql.com/doc/refman/8.0/en/timestamp-initializa...
(I joined MySQL shortly after that mess, before the Sun acquisition)
Yes, more drivers and libraries should support the more expansive data type list, but even just within the database itself there are multiple advantages.
sometimes, repairing broken data is easier than, say, upgrading a god damn hot DB.
MVCC is overrated. Not every row in a busy MySQL table is your transactional wallet balance. But to upgrade a DB you have to deal with every field every row every table, and data keeps changing, which is a real headache
Fixing a range of broken data, however, can be done by a junior developer. If you rely on rdbms for a single source of truth you are probably fucked anyway.
btw I do hate DDL changes in MySQL.
This is objectively false. If this was true, indexes wouldn't serve range queries. You couldn't index on dates. You couldn't sort numbers.
> But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
This fails when you need to insert new values into the table. Then you not only need to figure out the new integer value (how, if you can't efficiently compare sorted string values???), you need to update all the integers to make room.
The lack of transactional DDL is a totally valid complaint, but the non-DDL stuff is just a total head-scratcher to me. Aside from DDL, implicit commits have literally never impacted me in my 21 years of using MySQL.
> This fails when you need to insert new values into the table.
Yes, that's part of the extra code you need to keep the values accurately sorted. There are a lot of different particular code solutions that might work -- whether allowing for collisions and re-ordering every night with a cron job, or putting large gaps between numbers, or using floats.
But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URL's out of the box. Indexes aren't meant for that and they won't work, and this is by design. You're going to have to roll your own code for that.
Fortunately I've never come across a case in the wild where maintaining a globally sorted list of long items was required (though I'm not saying they never exist). E.g. if you're building a spider that needs to match against URL's, you'd index a short hash of the URL as a non-unique index. Or if you wanted to display sorted URL's for a site, you'd index by domain name only, and then sort the remainder of the URL at query time.
In a competitive market where people make very long term engineering decisions based on stability and reliability you can’t fuck up this badly and survive.
You keep saying that, but Postgres does a great job with no issues without any extra work. MySQL is alone in being suboptimal. "It's not designed for that" isn't a good answer, if it works great. Show me how the underlying data structures fail or perform poorly if it's really not something you should do.
The difference between MySQL and Postgres here is negligible. It doesn't matter exactly where you define the limit of a short field, except it should probably be able to hold a maximum length filename which is 255 characters, plus some room to spare. Both MySQL and Postgres do this fine.
It's only suboptimal if you choose the wrong column type for the task at hand. For storing URLs, you almost certainly don't want collation behaviors, such as accent insensitivity or case insensitivity. So VARBINARY is a better choice here anyway.
And as several other commenters have mentioned, at large scale, indexing a bunch of long URLs in b-trees is indeed a bad practice performance-wise in any relational database. You won't be able to fit many entries per page, so read performance will be slow, especially for range scans.
In that situation it's almost always better to use a non-unique index over a prefix (if you need sorting and range scans) or a hash (if you don't), and disambiguate collisions by having the full value in an unindexed column. And/or split the URL up between the domain name and path in separate columns. If needed, normalize the domain names into a separate table so that the URL table can refer to them by numeric ID. etc. All depends on the specific use-case.
There are plenty of needs to store URLs which will never go through a browser.
You can only claim that "some URL use cases" can be stored in 2048 characters.
To do a cleanup, InnoDB uses the records it kept to delete old data, while Postgres must do a scan. So InnoDB pays a record-keeping price as part of the update that makes it easier to clear data, while Postgres decides to pay this price of occasional scanning.
https://en.wiktionary.org/wiki/Reconstruction:Proto-Slavic/s...
But to the point, people often use this point to claim that MySQL is a toy database, not usable for real world production use. I use Oracle as a counterpoint, which also has a lot of warts but is pretty much an archetype of an enterprise-grade DB engine.