Newer PostgreSQL versions are better. Yet still not quite as robust or easy as MySQL.
At a certain scale even MySQL upgrades can be painful. At least when you cannot spare more than a few minutes of downtime.
It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.
How software "feels" is subjective. Can you be more specific?
In no particular order, my preference for postgres is driven by:
* Date / time functions that don't suck
* UTF-8 is really UTF-8
* 99% of a backup can be done live with nothing more than rsyncing the data directory and the WAL files
* Really comprehensive documentation
* LTREE and fuzzy string match extensions
* Familiarity from using it for years
MySQL/Maria I'm sure is fine, but it's one of those things where it's just different enough and I haven't encountered a compelling use case for changing my preference. Today I use CLIs like usql to interact with MySQL and SQLite so I can continue to use those commands.
At first glance they may be less obvious, but they are significantly more discoverable. `\?` just shows you all of them. In MySQL it always feels like I need to Google it.
That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.
For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).
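To make the 3072/4 arithmetic concrete, here is a minimal Python sketch. The 3072-byte limit is the figure quoted above; the helper names are made up for illustration, and the per-character worst cases are the standard maximum encoded widths for each character set.

```python
# Back-of-the-envelope check of the index key limit discussed above.
MAX_KEY_BYTES = 3072  # limit quoted in the thread

# Worst-case bytes per character reserved for each character set.
WORST_CASE_BYTES = {"latin1": 1, "utf8mb3": 3, "utf8mb4": 4, "binary": 1}


def max_indexable_chars(charset: str, key_bytes: int = MAX_KEY_BYTES) -> int:
    """Longest column (in characters) that fits in the key for this charset."""
    return key_bytes // WORST_CASE_BYTES[charset]


def utf8_len(text: str) -> int:
    """Actual number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


if __name__ == "__main__":
    for charset in WORST_CASE_BYTES:
        print(charset, max_indexable_chars(charset))
    # A mostly-ASCII URL uses far fewer bytes than the 4-bytes-per-char worst case.
    url = "https://example.com/" + "a" * 1000
    print(len(url), "chars ->", utf8_len(url), "bytes")
```

The gap between `len(url)` and the worst case is the whole complaint: the limit is computed from the declared character set, not from the bytes the data actually needs.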
Aside from DDL, the only other major ones are manipulating users/grants, manipulating replication, a small number of other administrative commands, and LOCK TABLES.
This is all documented very clearly at https://dev.mysql.com/doc/refman/8.4/en/implicit-commit.html. Hardly an "obstacle course".
This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.
This was literally the reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.
Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.
* comprehensive transaction model using different modes
* PostGIS and lots of other great extensions
* supports most of the current SQL standard and is clear on interpretation of edge-cases in the documentation
* support for writing stored procedures in any major programming language
* many useful functions regarding dates, sets, ranges, json, xml, ...
* custom datatypes
* extremely thought-out and standardized approach to development: if a feature is included it generally works well in interaction with everything else
* syntax, semantics and performance are all very predictable
* great documentation
Regarding MySQL / MariaDB: MySQL optimized for performance first. Until 2010 (MySQL 5.5) the default storage engine was MyISAM, which doesn't support transactions at all.
PostgreSQL always focused on correctness and stability and then made sure everything performed.
Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated from page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.
In MySQL either `?` or `help` or `\?` will show you the help...
VARBINARY is typically ok for that I'd think? Then you can utilize the full 3072 byte limit for the key, since there's no character set applied.
> even just being able to sort user-submitted strings up to a kilobyte
As a software engineer, I completely agree. But as a DBA, I am obligated to make a "tsk" sound and scowl disapprovingly!
If you need to keep a million long URLs in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
Creating short numerical primary keys for long strings is a common database technique.
No, it can't. A URL doesn't have any length limit, regardless of the fact that different software will impose different limits.
Just another anecdote: MySQL lost data for me (2004). I spent some time evaluating the projects and Postgres’ development process seemed much more mature — methodical, careful, and focused on correctness. Boring, which I loved.
I didn’t need whatever perf advantage MySQL had so I switched to Postgres and never looked back. And then the Oracle drama and Monty’s behavior around it — not saying he was wrong or right, but it was the opposite of boring — just reinforced my decision.
I like to play with new tech in various spots of the stack, but for filesystems and databases I go boring all the way.
Good in theory. But last time I checked the main libs to connect to pgsql, everything you get back from the database are strings. So you need something in your app to convert those strings to the equivalent data structures.
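For what it's worth, the app-side conversion being described might look roughly like this in Python. It's only a sketch: it assumes the driver hands back every value as text plus a PostgreSQL type name per column, and `CONVERTERS` and `convert_row` are hypothetical names, not part of any real driver.

```python
import json
from datetime import date

# Hypothetical converter table keyed by PostgreSQL type name.
# Assumes values arrive in PostgreSQL's text output format.
CONVERTERS = {
    "int4": int,
    "bool": lambda s: s == "t",   # text format uses 't' / 'f'
    "date": date.fromisoformat,   # e.g. '2004-05-06'
    "jsonb": json.loads,
    "text": str,
}


def convert_row(raw, column_types):
    """Turn text values into Python objects; NULL (None) passes through untouched."""
    return {
        name: None if raw[name] is None else CONVERTERS[type_name](raw[name])
        for name, type_name in column_types.items()
    }


if __name__ == "__main__":
    types = {"id": "int4", "active": "bool", "born": "date", "meta": "jsonb"}
    raw = {"id": "42", "active": "t", "born": "2004-05-06", "meta": '{"a": 1}'}
    print(convert_row(raw, types))
```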
[0]: https://dev.mysql.com/doc/refman/8.0/en/timestamp-initializa...
Yes, more drivers and libraries should support the more expansive data type list, but even just within the database itself there are multiple advantages.
This is objectively false. If this was true, indexes wouldn't serve range queries. You couldn't index on dates. You couldn't sort numbers.
> But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.
This fails when you need to insert new values into the table. Then you not only need to figure out the new integer value (how, if you can't efficiently compare sorted string values???), you need to update all the integers to make room.
The lack of transactional DDL is a totally valid complaint, but the non-DDL stuff is just a total head-scratcher to me. Aside from DDL, implicit commits have literally never impacted me in my 21 years of using MySQL.
> This fails when you need to insert new values into the table.
Yes, that's part of the extra code you need to keep the values accurately sorted. There are a lot of different particular code solutions that might work -- whether allowing for collisions and re-ordering every night with a cron job, or putting large gaps between numbers, or using floats.
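As a rough illustration of the "large gaps between numbers" option, here is a Python sketch. Everything in it is an assumption for demonstration: the `GAP` value, the class name, and the in-memory sorted list that stands in for however the application finds a new row's neighbours.

```python
import bisect

GAP = 1024  # hypothetical spacing between neighbouring sort keys


class SortedUrls:
    """Keeps an integer sort key per URL so ordering by key matches ordering by URL."""

    def __init__(self):
        self.urls = []  # URLs in sorted order (stands in for the table)
        self.keys = []  # integer sort key for each URL, same positions

    def _renumber(self):
        # The expensive path: rewrite every key with fresh gaps.
        self.keys = [(i + 1) * GAP for i in range(len(self.urls))]

    def insert(self, url):
        pos = bisect.bisect_left(self.urls, url)
        if pos < len(self.urls) and self.urls[pos] == url:
            return self.keys[pos]  # already present
        lo = self.keys[pos - 1] if pos > 0 else 0
        hi = self.keys[pos] if pos < len(self.keys) else lo + 2 * GAP
        self.urls.insert(pos, url)
        if hi - lo < 2:
            # No integer left strictly between the neighbours.
            self.keys.insert(pos, lo)  # placeholder, overwritten below
            self._renumber()
        else:
            self.keys.insert(pos, (lo + hi) // 2)
        return self.keys[pos]
```

Repeated inserts at the same spot halve the gap each time, so with a gap of 1024 about ten of them force a full renumber. That is the cost the objection above is pointing at.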
But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URLs out of the box. Indexes aren't meant for that and they won't work, and this is by design. You're going to have to roll your own code for that.
Fortunately I've never come across a case in the wild where maintaining a globally sorted list of long items was required (though I'm not saying they never exist). E.g. if you're building a spider that needs to match against URLs, you'd index a short hash of the URL as a non-unique index. Or if you wanted to display sorted URLs for a site, you'd index by domain name only, and then sort the remainder of the URL at query time.
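A minimal sketch of the short-hash approach for the spider case, in Python. It uses an in-memory SQLite database purely so the example is self-contained; the table, column, and function names are hypothetical, and the same schema idea would need adapting for MySQL or Postgres.

```python
import hashlib
import sqlite3


def url_hash(url: str) -> int:
    """First 8 bytes of SHA-256 as a signed 64-bit integer (fits an INTEGER column)."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big", signed=True)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE urls ("
    " id INTEGER PRIMARY KEY,"
    " url_hash INTEGER NOT NULL,"  # short, indexed
    " url TEXT NOT NULL)"          # full value, not indexed
)
# Non-unique on purpose: collisions are resolved by comparing the full URL.
conn.execute("CREATE INDEX idx_urls_hash ON urls (url_hash)")


def add_url(url: str) -> int:
    cur = conn.execute(
        "INSERT INTO urls (url_hash, url) VALUES (?, ?)", (url_hash(url), url)
    )
    return cur.lastrowid


def find_url(url: str):
    """Look up by hash first, then disambiguate on the unindexed full URL."""
    row = conn.execute(
        "SELECT id FROM urls WHERE url_hash = ? AND url = ?", (url_hash(url), url)
    ).fetchone()
    return row[0] if row else None
```

The index entries stay 8 bytes wide no matter how long the URLs get, at the price of losing any sort order.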
In a competitive market where people make very long term engineering decisions based on stability and reliability you can’t fuck up this badly and survive.
You keep saying that, but Postgres does a great job with no issues without any extra work. MySQL is alone in being suboptimal. "It's not designed for that" isn't a good answer, if it works great. Show me how the underlying data structures fail or perform poorly if it's really not something you should do.
The difference between MySQL and Postgres here is negligible. It doesn't matter exactly where you define the limit of a short field, except it should probably be able to hold a maximum length filename which is 255 characters, plus some room to spare. Both MySQL and Postgres do this fine.
It's only suboptimal if you choose the wrong column type for the task at hand. For storing URLs, you almost certainly don't want collation behaviors, such as accent insensitivity or case insensitivity. So VARBINARY is a better choice here anyway.
And as several other commenters have mentioned, at large scale, indexing a bunch of long URLs in b-trees is indeed a bad practice performance-wise in any relational database. You won't be able to fit many entries per page, so read performance will be slow, especially for range scans.
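A crude way to see the entries-per-page effect, as a Python sketch. The page sizes are the defaults (16 KB for InnoDB, 8 KB for PostgreSQL); the 16-byte per-entry overhead is just an assumed round number, and page headers and fill factor are ignored entirely.

```python
INNODB_PAGE_BYTES = 16 * 1024    # InnoDB default page size
POSTGRES_PAGE_BYTES = 8 * 1024   # PostgreSQL default block size


def entries_per_page(page_bytes: int, key_bytes: int, overhead_bytes: int = 16) -> int:
    """Rough upper bound on index entries per page.

    overhead_bytes is an assumed per-entry cost (row pointer etc.), not a real figure.
    """
    return page_bytes // (key_bytes + overhead_bytes)


if __name__ == "__main__":
    for key_bytes in (8, 64, 256, 1024):
        print(
            key_bytes,
            entries_per_page(INNODB_PAGE_BYTES, key_bytes),
            entries_per_page(POSTGRES_PAGE_BYTES, key_bytes),
        )
```

Fewer entries per page means lower fanout, a deeper tree, and more pages touched per range scan.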
In that situation it's almost always better to use a non-unique index over a prefix (if you need sorting and range scans) or a hash (if you don't), and disambiguate collisions by having the full value in an unindexed column. And/or split the URL up between the domain name and path in separate columns. If needed, normalize the domain names into a separate table so that the URL table can refer to them by numeric ID. etc. All depends on the specific use-case.
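The domain/path split could be sketched like this in Python. The function names are hypothetical, and the list comprehension stands in for what would be an indexed lookup on the domain column in a real schema.

```python
from urllib.parse import urlsplit


def split_url(url: str):
    """Return (domain, rest): a short indexable part and the long remainder."""
    parts = urlsplit(url)
    rest = parts.path
    if parts.query:
        rest += "?" + parts.query
    return parts.netloc.lower(), rest


def sorted_urls_for_domain(urls, domain: str):
    """Filter on the short domain, then sort the remainders at query time."""
    rests = [rest for d, rest in map(split_url, urls) if d == domain]
    return sorted(rests)


if __name__ == "__main__":
    urls = [
        "https://example.com/b",
        "https://other.org/a",
        "https://Example.com/a?x=1",
    ]
    print(sorted_urls_for_domain(urls, "example.com"))
```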
There are plenty of needs to store URLs which will never go through a browser.
You can only claim that "some URL use cases" can be stored in 2048 characters.