Most active commenters

sgarland(10)
bastawhiz(7)
evanelias(7)
fhdsgbbcaA(6)
The_Colonel(4)
yxhuvud(4)
paulryanrogers(4)
crazygringo(3)
p_l(3)
Symbiote(3)

Popular/hot comments

>>41875126 #
>>41876111 #
>>41876232 #
>>41875276 #
>>41875286 #
>>41875336 #
>>41875306 #
>>41875240 #
>>41875168 #
>>41876259 #
>>41876309 #
>>41876650 #
>>41877061 #
>>41877270 #
>>41878057 #

←back to thread

Why does everyone run ancient Postgres versions?

(neon.tech)

1. paulryanrogers ◴[17 Oct 24 23:59 UTC] No.41875055[source]▶

>>41873957 (OP) #

Upgrades are hard. There was no replication in the before times. The original block-level replication didn't work among different major versions. Slony was a painful workaround based on triggers that amplified writes.

Newer PostgreSQL versions are better. Yet still not quite as robust or easy as MySQL.

At a certain scale even MySQL upgrades can be painful. At least when you cannot spare more than a few minutes of downtime.

replies(7): >>41875126 #>>41876174 #>>41876232 #>>41876375 #>>41877029 #>>41877268 #>>41877959 #

2. api ◴[18 Oct 24 00:10 UTC] No.41875126[source]▶

>>41875055 (TP) #

I've always wondered why Postgres is so insanely popular. I mean it has some nice things like very powerful support for a very comprehensive subset of SQL functionality, but most apps don't need all that.

It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.

replies(9): >>41875168 #>>41875240 #>>41875306 #>>41875587 #>>41876346 #>>41876770 #>>41877119 #>>41877951 #>>41878394 #

3. stickfigure ◴[18 Oct 24 00:19 UTC] No.41875168[source]▶

>>41875126 #

What's the alternative? MySQL? No transactional DDL, immediate fail.

replies(3): >>41875276 #>>41875729 #>>41876552 #

4. justin_oaks ◴[18 Oct 24 00:31 UTC] No.41875240[source]▶

>>41875126 #

> It really feels like early 1990s vintage Unix software. It's clunky and arcane and it's hard to feel confident doing anything complex with it.

How software "feels" is subjective. Can you be more specific?

replies(3): >>41875286 #>>41875625 #>>41877469 #

5. jes5199 ◴[18 Oct 24 00:37 UTC] No.41875276{3}[source]▶

>>41875168 #

I worked for a company that migrated from mysql to postgres, but then got big enough they wanted to hire fulltime database experts and ended up migrating back to mysql because it was easier to find talent

replies(4): >>41875436 #>>41875533 #>>41875636 #>>41879900 #

6. threeseed ◴[18 Oct 24 00:39 UTC] No.41875286{3}[source]▶

>>41875240 #

The command line experience is old school style i.e. to show tables.

  \c database
  \dt

Versus:

  use database
  show tables

replies(4): >>41875328 #>>41875350 #>>41875573 #>>41875689 #

7. tpmoney ◴[18 Oct 24 00:42 UTC] No.41875306[source]▶

>>41875126 #

> I've always wondered why Postgres is so insanely popular.

In no particular order, my preference for postgres is driven by:

  * Date / time functions that don't suck
  * UTF-8 is really UTF-8
  * 99% of a backup can be done live with nothing more than rsyncing the data directory and the WAL files
  * Really comprehensive documentation
  * LTREE and fuzzy string match extensions
  * Familiarity from using it for years

MySQL/Maria I'm sure is fine, but it's one of hose things where it's just different enough and I haven't encountered a compelling use case for changing my preference.

replies(3): >>41875336 #>>41876296 #>>41879736 #

8. dventimi ◴[18 Oct 24 00:47 UTC] No.41875328{4}[source]▶

>>41875286 #

That's psql.

9. fhdsgbbcaA ◴[18 Oct 24 00:49 UTC] No.41875336{3}[source]▶

>>41875306 #

UTF-8 is what made me switch. It’s insane MySQL has something called UTF-8 that isn't really UTF-8, but do have a type UTF8MB4 that actually is correct. This means if you use UFT-8 in MySQL, you can’t use emoji for example.

replies(3): >>41875430 #>>41876285 #>>41878032 #

10. fhdsgbbcaA ◴[18 Oct 24 00:50 UTC] No.41875350{4}[source]▶

>>41875286 #

It’s also faster to type.

replies(1): >>41876111 #

11. bastawhiz ◴[18 Oct 24 01:01 UTC] No.41875430{4}[source]▶

>>41875336 #

And the fact that adding real utf-8 support limited (limits?) the length of strings that can be indexed

replies(1): >>41875764 #

12. bastawhiz ◴[18 Oct 24 01:02 UTC] No.41875436{4}[source]▶

>>41875276 #

Dunno if that says much about Postgres, but it says a lot about the company

13. ◴[18 Oct 24 01:21 UTC] No.41875533{4}[source]▶

>>41875276 #

14. rootusrootus ◴[18 Oct 24 01:31 UTC] No.41875573{4}[source]▶

>>41875286 #

I assume this is really what it comes down to. If psql added those verbose-but-descriptive commands a whole bunch of people comfortable with mysql would be a lot happier using postgres.

15. DonHopkins ◴[18 Oct 24 01:33 UTC] No.41875587[source]▶

>>41875126 #

Because it's not tainted and cursed by Oracle, like MySQL (and Oracle).

replies(1): >>41877266 #

16. dalyons ◴[18 Oct 24 01:41 UTC] No.41875625{3}[source]▶

>>41875240 #

It requires a ton of somewhat arcane maintenance at scale. Vacuum shenanigans, Index fragmentation requiring manual reindexing, Txid wraparounds. I like Postgres but it’s definitely way more work to maintain a large instance than mysql. MySQL just kinda works

17. georgyo ◴[18 Oct 24 01:53 UTC] No.41875689{4}[source]▶

>>41875286 #

I started with MySQL in 2006 for my personal projects, but what first won me over to psql was those commands.

Today I use CLIs like usql to interact with MySQL and SQLite so I can continue to use those commands.

At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.

replies(1): >>41876427 #

18. cosmotic ◴[18 Oct 24 01:59 UTC] No.41875729{3}[source]▶

>>41875168 #

It's not just DDL that isn't transactional, there's a whole bunch of other things that aren't. And they break the transactionality silently. It's like an obstical course where bumping into something might be fatal.

replies(1): >>41875779 #

19. evanelias ◴[18 Oct 24 02:05 UTC] No.41875764{5}[source]▶

>>41875430 #

Postgres limits btree keys to 2704 bytes, which is actually slightly smaller than MySQL's limit of 3072 bytes, assuming the default InnoDB storage engine.

That said, when using utf8mb4 in an index key, MySQL uses the "worst case" of each character being 4 bytes. So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.

For practical purposes, this doesn't cause much pain, as it's generally inadvisable to use complete long-ish strings as a key. And there are various workarounds, like using prefixes or hashes as the key, or using binary strings as keys to get the full 3072 bytes (if you don't need collation behaviors).

replies(1): >>41875901 #

20. evanelias ◴[18 Oct 24 02:07 UTC] No.41875779{4}[source]▶

>>41875729 #

What specific non-DDL things are you referring to here?

Aside from DDL, the only other major ones are manipulating users/grants, manipulating replication, a small number of other administrative commands, and LOCK TABLES.

This is all documented very clearly on https://dev.mysql.com/doc/refman/8.4/en/implicit-commit.html. Hardly an "obstical course".

replies(1): >>41880682 #

21. bastawhiz ◴[18 Oct 24 02:36 UTC] No.41875901{6}[source]▶

>>41875764 #

> So it effectively limits the max key size to 3072/4 = 768 characters, when a column is using the utf8mb4 character set.

This is exactly what I mean. 768 characters for an index is woefully bad. And for no obviously great reason: you can just index the encoded UTF-8 text.

This was literally reason why a former company (who will remain nameless) refused to add Unicode support. It's not even an imagined problem.

replies(2): >>41876001 #>>41876347 #

22. evanelias ◴[18 Oct 24 03:01 UTC] No.41876001{7}[source]▶

>>41875901 #

Why did you need to index fairly long strings in their entirety in a way that preserves collation behaviors?

And why is a 768 character limit woefully bad, but a 2704 character limit is totally fine?

replies(2): >>41876259 #>>41876450 #

23. eYrKEC2 ◴[18 Oct 24 03:30 UTC] No.41876111{5}[source]▶

>>41875350 #

Not after you have to google, "What's the equivalent of `show tables` in postgres?", because the psql command names are completely arbitrary.

replies(5): >>41876166 #>>41877357 #>>41878020 #>>41878561 #>>41884086 #

24. fhdsgbbcaA ◴[18 Oct 24 03:44 UTC] No.41876166{6}[source]▶

>>41876111 #

Which you need to do exactly once.

replies(1): >>41876478 #

25. darth_avocado ◴[18 Oct 24 03:46 UTC] No.41876174[source]▶

>>41875055 (TP) #

> At least when you cannot spare more than a few minutes of downtime.

I think it boils down to this. We used to constantly be running the obsolete pg version until it became an emergency mostly because upgrading with the tooling available at the time was very painful. Today however, we stay relatively up to date. Once you figure out the data replication, you can almost do blue green deployments on databases with almost no down time.

26. slotrans ◴[18 Oct 24 04:00 UTC] No.41876232[source]▶

>>41875055 (TP) #

"Not as robust as MySQL"? Surely you're joking.

replies(4): >>41876309 #>>41876384 #>>41877139 #>>41927974 #

27. bastawhiz ◴[18 Oct 24 04:08 UTC] No.41876259{8}[source]▶

>>41876001 #

A URL, for instance, can't be safely stored in 768 characters, but it can be stored safely in 2704. If you then wanted to sort those URLs so that all URLs for each domain and path within that domain are adjacent, you need an index. Especially if you want to paginate over them with a cursor. Doing that without an index on the raw value is a royal pain in the ass.

Hell, even just being able to sort user-submitted strings up to a kilobyte. Why up to a kilobyte? Some users have strings that are kind of long. If I have to define a second column that's the truncated prefix, that's just a silly waste of space because MySQL decided to use utf-32 under the hood.

replies(3): >>41876474 #>>41876664 #>>41876823 #

28. sgarland ◴[18 Oct 24 04:14 UTC] No.41876285{4}[source]▶

>>41875336 #

> This means if you use UFT-8 in MySQL, you can’t use emoji for example.

I for one have always viewed this as a perk.

replies(1): >>41876570 #

29. sgarland ◴[18 Oct 24 04:16 UTC] No.41876296{3}[source]▶

>>41875306 #

MySQL does have ON UPDATE for its DATETIME, though; something that Postgres inexplicably still lacks.

replies(1): >>41877969 #

30. sgarland ◴[18 Oct 24 04:20 UTC] No.41876309[source]▶

>>41876232 #

They’re not wrong. If you’ve ever spent meaningful time administering both, you’ll know that Postgres takes far more hands-on work to keep it going.

To be clear, I like both. Postgres has a lot more features, and is far more extensible. But there’s no getting around the fact that its MVCC implementation means that at scale, you have to worry about things that simply do not exist for MySQL: vacuuming, txid wraparound, etc.

replies(3): >>41876387 #>>41876650 #>>41877061 #

31. fzeindl ◴[18 Oct 24 04:29 UTC] No.41876346[source]▶

>>41875126 #

* transactional DDL

* comprehensive transaction model using different modes

* PostGIS and lots of other great extensions

* supports most of the current SQL standard and is clear on interpretation of edge-cases in the documentation

* support for writing stored procedures in any major programming language

* many useful functions regarding dates, sets, ranges, json, xml, ...

* custom datatypes

* extremely thought-out and standardized approach to development: if a feature is included it generally works well in interaction with everything else

* syntax, semantics and performance are all very predictable

* great documentation

Regarding MySQL / MariaDB: MySQL optimized for performance first. Until 2010 the standard-storage-engine MyISAM didn't even support transactions.

PostgreSQL always focused on correctness and stability and then made sure everything performed.

replies(1): >>41877457 #

32. sgarland ◴[18 Oct 24 04:29 UTC] No.41876347{7}[source]▶

>>41875901 #

You should not be indexing 768 characters in any circumstance I can imagine. Go ahead and try it. Spin up two tables, fill them with a few million rows, and slap and index on them. Give one a reasonable prefix limit, and let the other go wild. Make sure you ANALYZE each, then run queries in a loop and check the times.

Spoiler: I literally did this a couple of days ago. The index size bloat means that any possible savings you might have gained from collisions are obliterated from page fetches. I tested with a measly 128 characters vs. a prefix of 16, and that was enough for the average query time to be equal, with the smaller index winning for the minimum.

33. aorloff ◴[18 Oct 24 04:36 UTC] No.41876375[source]▶

>>41875055 (TP) #

A shudder went through my body hearing that, I had forgotten all about Slony.

34. erik_seaberg ◴[18 Oct 24 04:38 UTC] No.41876384[source]▶

>>41876232 #

Early MySQL versions made egregious design choices like quietly ignoring missing foreign keys and enum typos, truncating long strings, and randomly choosing rows from groups.

https://web.archive.org/web/20230922210124/https://grimoire....

replies(1): >>41878819 #

35. wbl ◴[18 Oct 24 04:39 UTC] No.41876387{3}[source]▶

>>41876309 #

Yeah but you don't need to worry about your data existing. MySQL has been known to silently fail the one job of a DB.

replies(2): >>41876576 #>>41876737 #

36. stephenr ◴[18 Oct 24 04:47 UTC] No.41876427{5}[source]▶

>>41875689 #

> At first glance they may be less obvious, but they are significantly more discoverable. \? Just shows you all of them. In MySQL it always feels like I need to Google it.

In MySQL either `?` or `help` or `\?` will show you the help...

37. fweimer ◴[18 Oct 24 04:52 UTC] No.41876450{8}[source]▶

>>41876001 #

You might just load someone else's data, and the index is desirable in general for speeding up analytic queries. It's possible to work around that, of course. But depending on what you do, it can make writing efficient queries against the data more difficult. That's just a distraction because most of the time, those long columns won't matter anyway.

38. evanelias ◴[18 Oct 24 04:58 UTC] No.41876474{9}[source]▶

>>41876259 #

> A URL, for instance

VARBINARY is typically ok for that I'd think? Then you can utilize the full 3072 byte limit for the key, since there's no character set applied.

> even just being able to sort user-submitted strings up to a kilobyte

As a software engineer, I completely agree. But as a DBA, I am obligated to make a "tsk" sound and scowl disapprovingly!

39. kalleboo ◴[18 Oct 24 04:59 UTC] No.41876478{7}[source]▶

>>41876166 #

I need to manually admin my database server maybe once every 2 years or so. Definitely not remembering them 2 years later.

replies(1): >>41889579 #

40. cvalka ◴[18 Oct 24 05:21 UTC] No.41876552{3}[source]▶

>>41875168 #

TiDB

41. fhdsgbbcaA ◴[18 Oct 24 05:26 UTC] No.41876570{5}[source]▶

>>41876285 #

A database that doesn’t give you back what you put into it is never a perk. It literally can’t handle storing and retrieving the data.

replies(1): >>41878624 #

42. SavageBeast ◴[18 Oct 24 05:28 UTC] No.41876576{4}[source]▶

>>41876387 #

I recall this being the case A LOOOONG time ago but I haven't heard of, read about, been warned to look out for or personally seen such a thing in forever. Have you?

* I'm running a lot of MySQL stuff and such a topic might be of interest to me

replies(2): >>41876711 #>>41876762 #

43. lelanthran ◴[18 Oct 24 05:43 UTC] No.41876650{3}[source]▶

>>41876309 #

My experience of both is that MySQL is easier for developers, PostgreSQL is easier for sysads.

That was true in 2012; dunno if it still applies though.

replies(3): >>41876791 #>>41877270 #>>41877996 #

44. crazygringo ◴[18 Oct 24 05:49 UTC] No.41876664{9}[source]▶

>>41876259 #

To be honest, indexes aren't designed for that. They're meant for fast lookup of short identifiers. Things like people's names and product ID's. Not long URL's. It's not performant.

If you need to keep a million long URL's in a defined sort order, my first recommendation would be, don't -- see if there's another way to achieve your end result. But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.

Creating short numerical primary keys for long strings is a common database technique.

replies(1): >>41880646 #

45. o11c ◴[18 Oct 24 06:01 UTC] No.41876711{5}[source]▶

>>41876576 #

From what I can tell, MySQL is supposed to be safe since 2018 if you have no data from before 2010.

The fact that you still can't use DDL in transactions makes life exceedingly painful, but it's technically safe if you write your migration code carefully enough.

replies(2): >>41876795 #>>41877299 #

46. maxk42 ◴[18 Oct 24 06:09 UTC] No.41876737{4}[source]▶

>>41876387 #

Not in around 15 years. You're thinking of when MyISAM was the default storage engine for MySQL. It has been InnoDB for over a decade. InnoDB is very reliable - I've never had a single data loss incident in all that time, and I've managed some very large (PB-scale) and active databases.

Postgres is definitely more difficult to administer.

replies(2): >>41877978 #>>41878085 #

47. elisbce ◴[18 Oct 24 06:14 UTC] No.41876762{5}[source]▶

>>41876576 #

Yes, it is messy when you want your MySQL databases to be mission critical in production, e.g. handling a large amount of customer data. Historically MySQL's High Availability architecture has a lot of design and implementation issues because it was an afterthought. Dealing with large amount of critical data means you need it to be performant, reliable and available at the same time, which is hard and requires you to deal with caching, sharding, replication, network issues, zone/resource planning, failovers, leader elections and semi-sync bugs, corrupted logs, manually fixing bad queries that killed the database, data migration, version upgrades, etc. There is a reason why big corps like Google/Meta has dedicated teams of experts (like people who actually wrote the HA features) to maintain their mission critical MySQL deployments.

48. moogly ◴[18 Oct 24 06:16 UTC] No.41876770[source]▶

>>41875126 #

> I've always wondered why Postgres is so insanely popular

Real answer: no licensing cost

49. sofixa ◴[18 Oct 24 06:20 UTC] No.41876791{4}[source]▶

>>41876650 #

I doubt it was true in 2012, because sysadmins would be the ones trying to make it run reliably, including things like replication, upgrades, etc.

Pretty sure that even in 2012 MySQL had very easy to use replication, which Postgres didn't have well into the late 2010s (does it today? It's been a while since I've ran any databases).

replies(2): >>41876810 #>>41878057 #

50. The_Colonel ◴[18 Oct 24 06:21 UTC] No.41876795{6}[source]▶

>>41876711 #

Lack of transactional DDL is certainly painful, but not unique for MySQL. Oracle doesn't support it either.

replies(2): >>41879094 #>>41887918 #

51. lelanthran ◴[18 Oct 24 06:23 UTC] No.41876810{5}[source]▶

>>41876791 #

> I doubt it was true in 2012, because sysadmins would be the ones trying to make it run reliably, including things like replication, upgrades, etc.

Possibly I got it wrong and switched around which was easier on the devs and which was easier on the sysads?

In my defence, ISTR, when talking to sysads about MySQL vs PostgreSQL, they preferred the latter due to having less to worry about once deployed (MySQL would apparently magically lose data sometimes).

replies(1): >>41877205 #

52. The_Colonel ◴[18 Oct 24 06:25 UTC] No.41876823{9}[source]▶

>>41876259 #

> it can be stored safely in 2704

No, it can't. URL doesn't have any length limit, regardless of the fact that different software will impose different limits.

replies(1): >>41880612 #

53. cenamus ◴[18 Oct 24 07:11 UTC] No.41877029[source]▶

>>41875055 (TP) #

Is slony some sort of language joke? Slon is czech (probably slavic in general) for elephant.

(which may be actually derived from the turkish aslan, for lion, but somehow the animal got mixed up)

replies(2): >>41877211 #>>41886422 #

54. pritambarhate ◴[18 Oct 24 07:19 UTC] No.41877061{3}[source]▶

>>41876309 #

My experience has been exactly opposite. Ability to do Vacuums is good. MySQL doesn’t free up space taken by deleted rows. The only option to free up the space is to mysqldump the db and load it again. Not practical in most of the situations.

replies(3): >>41877166 #>>41877653 #>>41878767 #

55. Scramblejams ◴[18 Oct 24 07:32 UTC] No.41877119[source]▶

>>41875126 #

> I've always wondered why Postgres is so insanely popular.

Just another anecdote: MySQL lost data for me (2004). I spent some time evaluating the projects and Postgres’ development process seemed much more mature — methodical, careful, and focused on correctness. Boring, which I loved.

I didn’t need whatever perf advantage MySQL had so I switched to Postgres and never looked back. And then the Oracle drama and Monty’s behavior around it — not saying he was wrong or right, but it was the opposite of boring — just reinforced my decision.

I like to play with new tech in various spots of the stack, but for filesystems and databases I go boring all the way.

56. Propelloni ◴[18 Oct 24 07:36 UTC] No.41877139[source]▶

>>41876232 #

It is hard to live down a reputation ;)

MySQL was immortalized as the database in every LAMP stack. And just like PHP it improved considerably since then.

replies(1): >>41877303 #

57. benoitg ◴[18 Oct 24 07:42 UTC] No.41877166{4}[source]▶

>>41877061 #

Not really, the innodb_file_per_table variable has been set to 1 for a long time. Running OPTIMIZE TABLE frees up the disk space in this case.

replies(1): >>41877285 #

58. karamanolev ◴[18 Oct 24 07:49 UTC] No.41877205{6}[source]▶

>>41876810 #

MyISAM in the olden days could/would magically lose data. InnoDB has been the de facto standard for a while and I haven't seen data loss attributed to it.

59. newaccount74 ◴[18 Oct 24 07:50 UTC] No.41877211[source]▶

>>41877029 #

I think so. The PostgreSQL logo is an elephant, and the community calls him "Slonik", probably derived from the Russian word for elephant. There is also a node.js driver that is called "slonik"

60. immibis ◴[18 Oct 24 08:00 UTC] No.41877266{3}[source]▶

>>41875587 #

That's what MariaDB is for, right? I'm surprised to hear people recommend the Oracle fork of MySQL (still called MySQL because they own the trademark) rather than the original project (now called MariaDB)

61. pella ◴[18 Oct 24 08:01 UTC] No.41877268[source]▶

>>41875055 (TP) #

MySQL vs. PostgreSQL - Jepsen Analyses

https://jepsen.io/analyses/mysql-8.0.34 ( MySQL 8.0.34 2023-12-19 )

HN: https://news.ycombinator.com/item?id=38695750

"MySQL is a popular relational database. We revisit Kleppmann’s 2014 Hermitage and confirm that MySQL’s Repeatable Read still allows G2-item, G-single, and lost update. Using our transaction consistency checker Elle, we show that MySQL Repeatable Read also violates internal consistency. Furthermore, it violates Monotonic Atomic View: transactions can observe some of another transaction’s effects, then later fail to observe other effects of that same transaction. We demonstrate violations of ANSI SQL’s requirements for Repeatable Read. We believe MySQL Repeatable Read is somewhat stronger than Read Committed. As a lagniappe, we show that AWS RDS MySQL clusters routinely violate Serializability."

---------------

https://jepsen.io/analyses/postgresql-12.3 ( PostgreSQL 12.3 2020-06-12 )

HN: https://news.ycombinator.com/item?id=23498781

"PostgreSQL is a widely-known relational database system. We evaluated PostgreSQL using Jepsen’s new transactional isolation checker Elle, and found that transactions executed with serializable isolation on a single PostgreSQL instance were not, in fact, serializable. Under normal operation, transactions could occasionally exhibit G2-item: an anomaly involving a set of transactions which (roughly speaking) mutually fail to observe each other’s writes. In addition, we found frequent instances of G2-item under PostgreSQL “repeatable read”, which is explicitly proscribed by commonly-cited formalizations of repeatable read. As previously reported by Martin Kleppmann, this is due to the fact that PostgreSQL “repeatable read” is actually snapshot isolation. This behavior is allowable due to long-discussed ambiguities in the ANSI SQL standard, but could be surprising for users familiar with the literature. A patch for the bug we found in serializability is scheduled for the next minor release, on August 13th, and the presence of G2-item under repeatable read could be readily addressed through documentation."

62. GoblinSlayer ◴[18 Oct 24 08:01 UTC] No.41877270{4}[source]▶

>>41876650 #

>MySQL is easier for developers

Except that search doesn't work, because all text is in latin1 encoding.

replies(3): >>41878001 #>>41878071 #>>41878723 #

63. sudhirj ◴[18 Oct 24 08:03 UTC] No.41877285{5}[source]▶

>>41877166 #

Is this process materially different from a vacuum? Does it manage to optimise without a write lock?

replies(2): >>41877689 #>>41878011 #

64. p_l ◴[18 Oct 24 08:05 UTC] No.41877299{6}[source]▶

>>41876711 #

Some places still have columns declared as utf8 instead of utf8mb4, and there's a special place in hell for authors of the MySQL general clusterfuck regarding encodings - it was all nice and great if you didn't care about anything other than latin1 or ASCII - go outside that before utf8 option and it was horror that even experienced operators managed to fuckup (I have a badge from a Google conference in 2017 with nicely visible effect of "we have mixed up one of the three separate encoding settings in MySQL and now you have mojibake in your badge").

And then there's UTF8 not actually being UTF8, which can result in total lockup of a table if someone inputs a character that does not fit in UCS-2 and now you need to recover the database from backup and preferably convert all instances of utf8 to utf8mb4, because fuck you that's why.

replies(1): >>41878454 #

65. p_l ◴[18 Oct 24 08:06 UTC] No.41877303{3}[source]▶

>>41877139 #

Also for licensing reasons for a long time there was surviving contingent of MySQL 3.23 in LAMP hosting.

replies(1): >>41878895 #

66. ◴[18 Oct 24 08:16 UTC] No.41877357{6}[source]▶

>>41876111 #

67. arkh ◴[18 Oct 24 08:40 UTC] No.41877457{3}[source]▶

>>41876346 #

> * custom datatypes

Good in theory. But last time I checked the main libs to connect to pgsql, everything you get back from the database are strings. So you need something in your app to convert those strings to the equivalent data structures.

replies(1): >>41879925 #

68. arkh ◴[18 Oct 24 08:42 UTC] No.41877469{3}[source]▶

>>41875240 #

Having to tinker with pg_hba.conf files on the server so manage how users can connect.

replies(1): >>41878377 #

69. iamjkt ◴[18 Oct 24 09:21 UTC] No.41877653{4}[source]▶

>>41877061 #

Running 'optimize table <table>' reclaims the space on MySQL/MariaDB.

70. benoitg ◴[18 Oct 24 09:28 UTC] No.41877689{6}[source]▶

>>41877285 #

I don't know how VACUUM works, I couldn't tell you about the differences.

The OPTIMIZE works almost exclusively with online DDL statements. There's only a brief table lock held during table metadata operations, but I haven't found that to be a problem in practice. (https://dev.mysql.com/doc/refman/8.4/en/optimize-table.html#...)

71. vbezhenar ◴[18 Oct 24 10:21 UTC] No.41877951[source]▶

>>41875126 #

For me Postgres is 100% predictable and reliable. It's neither clunky nor arcane in my experience. I don't need to think about it, I just SQL it and that's about it. It quietly works in the background. At some scale there might be some issues, but there is always known path to solve things.

72. Symbiote ◴[18 Oct 24 10:22 UTC] No.41877959[source]▶

>>41875055 (TP) #

I think you're describing versions before 9.2, which was released 12 years ago.

73. fanf2 ◴[18 Oct 24 10:24 UTC] No.41877969{4}[source]▶

>>41876296 #

Isn’t ON UPDATE related to foreign keys and independent of the data type? https://www.postgresql.org/docs/current/ddl-constraints.html...

replies(1): >>41878336 #

74. Symbiote ◴[18 Oct 24 10:25 UTC] No.41877978{5}[source]▶

>>41876737 #

People are criticising decade-old releases of PostgreSQL, so it seems fair to compare them to similarly aged MySQL releases.

75. williamdclt ◴[18 Oct 24 10:27 UTC] No.41877996{4}[source]▶

>>41876650 #

Interestingly, someone else in another comment is arguing the exact opposite!

76. homebrewer ◴[18 Oct 24 10:28 UTC] No.41878001{5}[source]▶

>>41877270 #

This was never true if you know what you're doing, and hasn't been true since 2018 even if you don't.

77. homebrewer ◴[18 Oct 24 10:29 UTC] No.41878011{6}[source]▶

>>41877285 #

http://rhaas.blogspot.com/2011/02/mysql-vs-postgresql-part-2...

replies(1): >>41886804 #

78. Symbiote ◴[18 Oct 24 10:31 UTC] No.41878020{6}[source]▶

>>41876111 #

They are clearly abbreviations.

\c is for connect.

\dt is for describe tables.

79. homebrewer ◴[18 Oct 24 10:32 UTC] No.41878032{4}[source]▶

>>41875336 #

I won't defend that utf8 brain damage, but the defaults are sane since 2018 — you don't need to set the encoding, it's set to proper utf8 out of the box. MySQL 8 cleaned up a lot of this legacy stuff.

replies(1): >>41882967 #

80. yxhuvud ◴[18 Oct 24 10:35 UTC] No.41878057{5}[source]▶

>>41876791 #

In 2012 MySQL had several flavors of replications, each with its own very serious pitfalls that could introduce corruption or loss of data. I saw enough of MySQL replication issues in those days that I wouldn't want to use it.

But sure, it was easy to get a proof of concept working. But when you tried to break it by turning off network and/or machines, then shit broke down in very broken ways that was not recoverable. I'm guessing most that set up MySQL replication didn't actually verify that it worked well when SHTF.

replies(3): >>41878828 #>>41880075 #>>41880546 #

81. yxhuvud ◴[18 Oct 24 10:37 UTC] No.41878071{5}[source]▶

>>41877270 #

Either latin1 or that absolutely horrible max-three-byte utf8 encoding that used to be the only available option for utf8.

82. yxhuvud ◴[18 Oct 24 10:38 UTC] No.41878085{5}[source]▶

>>41876737 #

MySQL used to have horrible and very unsafe defaults for new installations that persisted well after the introduction of InnoDB. Those went unfixed for a very long time.

83. paulryanrogers ◴[18 Oct 24 11:22 UTC] No.41878336{5}[source]▶

>>41877969 #

Maybe they're thinking of TIMESTAMP in MySQL, which IIRC would auto update its value on any update to the row. Which was useful for uodated_at like columns. Though I think they later limited it to only the first TIMESTAMP column in a table.

replies(1): >>41878876 #

84. paulryanrogers ◴[18 Oct 24 11:29 UTC] No.41878377{4}[source]▶

>>41877469 #

I'd agree that is annoying yet usually just a one off task, unless you really want different IP allowlists per user.

replies(1): >>41880521 #

85. paulryanrogers ◴[18 Oct 24 11:31 UTC] No.41878394[source]▶

>>41875126 #

I've never lost data with PostgreSQL. MySQL had enough data loss bugs and foot guns that I ran into a few of them.

86. lol768 ◴[18 Oct 24 11:44 UTC] No.41878454{7}[source]▶

>>41877299 #

Yeah, the whole charset/collation design is a complete dumpster fire.

replies(1): >>41878747 #

87. mxey ◴[18 Oct 24 11:58 UTC] No.41878561{6}[source]▶

>>41876111 #

\? shows the help

88. sgarland ◴[18 Oct 24 12:07 UTC] No.41878624{6}[source]▶

>>41876570 #

I don’t want to see emoji in my database. The customer is only right in matters of taste, not engineering.

replies(2): >>41879747 #>>41882994 #

89. sgarland ◴[18 Oct 24 12:21 UTC] No.41878723{5}[source]▶

>>41877270 #

While obviously I prefer Unicode, latin-1 is perfectly acceptable for most use cases in Western cultures.

What part of searching have you found to not work well using it?

replies(1): >>41894259 #

90. sgarland ◴[18 Oct 24 12:23 UTC] No.41878747{8}[source]▶

>>41878454 #

In fairness, reasoning about collations is like peering into the abyss. I get why they’re required to have so many levels of detail, and the Unicode Consortium has done a fantastic job, but to say they’re complicated is putting it mildly.

91. sgarland ◴[18 Oct 24 12:26 UTC] No.41878767{4}[source]▶

>>41877061 #

VACUUM rarely reclaims space from the OS’ perspective, if that’s what you meant. It can in certain circumstances, but they’re rare. VACUUM FULL is the equivalent to OPTIMIZE TABLE – both lock the table to do a full rewrite, and optimally binpack it to the extent that is posssible.

EDIT: my mistake, OPTIMIZE TABLE is an online DDL. I’ve been burned in the past from foreign key constraint metadata locks essentially turning it into a blocking operation.

92. sgarland ◴[18 Oct 24 12:34 UTC] No.41878819{3}[source]▶

>>41876384 #

Yeah, it was bad. What kills me is SQLite has its own absurd set of gotchas [0] yet is seen as amazing and wonderful by devs. PKs can have NULLs? Sure! Strings can have \0 in the middle of them? Why not? FKs aren’t enforced by default? Yeah, who needs referential integrity, anyway?

My only conclusion is that the majority of devs don’t actually read documentation, and rely purely on the last blog post they read to influence their infrastructure decisions.

[0]: https://www.sqlite.org/quirks.html

replies(1): >>41880696 #

93. sofixa ◴[18 Oct 24 12:34 UTC] No.41878828{6}[source]▶

>>41878057 #

Maybe that was true in 2012 (maybe it was related to MyISAM) but by ~2015 with InnoDB MySQL replication was rock solid.

replies(1): >>41879698 #

94. sgarland ◴[18 Oct 24 12:41 UTC] No.41878876{6}[source]▶

>>41878336 #

No, it works for both [0] types. The first TIMESTAMP thing you’re referring to is that if a specific variable isn’t set, the first TIMESTAMP column automatically gets auto updates applied on creation and update, unless you explicitly defined it to not. This was the default behavior in 5.7, but has since been changed.

[0]: https://dev.mysql.com/doc/refman/8.0/en/timestamp-initializa...

95. johannes1234321 ◴[18 Oct 24 12:43 UTC] No.41878895{4}[source]▶

>>41877303 #

While that change from LGPL to GPL affected only the client library (server always was GPL(+commercial)) and the MySQL company relatively quickly reacted with a FOSS exception to the GPL and by providing a reimplementation of the client library under PHP license (mysqlnd) to serve that market.

(I joined MySQL shortly after that mess, before the Sun acquisition)

replies(1): >>41880610 #

96. justinclift ◴[18 Oct 24 13:08 UTC] No.41879094{7}[source]▶

>>41876795 #

Both databases are owned by the same company. Wonder if it's something inherent to the company which has keep that limitation going?

97. yxhuvud ◴[18 Oct 24 14:16 UTC] No.41879698{7}[source]▶

>>41878828 #

It was not related to MyISAM.

How did you verify that it was rock solid? And which of the variants did you use?

98. ttfkam ◴[18 Oct 24 14:20 UTC] No.41879736{3}[source]▶

>>41875306 #

Transactional DDL!

99. ttfkam ◴[18 Oct 24 14:22 UTC] No.41879747{7}[source]▶

>>41878624 #

Uhh… not wanting to see emojis is a matter of taste, not engineering.

100. icedchai ◴[18 Oct 24 14:44 UTC] No.41879900{4}[source]▶

>>41875276 #

Ugh. I worked with MySQL earlier in my career (until about 10 years ago.) All the companies since have been Postgres. All my personal projects are Postgres. I can't imagine going back.

101. ttfkam ◴[18 Oct 24 14:46 UTC] No.41879925{4}[source]▶

>>41877457 #

You're thinking only in terms of application. Types in the db save storage space, allow for better validation than plain strings, can be correlated cleanly with other columns with the same type, etc.

Yes, more drivers and libraries should support the more expansive data type list, but even just within the database itself there are multiple advantages.

102. evanelias ◴[18 Oct 24 15:03 UTC] No.41880075{6}[source]▶

>>41878057 #

Many of the largest US tech companies were successfully using MySQL replication in 2012 without frequent major issues.

source: direct personal experience.

103. Tostino ◴[18 Oct 24 15:49 UTC] No.41880521{5}[source]▶

>>41878377 #

In complex environments it is not just a one off task. I dealt with it by automating my infrastructure with ansible, but without some tooling it sucks.

104. est ◴[18 Oct 24 15:51 UTC] No.41880546{6}[source]▶

>>41878057 #

> pitfalls that could introduce corruption or loss of data

sometimes, repairing broken data is easier than, say, upgrading a god damn hot DB.

MVCC is overrated. Not every row in a busy MySQL table is your transactional wallet balance. But to upgrade a DB you have to deal with every field every row every table, and data keeps changing, which is a real headache

Fixing a range of broken data, however, can be done by a junior developer. If you rely on rdbms for a single source of truth you are probably fucked anyway.

btw I do hate DDL changes in MySQL.

105. p_l ◴[18 Oct 24 15:57 UTC] No.41880610{5}[source]▶

>>41878895 #

Random hosting providers that were major place for having your baby steps on LAMP stack didn't necessarily grok licensing much

replies(1): >>41884281 #

106. bastawhiz ◴[18 Oct 24 15:57 UTC] No.41880612{10}[source]▶

>>41876823 #

Browser address bars have a limit of 2048, so if that's your use case, yes it's safe.

replies(1): >>41885721 #

107. bastawhiz ◴[18 Oct 24 16:01 UTC] No.41880646{10}[source]▶

>>41876664 #

> indexes aren't designed for that. They're meant for fast lookup of short identifiers. Things like people's names and product ID's. Not long URL's. It's not performant.

This is objectively false. If this was true, indexes wouldn't serve range queries. You couldn't index on dates. You couldn't sort numbers.

> But if you absolutely have to, then create a new integer column to be your sort key, and use a little bit of extra code to give it values that produce the same sort order.

This fails when you need to insert new values into the table. Then you not only need to figure out the new integer value (how, if you can't efficiently compare sorted string values???), you need to update all the integers to make room.

replies(1): >>41881416 #

108. stickfigure ◴[18 Oct 24 16:05 UTC] No.41880682{5}[source]▶

>>41875779 #

"Aside from missing his head, the patient appears to be in fine shape."

replies(1): >>41880824 #

109. hu3 ◴[18 Oct 24 16:06 UTC] No.41880696{4}[source]▶

>>41878819 #

Understanding and memorizing shortcomings and quirks takes time and effort.

Most devs just go with whatever the influencer du jour says is good.

"Nobody ever got fired for choosing insert_currently_hyped_tech_here"

110. evanelias ◴[18 Oct 24 16:17 UTC] No.41880824{6}[source]▶

>>41880682 #

That hardly seems equivalent. Why do you need to e.g. reconfigure replication inside of a transaction in the first place?

The lack of transactional DDL is a totally valid complaint, but the non-DDL stuff is just a total head-scratcher to me. Aside from DDL, implicit commits have literally never impacted me in my 21 years of using MySQL.

replies(1): >>41885276 #

111. crazygringo ◴[18 Oct 24 17:12 UTC] No.41881416{11}[source]▶

>>41880646 #

Sorry, I was considering short things like dates and numbers as identifiers. I realize that's not quite right -- what I should have said was that indexes are designed for short things period (short identifiers being one of those things). Thanks.

> This fails when you need to insert new values into the table.

Yes, that's part of the extra code you need to keep the values accurately sorted. There are a lot of different particular code solutions that might work -- whether allowing for collisions and re-ordering every night with a cron job, or putting large gaps between numbers, or using floats.

But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URL's out of the box. Indexes aren't meant for that and they won't work, and this is by design. You're going to have to roll your own code for that.

Fortunately I've never come across a case in the wild where maintaining a globally sorted list of long items was required (though I'm not saying they never exist). E.g. if you're building a spider that needs to match against URL's, you'd index a short hash of the URL as a non-unique index. Or if you wanted to display sorted URL's for a site, you'd index by domain name only, and then sort the remainder of the URL at query time.

replies(1): >>41883759 #

112. fhdsgbbcaA ◴[18 Oct 24 20:04 UTC] No.41882967{5}[source]▶

>>41878032 #

Good to hear they saw the light but after I switched to Postgres I never had a single regret.

In a competitive market where people make very long term engineering decisions based on stability and reliability you can’t fuck up this badly and survive.

113. fhdsgbbcaA ◴[18 Oct 24 20:07 UTC] No.41882994{7}[source]▶

>>41878624 #

Ok so if you are doing sentiment analysis of user product reviews you want to silently truncate emoji because you don’t like them? That’s a good idea how?

114. bastawhiz ◴[18 Oct 24 21:41 UTC] No.41883759{12}[source]▶

>>41881416 #

> But my main point stands, which is that standard relational databases are not designed to be able to maintain a sorted index of long URL's out of the box.

You keep saying that, but Postgres does a great job with no issues without any extra work. MySQL is alone in being suboptimal. "It's not designed for that" isn't a good answer, if it works great. Show me how the underlying data structures fail or perform poorly if it's really not something you should do.

replies(2): >>41883900 #>>41884084 #

115. crazygringo ◴[18 Oct 24 22:01 UTC] No.41883900{13}[source]▶

>>41883759 #

No, Postgres doesn't. 2730 bytes is not long enough to hold all URL's encountered in the wild. But also, your performance will suffer if you use that whole length. You generally don't want to be doing that.

The difference between MySQL and Postgres here is negligible. It doesn't matter exactly where you define the limit of a short field, except it should probably be able to hold a maximum length filename which is 255 characters, plus some room to spare. Both MySQL and Postgres do this fine.

116. evanelias ◴[18 Oct 24 22:32 UTC] No.41884084{13}[source]▶

>>41883759 #

> MySQL is alone in being suboptimal.

It's only suboptimal if you choose the wrong column type for the task at hand. For storing URLs, you almost certainly don't want collation behaviors, such as accent insensitivity or case insensitivity. So VARBINARY is a better choice here anyway.

And as several other commenters have mentioned, at large scale, indexing a bunch of long URLs in b-trees is indeed a bad practice performance-wise in any relational database. You won't be able to fit many entries per page, so read performance will be slow, especially for range scans.

In that situation it's almost always better to use a non-unique index over a prefix (if you need sorting and range scans) or a hash (if you don't), and disambiguate collisions by having the full value in an unindexed column. And/or split the URL up between the domain name and path in separate columns. If needed, normalize the domain names into a separate table so that the URL table can refer to them by numeric ID. etc. All depends on the specific use-case.

117. ahoka ◴[18 Oct 24 22:32 UTC] No.41884086{6}[source]▶

>>41876111 #

They kinda make sense if you consider that Postgres was not an SQL database in the beginning. Quirky though.

118. johannes1234321 ◴[18 Oct 24 23:04 UTC] No.41884281{6}[source]▶

>>41880610 #

They also didn't like updating software - to likely that update to PHP or MySQL or something broke some bad script by a customer, who'd complain to the host.

119. stickfigure ◴[19 Oct 24 02:51 UTC] No.41885276{7}[source]▶

>>41880824 #

Sorry - I was trying to make light of the discussion. DDL is so important that it's silly to talk about the other stuff.

120. The_Colonel ◴[19 Oct 24 04:54 UTC] No.41885721{11}[source]▶

>>41880612 #

Safari has 80 000, Firefox 65K.

There are plenty of needs to store URLs which will never go through a browser.

You can only claim that "some URL use cases" can be stored in 2048 characters.

121. anticensor ◴[19 Oct 24 08:13 UTC] No.41886422[source]▶

>>41877029 #

Not really, the older form of Turkish aslan is actually arslan.

replies(1): >>41887600 #

122. sudhirj ◴[19 Oct 24 09:49 UTC] No.41886804{7}[source]▶

>>41878011 #

That helps a lot thanks. Will summarize it quickly for those who come later: MySQL (InnoDB really) and Postgres both use MVCC, so they write a new row on update. InnoDB however also additionally writes a record marking the old row for deletion.

To do a cleanup, InnoDB uses the records it kept to delete old data, while Postgres must do a scan. So InnoDB pays a record-keeping price as part of the update that makes it easier to clear data, while Postgres decides to pay this price of occasional scanning.

123. cenamus ◴[19 Oct 24 13:17 UTC] No.41887600{3}[source]▶

>>41886422 #

I just read that some time ago

https://en.wiktionary.org/wiki/Reconstruction:Proto-Slavic/s...

124. tpmoney ◴[19 Oct 24 14:22 UTC] No.41887918{7}[source]▶

>>41876795 #

Oracle also didn't support Boolean data types for a long time, and had a 20 some odd year public thread arguing that no one needed a Boolean data type (https://asktom.oracle.com/ords/f?p=100:11:0::::P11_QUESTION_...). They finally added it in Oracle 23 which is nice, but I wouldn't consider it to be in good company to be lacking something Oracle also lacks.

replies(1): >>41889141 #

125. The_Colonel ◴[19 Oct 24 17:27 UTC] No.41889141{8}[source]▶

>>41887918 #

Not having a boolean data type is IMHO just an annoyance, not comparable to the lack of transactional DDL.

But to the point, people often use this point to claim that MySQL is a toy database, not usable for real world production use. I use Oracle as a counterpoint, which also has a lot of warts but is pretty much an archetype of an enterprise-grade DB engine.

126. dventimi ◴[19 Oct 24 18:25 UTC] No.41889579{8}[source]▶

>>41876478 #

Sounds like a YP

127. GoblinSlayer ◴[20 Oct 24 09:48 UTC] No.41894259{6}[source]▶

>>41878723 #

The catch is that PHP backend passes strings in utf8.

128. slotrans ◴[23 Oct 24 18:33 UTC] No.41927974[source]▶

>>41876232 #

It is funny reading the replies here.

I am a database specialist and have worn the DBA had for many years. I have run MySQL and Postgres in production, both self-hosted and using managed services. Postgres wins on every single dimension that matters, every time. Yes MySQL is easier to setup for non-experts. That counts for nothing.

If you are sticking up for MySQL in this thread... I just don't even know, man.

↑