Most active commenters
  • evanelias(5)
  • ttfkam(4)

←back to thread

265 points davidgomes | 12 comments | | HN request time: 1.266s | source | bottom
Show context
noirscape ◴[] No.41877487[source]
Because the actual process of upgrading Postgres is terrible.

I say this as someone who absolutely loves using it, but the actual process of upgrading Postgres is something that takes significant downtime, is error-prone and you're often better off just dumping all the databases and reimporting them in a new folder. (A good idea in general since it'll vacuum and compact indexes as well if you do it, combining a few maintenance routines in your upgrade path.)

It requires having the previous version of Postgres installed, something which can mess with a number of distro policies (not to mention docker, which is the most popular way to deploy software that will typically rely on Postgres), and unlike most software with that issue, Postgres is software you want to be managed by your distro.

Therefore, most people only upgrade by necessity - when their distro forces the upgrade or the version they're using reaches EOL.

replies(11): >>41877841 #>>41877921 #>>41877992 #>>41878101 #>>41878462 #>>41878670 #>>41879013 #>>41879161 #>>41879191 #>>41879259 #>>41879567 #
MichaelZuo ◴[] No.41877841[source]
So the real question is, why is the upgrade process so incompetently designed, and why has no one fixed this?
replies(5): >>41877898 #>>41877902 #>>41877926 #>>41878252 #>>41878442 #
phil21 ◴[] No.41877902[source]
My opinion is Postgres was designed by software developers for software developers. The split on “which relational database to use” in my career has almost always been perfectly split between SWE vehemently demanding pgsql for the feature set, and the sysadmins having to support maintenance and production availability preferring MySQL.

One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level. Apparently us lowly sysadmins may have had a point after all.

This is a bit tongue in cheek but really not far from my lived reality. When the focus is on features and “correctness” at the near total expense of sane systems tooling folks can develop some myopia on the subject. So many arguments with devs on my teams over this subject that were utterly horrified to find we were running MySQL for a given service.

Open source projects tend to fix the pain points its contributors experience, and I assume there were not too many contributors wanting to deal with the boring work of making administration and easy task - it’s thankless “sideways” work that won’t result in many accolades or personal satisfaction for most SWEs.

The end users are almost always developers, most of whose experiences in production entail either the equivalent of a docker container level scale system, or are simply given a connection string and the rest is a black box to them. Under those contexts I’d personally prefer Postgres as well and it wouldn’t even be close. When you get into backups, clustering, upgrades, and high availability under extreme load? IMO the story falls apart real fast.

replies(5): >>41878214 #>>41878540 #>>41878590 #>>41878653 #>>41881690 #
sgarland ◴[] No.41878653[source]
> The split on “which relational database to use” in my career has almost always been perfectly split between SWE vehemently demanding pgsql for the feature set

I’ve seen this as well, but when pressed, none of them could articulate what part of its feature set they actually needed to use.

> One of the few things I’ve enjoyed with the move into devops and companies forcing previously “pure” developers into operational roles was their discovery that Postgres was utterly horrible to administer at a systems level.

Are you (or your devs, rather) actually running your own DBs? If so, respect. My experience has been that they spin up either a hideously under or over-provisioned RDS or Aurora instance, and then never touch it until it breaks, at which point they might ask for help, or they might just make it bigger.

replies(3): >>41878860 #>>41879564 #>>41879577 #
1. ttfkam ◴[] No.41879564[source]
> none of them could articulate what part of its feature set they actually needed to use.

Transactional DDL: migration errors never leave the database in an intermediate/inconsistent state.

Range types + exclusion constraint: just no way to do this in MySQL without introducing a race condition.

Writeable CTEs: creating insert/update/delete pipelines over multiple tables deterministically. Seriously though, the RETURNING clause is something I use all the time both in and out of CTEs.

Filtered aggregates and grouping sets: cleanly get multiple data points for a dashboard in one shot.

Unnest: converting arrays into a set of rows. Inverse of array_agg(...).

Types: arrays, booleans, IP/subnets, UUIDs (without binary(16) hacks), etc.

Materialized views: seriously, how does MySQL not have this yet?

Statement-level triggers: another option from per-row.

Row-level security: setting data visibility based on configurable policies.

I can cite specific use cases I've deployed to production for each of these and more.

replies(2): >>41879932 #>>41882079 #
2. evanelias ◴[] No.41879932[source]
That's a good list [1]. A handful of these are already doable in modern MySQL and/or MariaDB though.

JSON can often be used in place of arrays, and JSON_TABLE in both MySQL and MariaDB converts JSON into tabular data. MySQL supports multi-valued indexes over JSON, where each row can have multiple index entries (or no entries, e.g. partial index).

MariaDB has built-in convenience types for ipv4, ipv6, and uuid. Or in MySQL you can just use virtual columns to add human-readable conversions of binary columns, although that is admittedly slightly annoying.

MariaDB supports RETURNING.

[1] Edit to add: I do mean that honestly, it's an accurate and insightful list of nice Postgres features, most of which aren't in MySQL or MariaDB. Honestly baffled as to why I'm being downvoted.

replies(2): >>41880852 #>>41885699 #
3. ttfkam ◴[] No.41880852[source]
> JSON can often be used in place of arrays

This is like storing UUIDs as text. You lose type information and validation. It's like storing your array as a comma-delimited string. It can work in a pinch, but it takes up more storage space and is far more error prone.

> convenience types for ipv4, ipv6, and uuid.

That's nice to see. A shame you have to decide ahead of time whether you're storing v6 or v4, and I don't see support for network ranges, but a definite improvement.

> MariaDB supports RETURNING.

That's honestly wonderful to see. Can these be used inside of CTEs as well for correlated INSERTs?

replies(1): >>41881536 #
4. evanelias ◴[] No.41881536{3}[source]
Regarding using JSON for arrays, MySQL and MariaDB both support validation using JSON Schema. For example, you can enforce that a JSON column only stores an array of numbers by calling JSON_SCHEMA_VALID in a CHECK constraint.

Granted, using validated JSON is more hoops than having an array type directly. But in a pinch it's totally doable.

MySQL also stores JSON values using a binary representation, it's not a comma-separated string.

Alternatively, in some cases it may also be fine to pack an array of multi-byte ints into a VARBINARY. Or for an array of floats, MySQL 9 now has a VECTOR type.

Regarding ipv6 addresses: MariaDB's inet6 type can also store ipv4 values as well, although it can be inefficient in terms of storage. (inet6 values take up a fixed 16 bytes, regardless of whether the value is an ipv4 or ipv6 address.)

As for using RETURNING inside a writable CTE in MariaDB: not sure, I'd assume probably not. I must admit I'm not familiar with the multi-table pipeline write pattern that you're describing.

replies(1): >>41883536 #
5. sgarland ◴[] No.41882079[source]
That is a well-thought out list, and you’re clearly aware of and take advantage of the DB’s capabilities. Seriously, congrats. Especially RETURNING – it’s always baffling to me why more people don’t use it (or its sad cousin in MySQL that lets you get the last inserted rowid if using an auto-increment).

Most devs I’ve worked with don’t know about aggregations beyond COUNT and GROUP BY, and do everything in the app. I’ve pointed these out before, and am always told, “we don’t want to have additional logic in the DB.” So you want a dumb bit box then, got it – why are you using an RDBMS?

> Transactional DDL

I know this is a thing, and I’ve used it in Postgres, but I’ve also never found myself in MySQL being upset that I didn’t have it. Everything should be thoroughly tested in staging before prod.

> RLS

Yes, amazing feature if you use it correctly.

I will give MySQL a couple of nods in its favor: well, three.

1. Clustering index. If you design your schema around this fact, range queries can be WAY faster. Dropbox does this, IIRC.

2. Generated Virtual Columns. Not sure why Postgres still doesn’t support this.

3. ON UPDATE CURRENT_TIMESTAMP. It’s so nice to have the DB automatically track update times for you, IMO.

replies(1): >>41883378 #
6. ttfkam ◴[] No.41883378[source]
I use transactional DDL all the time, even during development. It's nice to not have to fully reset your schema every time you test a migration file locally. With transactional DDL, you run the whole list, and if any fails, it rolls back to where you started. You look at the error, edit your migration, and try again. It really is a time saver. There is a peace of mind always knowing your schema is in a consistent state not unlike the peace that comes from a good set of unit tests.

1. Yep, I definitely miss clustering indexes in Postgres sometimes. I can sometimes fake it with covering indexes when all I want are an extra column or two along with the primary key or similar without seeking to the main table, but you're right about that MySQL/MariaDB win here.

2. The dynamic computed column is an easy workaround with immutable functions that take the record as a param.

        CREATE TABLE foo ( a int, b int, c int );
        CREATE FUNCTION d(entry foo) RETURNS int LANGUAGE sql IMMUTABLE AS $$
          SELECT foo.a + foo.b + foo.c;
        $$;

        SELECT a, b, c, d(foo) FROM foo;
It's not part of the table schema when doing a SELECT *, but it is just as efficient as a computed column in MySQL/MariaDB and only slightly more verbose.

3. ON UPDATE CURRENT_TIMESTAMP works in Postgres with a trigger function, which you can reuse if all your tables use the same name for your "last_modified" column (probably a good idea anyway). Not as convenient as the declarative syntax, but it's a fairly trivial workaround.

        CREATE OR REPLACE FUNCTION update_last_modified() RETURNS TRIGGER AS $$
          BEGIN
            NEW.last_modified = now();
            RETURN NEW;   
          END;
        $$ language 'plpgsql';

        CREATE TRIGGER foo_last_modified BEFORE UPDATE ON foo
        FOR EACH ROW EXECUTE PROCEDURE update_last_modified();

        CREATE TRIGGER bar_last_modified BEFORE UPDATE ON bar
        FOR EACH ROW EXECUTE PROCEDURE update_last_modified();
One function, many triggers. You also get to choose between "when transaction started" (now() or CURRENT_TIMESTAMP), "when statement started" (statement_timestamp()), or "right now" (clock_timestamp()).

I don't mind workarounds so much as functionality that simply cannot be replicated. For example I miss real temporal table support in Postgres like what you can find in MariaDB or MS SQL Server. The painful kludges for missing PIVOT support like in MS SQL Server is another one.

You never know how much you need deferred foreign key constraints until you don't have them anymore. Or a materialized view.

replies(1): >>41883889 #
7. ttfkam ◴[] No.41883536{4}[source]
> the multi-table pipeline write pattern

        WITH new_order AS (
            INSERT INTO order (po_number, bill_to, ship_to)
                 VALUES ('ABCD1234', 42, 64)
              RETURNING order_id
        )
        INSERT INTO order_item (order_id, product_id, quantity)
             SELECT new_order.order_id, vals.product_id, vals.quantity
               FROM (VALUES (10, 1), (11, 5), (12, 3)) AS vals(product_id, quantity)
              CROSS JOIN new_order
        ;
Not super pretty, but it illustrates the point. A single statement that creates an order, gets its autogenerated id (bigint, uuid, whatever), and applies that id to the order items that follow. No network round trip necessary to get the order id before you add the items, which translates into a shorter duration for the transaction to remain open.
replies(1): >>41883777 #
8. evanelias ◴[] No.41883777{5}[source]
Thanks, that makes sense.

In this specific situation, the most common MySQL/MariaDB pattern would be to use LAST_INSERT_ID() in the second INSERT, assuming the order IDs are auto-increments. Or with UUIDs, simply generating the ID prior to the first INSERT, either on the application side or in a database-side session variable.

To avoid extra network calls, this could be wrapped in a stored proc, although a fair complaint is that MySQL doesn't support a ton of different programming langauges for procs/funcs like Postgres.

9. evanelias ◴[] No.41883889{3}[source]
> It's nice to not have to fully reset your schema every time you test a migration file locally

In terms of dev flow, this is only a problem with imperative migration systems. Declarative schema management tools solve it by being able to transition any live database state into the desired state, which is expressed by a repo of CREATE statements.

If something fails, you fix the bad CREATE and run the tool again, and it effectively picks up where it left off. And well-designed declarative tools catch many problems pre-flight anyway by running a suite of linters, running the emitted SQL statements in a sandbox first, etc.

If the tool's diff returns clean, you know your schema is in the right state on the DB.

Ironically, lack of transactional DDL actually makes declarative schema management more straightforward in MySQL/MariaDB: you can't mix DDL and DML there anyway, so it's more natural to handle schema changes vs data migrations using different tools/pipelines.

10. ComputerGuru ◴[] No.41885699[source]
You still can’t use uuid as proper foreign keys with validation on mariaDB/MySQL though, right? It wasn’t possible with blobs at any rate.
replies(1): >>41890351 #
11. evanelias ◴[] No.41890351{3}[source]
This has always been possible, for example using the BINARY(16) column type if you want to be efficient. Or in MariaDB 10.7+ you can now use the dedicated UUID column type, which is equivalent to BINARY(16) under the hood, but provides a human-readable hex value when queried.

UUIDs are fixed-length. Blobs are not the appropriate type for that.

replies(1): >>41921933 #
12. ComputerGuru ◴[] No.41921933{4}[source]
Sorry, I was mistaken and I already do have that set up and working using binary. There’s something else I’m trying but failing to remember that isn’t possible with binary keys.