
334 points by gjvc | 1 comment
throwaway892238 No.31849720
This is the future of databases, but nobody seems to realize it yet.

One of the biggest problems with databases (particularly SQL ones) is that they're a giant pile of mutable state. The whole idea of "migrations" exists because it is impossible to "just" revert an arbitrary change to a database, diff changes automatically, or merge changes automatically. You need some kind of intelligent tool or framework to generate the DDL, DML, and DCL; the statements have to be applied in order; something has to check whether they've already been applied; etc. And of course you can't roll back a change once it's been applied, unless you create even more program logic to figure out how to do that. It's all a big hack.
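For a concrete picture of the hack, here's what a typical hand-written migration pair looks like, a minimal sketch in the style of numbered up/down scripts (file names and schema are illustrative):

    -- 0007_add_phone.up.sql: the forward change
    ALTER TABLE customers ADD COLUMN phone TEXT;

    -- 0007_add_phone.down.sql: extra logic you must write and maintain
    -- just to get reversibility, and it only helps for non-destructive
    -- changes; a "down" for DROP TABLE can't bring the data back.
    ALTER TABLE customers DROP COLUMN phone;

A migration runner then records which scripts have been applied, usually in a bookkeeping table inside the same mutable database it's trying to manage.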

By treating a database as version-controlled, you can treat any operation as immutable. Make any change you want and don't worry about conflicts. You can always just go back to the last working version, revert a specific change, merge in one or more changes from different working databases. Make a thousand changes a day, and when one breaks, revert it. No snapshotting and slowly restoring the whole database due to a non-reversible change. Somebody dropped the main table in prod? Just revert the drop. Need to make a change to the prod database but the staging database is different? Branch the prod database, make a change, test it, merge back into prod.
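With version control built into the engine, that workflow can be expressed directly in SQL. A minimal sketch assuming Dolt-style stored procedures (dolt_checkout, dolt_merge, dolt_revert); other versioned databases expose similar operations under different names:

    -- Branch prod, make a change, test it, merge back.
    CALL DOLT_CHECKOUT('-b', 'price-fix');   -- branch the live database
    UPDATE products SET price = price * 1.1 WHERE category = 'hardware';
    CALL DOLT_COMMIT('-a', '-m', 'Raise hardware prices 10%');
    CALL DOLT_CHECKOUT('main');
    CALL DOLT_MERGE('price-fix');            -- merge the tested change

    -- Somebody dropped the main table in prod? Revert that commit.
    CALL DOLT_REVERT('HEAD');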

The effect is going to be as radical as the popularization of containers. Whether you like them or not, they are revolutionizing an industry and are a productivity force multiplier.

hinkley No.31851109
In a similar vein, I'm working on a tool for a volunteer group, and one of the problems I'm trying to figure out how to model is tracking not just when the data changed, but why it changed.

Sometimes when you're first entering data you just get it wrong. You said the wrench was in storeroom A1, or that the customer lives on 3rd St, or that the tree was five feet from the sidewalk. If the state of the asset changes due to human action, that's a new thing. But if you opened the wrong list while taking inventory, or you missed a keystroke for 33rd St, or you just eyeballed the coordinates, then that row was simply wrong, and for most things the correction should be back-dated.

But if I emptied out A1 because we aren't renting that space anymore, the customer moves, or the tree got replanted, then it was here and now it's over there, which might be important for figuring out things like overstock, taxes, or success rates.
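One way to model that distinction is a bitemporal, append-only table with an explicit change reason. A rough sketch (all names are mine, purely illustrative): each row records both when the fact was true in the world (valid_from) and when we learned it (recorded_at), plus whether the write was a correction or a real-world event.

    -- Hypothetical schema: every change is an append, never an overwrite.
    CREATE TABLE asset_location (
        asset_id     INTEGER   NOT NULL,
        location     TEXT      NOT NULL,
        valid_from   TIMESTAMP NOT NULL,  -- when this was true in the world
        recorded_at  TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP,
        change_kind  TEXT      NOT NULL
                     CHECK (change_kind IN ('initial', 'correction', 'event'))
    );

    -- Data-entry fix: back-date the correction to when the wrench was
    -- actually placed, so history reads as if we'd never been wrong.
    INSERT INTO asset_location (asset_id, location, valid_from, change_kind)
    VALUES (42, 'A2', TIMESTAMP '2024-01-05 00:00:00', 'correction');

    -- Real move: the wrench really was in A1 until today.
    INSERT INTO asset_location (asset_id, location, valid_from, change_kind)
    VALUES (42, 'B3', CURRENT_TIMESTAMP, 'event');

Queries about "where is it now" read the latest event, while queries about overstock or taxes can distinguish genuine moves from typo fixes.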

Similarly, if a migration introduces derived data, then the migrated data is assumed/inferred, whereas data we took directly from a user is confirmed, and that might introduce subtle differences in how best to relate to the user. It's the kind of thing a mediocre business could easily ignore, but a quality establishment's users might be ill-pleased by, say, being asked to confirm information they had already given us.
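A lightweight way to carry that distinction through a migration is a provenance flag, sketched here with made-up names: backfilled values are marked inferred so downstream code can treat them differently from user-confirmed ones.

    -- Hypothetical migration: backfill a new column, but record that the
    -- values are derived rather than user-confirmed.
    ALTER TABLE customers ADD COLUMN preferred_name TEXT;
    ALTER TABLE customers ADD COLUMN preferred_name_source TEXT
        NOT NULL DEFAULT 'inferred'
        CHECK (preferred_name_source IN ('inferred', 'confirmed'));

    UPDATE customers SET preferred_name = first_name;  -- best guess only

    -- Later, when the user actually tells us their preference:
    UPDATE customers
       SET preferred_name = 'Meg',
           preferred_name_source = 'confirmed'
     WHERE customer_id = 7;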