←back to thread

752 points crazypython | 1 comments | | HN request time: 0.2s | source
Show context
scottmcdot ◴[] No.26371504[source]
Dolt might be good but never underestimate the power of Type 2 Slowly Changing Dimension tables [1]. For example, if you had an SSIS package that took CSV and imported them into a database, and one day you noticed it accidently rounded the value incorrectly, you could fix the data and retain traceability of the data which was there originally.

E.g., SSIS package writes row of data: https://imgur.com/DClXAi5

Then a few months later (on 2020-08-15) we identify that trans_value was imported incorrectly so we update it: https://imgur.com/wdQJWm4

Then whenever we SELECT from the table we always ensure we are extracting "today's" version of the data:

    select * from table where TODAY between effective_from and effective_to
[1] https://en.wikipedia.org/wiki/Slowly_changing_dimension
replies(4): >>26371578 #>>26371814 #>>26371976 #>>26374137 #
1. sixdimensional ◴[] No.26371578[source]
I definitely agree, just tossing in the superset concept that Dolt and Type 2 SCD involve - temporal databases [1].

I think the idea of a "diff" applied to datasets is quite awesome, but even then, we kind of do that with databases today with data comparison tools - it's just most of them are not time aware, rather they are used to compare data between two instances of the data in different databases, not at two points in time in the same database.

[1] https://en.wikipedia.org/wiki/Temporal_database