
334 points by gjvc | 1 comment
cgio No.31848614
I have been interested in this space, but I have failed to understand how these data versioning solutions work in the context of environments. Some aspects of time travel line up better with data modelling approaches (such as bitemporal DBs, xtdb, etc.), others with git-like use cases (e.g. schema evolution, backfilling), and some with combinations of the two. The challenge is that with data I don't see why you would want all environments in the same place/repo, and there may be additional considerations coupled to the directionality of moves, such as anonymisation when moving from prod to non-prod, or backfilling when moving from non-prod to prod. Keen to read about other people's experiences in this space and how they might be combining different solutions.
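To make the directionality point concrete, here is a minimal, hypothetical sketch (pandas-based, with made-up column names, not tied to any particular versioning tool) of the kind of anonymisation step that would have to run before prod data could move into a non-prod environment:

    # Hypothetical sketch: scrub PII before copying prod data into non-prod.
    # Column names are invented for illustration.
    import hashlib
    import pandas as pd

    def anonymise_for_nonprod(prod_df: pd.DataFrame) -> pd.DataFrame:
        df = prod_df.copy()
        # Drop direct identifiers outright.
        df = df.drop(columns=["email", "full_name"], errors="ignore")
        # Replace a stable key with a one-way hash so downstream joins still work.
        df["customer_id"] = df["customer_id"].astype(str).map(
            lambda v: hashlib.sha256(v.encode()).hexdigest()[:16]
        )
        return df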
replies(2): >>31848970 >>31851905
1. ollien No.31851905
I've only used Dolt once, but it was very helpful when I was working on a group project in an ML class. Previously we would commit gigantic CSVs to git, which sucked. Putting the data in Dolt made exploration and sharing much easier, and kept our data separate from our code.
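For anyone curious what that looked like in practice, here's a rough sketch, assuming a Dolt SQL server running locally (Dolt speaks the MySQL wire protocol) and a hypothetical ml_project database with a training_data table:

    # Rough sketch: read a versioned table from a locally running Dolt SQL
    # server (started separately with `dolt sql-server`). Database and table
    # names are hypothetical.
    import pandas as pd
    from sqlalchemy import create_engine

    engine = create_engine("mysql+pymysql://root@127.0.0.1:3306/ml_project")

    # AS OF time-travels to a branch or commit -- roughly Dolt's equivalent
    # of digging an old CSV out of git history.
    df = pd.read_sql("SELECT * FROM training_data AS OF 'main'", engine)
    print(df.shape)

The nice part compared to CSVs in git is that teammates can clone the database and query or diff it, rather than re-downloading and re-parsing the whole file.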