
752 points crazypython | 1 comments
crazygringo No.26371552
This is absolutely fascinating, conceptually.

However, I'm struggling to figure out a real-world use case for this. I'd love if anyone here can enlighten me.

I don't see how it can be for production databases involving lots of users, because while it seems appealing as a way to upgrade and then roll back, you'd lose all the new data inserted in the meantime. When you roll back, you generally want to roll back changes to the schema (e.g. delete the added column) but not remove all the rows that were inserted/deleted/updated in the meantime.
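To make the distinction concrete, here is a toy sketch (using SQLite in memory; table and column names are made up for illustration) of a schema-only "down" migration: the added column is reverted, but rows inserted while the new schema was live survive, which is exactly what a git-style reset of the whole database state would not give you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice')")

# "Up" migration: add a column.
conn.execute("ALTER TABLE users ADD COLUMN email TEXT")

# New data arrives while the upgraded schema is live.
conn.execute("INSERT INTO users (name, email) VALUES ('bob', 'b@x.com')")

# "Down" migration: revert the schema but keep every row inserted since.
# (Rebuild-and-rename works on any SQLite version, unlike DROP COLUMN.)
conn.executescript("""
    CREATE TABLE users_old (id INTEGER PRIMARY KEY, name TEXT);
    INSERT INTO users_old (id, name) SELECT id, name FROM users;
    DROP TABLE users;
    ALTER TABLE users_old RENAME TO users;
""")

print(conn.execute("SELECT name FROM users ORDER BY id").fetchall())
# → [('alice',), ('bob',)]  -- 'bob' survives, unlike a whole-state rollback
```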

So does it handle use cases that are more like SQLite? E.g. where application preferences, or even a saved file, winds up containing its entire history, so you can rewind? Although that's really more of a temporal database -- you don't need git operations like branching. And you really just need to track row-level changes, not table schema modifications etc. The git model seems like way overkill.
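The "track row-level changes, no branching needed" point can be sketched with a plain audit trigger (a hypothetical `prefs` table, SQLite in memory): an append-only history table is all you need to rewind, no git model required.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE prefs (key TEXT PRIMARY KEY, value TEXT);
    -- Append-only history: enough to "rewind" without any branching.
    CREATE TABLE prefs_history (
        ts INTEGER, key TEXT, old_value TEXT, new_value TEXT
    );
    CREATE TRIGGER prefs_audit AFTER UPDATE ON prefs BEGIN
        INSERT INTO prefs_history VALUES
            (strftime('%s','now'), NEW.key, OLD.value, NEW.value);
    END;
""")
conn.execute("INSERT INTO prefs VALUES ('theme', 'light')")
conn.execute("UPDATE prefs SET value = 'dark' WHERE key = 'theme'")

# Rewind: restore the most recently recorded old value.
old = conn.execute(
    "SELECT old_value FROM prefs_history ORDER BY ts DESC"
).fetchone()[0]
conn.execute("UPDATE prefs SET value = ? WHERE key = 'theme'", (old,))
print(conn.execute("SELECT value FROM prefs").fetchone()[0])  # light
```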

Git is built for the use case of lots of different people working on different parts of a codebase and then integrating their changes, and saving the history of it. But I'm not sure I've ever come across a use case for lots of different people working on the data and schema in different parts of a database and then integrating their data and schema changes. In any kind of shared-dataset scenario I've seen, the schema is tightly locked down, and there's strict business logic around who can update what and how -- otherwise it would be chaos.

So I feel like I'm missing something. What is this actually intended for?

I wish the site explained why they built it -- whether it was just "because we can", or whether projects or teams actually had a need for git for data.

1. fiedzia No.26372126
This won't work for the usual database use cases. It's meant for interactive work with data, the same way you work with code. Who needs that?

Data scientists working with large datasets. You want to be able to update data without redownloading everything, and also to make local changes (some data cleaning, say) and propose your updates upstream, the same way you would with git. Having many people working interactively with data is common here.
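The update-without-redownloading idea can be sketched as content-addressed chunking (a toy version with fixed-size chunks; real systems like git use content-defined chunking and object stores): hash each chunk, then fetch only the chunks whose hashes differ from the copy you already have.

```python
import hashlib

def chunk_hashes(data: bytes, size: int = 4) -> list:
    """Hash fixed-size chunks. Toy sketch: real tools use
    content-defined chunk boundaries so inserts don't shift every hash."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

local = b"aaaabbbbcccc"    # yesterday's copy of the dataset
remote = b"aaaaBBBBcccc"   # today's upstream version

local_h = chunk_hashes(local)
remote_h = chunk_hashes(remote)

# Only chunks whose hash changed need to cross the network.
to_fetch = [i for i, (l, r) in enumerate(zip(local_h, remote_h)) if l != r]
print(to_fetch)  # → [1]  (one 4-byte chunk instead of the whole file)
```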

One of the companies I work with provided a set of data distributed to their partners on a daily basis. Once it grew larger, downloading everything daily became an issue, so this would be desirable.

I have a large data model that I need to deploy to production and update once in a while. For code, network usage is kept to a minimum because we have git. For data, the options are limited.

As with git, it is something that, once you have it, you will find a lot of use cases that make life easier and open many new doors.