Dolt is Git for data | slacker news

Only 39 days since the last "GitHub for data" was announced: https://news.ycombinator.com/item?id=22375774

I'll say what I said in February: I started a company with the same premise 9 years ago, during the prime "big data" hype cycle. We burned through a lot of investor money only to realize that there was not a market opportunity to capture. That is, many people thought it was cool - we even did co-sponsored data contests with The Economist - but at the end of the day, we couldn't find anyone with an urgent problem that they were willing to pay to solve.

I wish these folks luck! Perhaps things have changed; we were part of a flock of 5 or 10 similar projects and I'm pretty sure the only one still around today is Kaggle.

https://www.youtube.com/watch?v=EWMjQhhxhQ4

We also started out as "Git for data" several years ago, but have since pivoted to data science/ML tooling (https://dotscience.com/) by building features that people actually want on top of the original product. These days the "Git for data" part probably accounts for only about 5% of the total functionality :)

I guess "Git for data" is not very useful unless you have a whole platform built around it to actually use the features. We mainly use it for data synchronization between nodes and for provenance tracking, so people can see what data was used to build specific models and track how the project evolves without being forced to "commit" their changes manually (we have seen that data scientists often don't even use git, just files in their Jupyter notebook workspace).
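
To make the "provenance without manual commits" idea concrete, here is a minimal sketch in Python. It is not how dotscience or Dolt actually work; the function names (file_digest, snapshot, record_run) and the JSON-lines log format are made up for illustration. The assumption is simply that hashing every input file at training time is enough to later answer "what data was this model built from?".

```python
import hashlib
import json
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def snapshot(data_dir: Path) -> dict[str, str]:
    """Map each file under data_dir (by relative path) to its content hash."""
    return {
        p.relative_to(data_dir).as_posix(): file_digest(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def record_run(model_name: str, data_dir: Path, log_path: Path) -> dict:
    """Append one provenance entry (hypothetical format) to a JSON-lines log.

    Called by the training script itself, so nobody has to remember
    to "commit" anything. Keep log_path outside data_dir, otherwise
    the log would be hashed as an input.
    """
    entry = {"model": model_name, "inputs": snapshot(data_dir)}
    with log_path.open("a", encoding="utf-8") as log:
        log.write(json.dumps(entry, sort_keys=True) + "\n")
    return entry
```

Calling record_run("model-v1", Path("data"), Path("provenance.jsonl")) at the start of a training script would append one line per run; comparing the "inputs" maps of two entries shows which files changed between models. A real system would also need to store the data itself, not just the hashes.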