Dolt is Git for data

(www.dolthub.com)
358 points timsehn | 6 comments
peteforde ◴[] No.22734564[source]
Only 39 days since the last "GitHub for data" was announced: https://news.ycombinator.com/item?id=22375774

I'll say what I said in February: I started a company with the same premise 9 years ago, during the prime "big data" hype cycle. We burned through a lot of investor money only to realize that there was not a market opportunity to capture. That is, many people thought it was cool - we even did co-sponsored data contests with The Economist - but at the end of the day, we couldn't find anyone with an urgent problem that they were willing to pay to solve.

I wish these folks luck! Perhaps things have changed; we were part of a flock of 5 or 10 similar projects and I'm pretty sure the only one still around today is Kaggle.

https://www.youtube.com/watch?v=EWMjQhhxhQ4

replies(15): >>22734677 #>>22734738 #>>22734742 #>>22734839 #>>22735019 #>>22735030 #>>22735213 #>>22735358 #>>22735661 #>>22736049 #>>22736513 #>>22736785 #>>22737514 #>>22737860 #>>22738642 #
ken ◴[] No.22735030[source]
That's GitHub for data. It's a service, and they still haven't launched anything yet.

This is Git for data. It's a program, and it appears to be an open-source one you can download and use today.

replies(2): >>22735037 #>>22735068 #
1. enos_feedler ◴[] No.22735068[source]
There is actually an old git for data project too:

https://github.com/datproject/dat

It's ~5 years old and I really wanted it to be huge. Hoping this new project is a success, especially since I noticed I went to high school with one of the founders of Dolt (Hey Tim!)

replies(3): >>22735185 #>>22736907 #>>22737454 #
2. visarga ◴[] No.22735185[source]
Can it remove a file from the repo history? It's a GDPR feature that makes git hard to use for data.
3. cbenz ◴[] No.22736907[source]
Dat is more about distribution (decentralized, distributed, P2P), but it's not possible to make queries.
replies(1): >>22738838 #
4. ken ◴[] No.22737454[source]
That project looks like a command-line p2p file sharing system. There doesn't appear to be any branching. It also doesn't appear to be a database (like with a schema), but simply raw files being passed around. There are no data types or queries.

I'm not sure why you bring it up now. They don't call it "git for data" anywhere that I see, and it's missing 2 of the 3 core features that I think a "git for data" would need to have.

replies(1): >>22738714 #
5. enos_feedler ◴[] No.22738714[source]
Like I said, this project is old. I brought it up in the context of older projects, independent of whether they succeeded, pivoted, etc. If you had done some research you would have made this connection:

https://www.youtube.com/watch?v=FX7qSwz3SCk (2013) - 'Introducing Dat: If Git Were Designed For Big Data' Talk by the founder.

My point is they pivoted and so maybe this idea won't work, or this was too early.

EDIT: Looking back on _your_ post, I mentioned it because you specifically said "It's a program, and it appears to be an open-source one you can download and use today." And that is what 'dat' is/was. I thought I would mention it.

6. enos_feedler ◴[] No.22738838[source]
Yes, the project used to describe itself as a git for data, but I guess not in the sense of a db.