Dolt is Git for Data

(github.com)

334 points gjvc | 2 comments | 23 Jun 22 11:04 UTC


cosmic_quanta ◴[23 Jun 22 12:00 UTC] No.31847838[source]▶

That looks awesome. One of the listed use-cases is 'time-travel': https://dolthub.com/blog/2021-03-09-dolt-use-cases-in-the-wi...

I wish we could use this at work. We're trying to forecast time series. However, a lot of our infrastructure complexity exists only to ensure that, when we train on data from years ago, we don't use data that would still have been in the future at that point (future data leaking into the past).

Using Dolt, as far as I understand it, we could simply set the DB to a point in the past where the 'future' data wasn't available. Very cool.
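To make the idea concrete, here is a minimal sketch of an "as of" lookup in plain Python. It is not Dolt's API: `VersionedTable`, `commit`, and `as_of` are hypothetical names, and the toy stores a full snapshot per commit, which a real versioned database would avoid. It only shows why pinning reads to a past commit rules out leakage.

```python
from bisect import bisect_right
from datetime import date


class VersionedTable:
    """Toy append-only store: every commit keeps a full snapshot of the rows."""

    def __init__(self):
        self._commit_dates = []  # commit dates, kept in ascending order
        self._snapshots = []     # one dict of rows per commit

    def commit(self, when, rows):
        if self._commit_dates and when < self._commit_dates[-1]:
            raise ValueError("commits must be appended in chronological order")
        self._commit_dates.append(when)
        self._snapshots.append(dict(rows))

    def as_of(self, when):
        """Return the rows as of the last commit on or before `when`."""
        idx = bisect_right(self._commit_dates, when)
        if idx == 0:
            return {}  # nothing had been committed yet
        return dict(self._snapshots[idx - 1])


prices = VersionedTable()
prices.commit(date(2020, 1, 1), {"2019-12": 100})
# The 2019-12 figure is revised and a new month arrives.
prices.commit(date(2020, 2, 1), {"2019-12": 101, "2020-01": 105})
prices.commit(date(2020, 3, 1), {"2019-12": 101, "2020-01": 104, "2020-02": 110})

# Training as if it were mid-February 2020: the March commit is invisible.
training_view = prices.as_of(date(2020, 2, 15))
```

The training code only ever sees `training_view`, so later revisions and later rows cannot reach it by construction.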

replies(5): >>31847959 #>>31848014 #>>31849805 #>>31849874 #>>31859003 #

1. lichtenberger ◴[23 Jun 22 14:49 UTC] No.31849874[source]▶

>>31847838 #

My research project[1], which I work on in my spare time, is all about versioning: efficiently storing small revisions of the data and allowing sophisticated time-travel queries for audits and analysis.

Of course all secondary user-defined, typed indexes are also versioned.

The basic technical idea is to map a huge tree of index tries (with revisions as indexed leaf pages at the top level, and a document index as well as secondary indexes on the second level) to an append-only file. To reduce write amplification and the size of each snapshot, data pages are first compressed and then versioned through a sliding snapshot algorithm. Thus, Sirix does not simply do a copy-on-write per page. Instead, it writes the nodes that have changed in the current revision plus the nodes that fall out of the sliding window (which is why it needs a fast random-read drive).

[1] https://github.com/sirixdb/sirix
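A rough sketch of the sliding snapshot idea as described above, in Python. This is a toy reconstruction from the description, not Sirix's code: `Page`, `WINDOW`, and the slot/value layout are all assumptions, and deletions and compression are left out.

```python
WINDOW = 3  # assumed window size: a page is rebuilt from at most this many fragments


class Page:
    def __init__(self):
        self.fragments = []     # one {slot: value} dict per revision, append-only
        self.last_written = {}  # slot -> revision whose fragment last contained it

    def commit(self, changes):
        """Append a fragment holding the changed slots plus any slots whose
        latest copy would otherwise slide out of the window."""
        rev = len(self.fragments)
        fragment = dict(changes)
        current = self.read(rev - 1) if rev > 0 else {}
        oldest_visible = rev - WINDOW + 1
        for slot, written_at in self.last_written.items():
            if slot not in fragment and written_at < oldest_visible:
                fragment[slot] = current[slot]  # carry the unchanged value forward
        for slot in fragment:
            self.last_written[slot] = rev
        self.fragments.append(fragment)
        return rev

    def read(self, rev):
        """Rebuild the page at `rev` from its newest WINDOW fragments, newest wins."""
        page = {}
        for r in range(rev, max(rev - WINDOW, -1), -1):
            for slot, value in self.fragments[r].items():
                page.setdefault(slot, value)
        return page


page = Page()
page.commit({"a": 1, "b": 2})  # revision 0
page.commit({"a": 10})         # revision 1: only the changed slot is written
page.commit({})                # revision 2: nothing changed, empty fragment
page.commit({})                # revision 3: "b" is rewritten, it would leave the window
```

Any revision can be rebuilt from at most `WINDOW` fragments, and each fragment holds only changed or expiring slots rather than a full page copy. Those fragments are scattered across the append-only file, which is where the need for fast random reads comes from.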

replies(1): >>31853257 #

2. awmarthur ◴[23 Jun 22 18:27 UTC] No.31853257[source]▶

>>31849874 (TP) #

That sounds somewhat similar to Dolt's storage index structure: Prolly Trees https://www.dolthub.com/blog/2020-04-01-how-dolt-stores-tabl...
