
Dolt is Git for data

(www.dolthub.com)
358 points timsehn | 1 comments
quickthrower2 ◴[] No.22734205[source]
I think data (as in raw, collected / measured / surveyed data) doesn't really change, but you get more of it. Some data may occasionally supersede old data. Maybe the schema of the data changes, so your first set of data is in one form, and subsequent data might have more information or be recorded in a different way.
replies(2): >>22734268 #>>22734428 #
jon_richards ◴[] No.22734268[source]
One really important feature of time series data is the preservation of what the dataset looked like at each point in time. Financial data providers will make a mistake (off by an order of magnitude, a missed stock split, etc.) and then go back and correct it. Without a record of the original values, you end up training models entirely on corrected data but trading on uncorrected data.
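To make the point-in-time idea concrete, here is a minimal sketch in Python. It assumes every published value is kept as a separate revision tagged with the day it was recorded, instead of being overwritten. The `Revision` type, the `as_of` helper, the symbol "XYZ", and all prices and dates are hypothetical and only for illustration; this is not how any particular provider or tool stores its data.

```python
from dataclasses import dataclass
from datetime import date
from typing import Iterable, Optional


@dataclass(frozen=True)
class Revision:
    """One published value. All field names here are hypothetical."""
    symbol: str
    trade_date: date   # the day the price refers to
    recorded_on: date  # the day the provider published this value
    close: float


def as_of(revisions: Iterable[Revision], symbol: str,
          trade_date: date, known_on: date) -> Optional[float]:
    """Return the close for trade_date as it was known on known_on."""
    candidates = [
        r for r in revisions
        if r.symbol == symbol
        and r.trade_date == trade_date
        and r.recorded_on <= known_on
    ]
    if not candidates:
        return None  # nothing had been published yet
    # The latest revision published on or before known_on wins.
    return max(candidates, key=lambda r: r.recorded_on).close


# Made-up data: the first value is off by an order of magnitude,
# and the provider corrects it three days later.
REVISIONS = [
    Revision("XYZ", date(2020, 3, 2), date(2020, 3, 2), 1000.0),
    Revision("XYZ", date(2020, 3, 2), date(2020, 3, 5), 100.0),
]

# What a live trading system saw on March 3 (uncorrected).
seen_live = as_of(REVISIONS, "XYZ", date(2020, 3, 2), date(2020, 3, 3))
# What a model trained later sees (corrected).
seen_later = as_of(REVISIONS, "XYZ", date(2020, 3, 2), date(2020, 3, 6))
```

Querying with `known_on` set to the training cutoff versus the trading day shows the mismatch: the same `trade_date` yields different values depending on when you ask.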
replies(2): >>22736968 #>>22742107 #
1. quickthrower2 ◴[] No.22742107[source]
Thanks, I didn't consider the training of models. This is a great use case for a tool like this.