213 points shcheklein | 1 comment
dmpetrov No.41890616
Hi there! Maintainer and author here. Excited to see DVC on the front page!

Happy to answer any questions about DVC and our sister project DataChain https://github.com/iterative/datachain, which does data versioning under slightly different assumptions: no file copying and built-in data transformations.

1. johanneskanybal No.41896923
I mostly consult as a data engineer, not in ML ops, but I’m interested in some aspects of this. We have 10 years of Parquet files from 300+ different Kafka topics, and we’re currently migrating to Apache Iceberg. We’ll backfill on an as-needed basis, and it would be nice to track that with git. Would this be a good fit for that?
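
To make "track backfills with git" concrete, here is a minimal sketch of one possible approach, independent of DVC or DataChain: a small JSON manifest, committed to the repository, that records which topic/date partitions have been backfilled. The function name `record_backfill`, the manifest layout, and the file name are all hypothetical, chosen only for illustration.

```python
import json
from pathlib import Path


def record_backfill(manifest_path, topic, partition_date):
    """Add (topic, partition_date) to a JSON manifest.

    Returns True if the entry was new, False if it was already recorded.
    The manifest layout {topic: [dates...]} is an assumption for this sketch.
    """
    path = Path(manifest_path)
    manifest = json.loads(path.read_text()) if path.exists() else {}
    dates = set(manifest.get(topic, []))
    if partition_date in dates:
        return False
    dates.add(partition_date)
    manifest[topic] = sorted(dates)
    # Sorted keys and fixed indentation keep the file stable,
    # so each backfill shows up as a small, readable git diff.
    path.write_text(json.dumps(manifest, indent=2, sort_keys=True) + "\n")
    return True
```

Committing the manifest after each backfill run would give a reviewable history of what was loaded and when; whether DVC offers a better fit for this is exactly the open question.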

Another potential aspect would be tracking schema evolution in a nicer way than we currently do.
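
As a rough illustration of what tracking schema evolution "as code" could mean, the sketch below compares two versions of a schema expressed as plain `{column: type}` mappings. The helper `diff_schema` and the mapping format are assumptions made for this example; they are not part of DVC, DataChain, or Iceberg.

```python
def diff_schema(old, new):
    """Compare two {column: type} mappings and report the differences."""
    # Columns present only in the new schema.
    added = {c: new[c] for c in new if c not in old}
    # Columns present only in the old schema.
    removed = {c: old[c] for c in old if c not in new}
    # Columns present in both whose type changed, as (old_type, new_type).
    changed = {
        c: (old[c], new[c])
        for c in old
        if c in new and old[c] != new[c]
    }
    return {"added": added, "removed": removed, "changed": changed}
```

Storing each topic's schema as a versioned file and running a diff like this between commits would surface additions, removals, and type changes in review.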

Thanks in advance. I’m a huge fan of anything-as-code and think it’s a great fit for data (20+ years in this area).