Data Version Control | slacker news

Mostly consult as a data engineer not ML ops but I’m interested in some aspects of this. We have 10 years of parquet files from 300+ different kafka topic and we’re currently migrating to apache iceberg. We’ll back fill on a need only basis and it would be nice to track that with git. Would this be a good fit for that?

Another potential aspect would be tracking schema evolution in a nicer way than we currently do.

thx in advance, huge fan of anything-as-code and think it’s a great fit for data (20+ years in this area).