Data Version Control

(dvc.org)

213 points shcheklein | 1 comments | 19 Oct 24 16:56 UTC

bramathon ◴[19 Oct 24 19:48 UTC] No.41890226[source]▶

>>41888937 (OP) #

I've used DVC for most of my projects for the past five years. The good thing is that it works a lot like git. If your scientists understand branches, commits and diffs, they should be able to understand DVC. The bad thing is that it works like git. Scientists often do not, in fact, understand or use branches, commits and diffs. The best thing is that it essentially forces you to follow Ten Simple Rules for Reproducible Computational Research [1]. Reproducibility has been a huge challenge on teams I've worked on.

[1] https://journals.plos.org/ploscompbiol/article?id=10.1371/jo...

replies(1): >>41954310 #

1. bach4ants ◴[26 Oct 24 12:20 UTC] No.41954310[source]▶

I have noticed this as well. There is a huge resistance to learning Git, and I think it's partly warranted. Researchers know what it is, and know that it's valuable, but think it will take too long to learn and they want to move fast. I recently started building a tool called Calkit (https://github.com/calkit/calkit) in an attempt to simplify and unify Git and DVC for these types of researchers. Hoping to convince folks that working reproducibly is actually faster in the long run, never mind the fact that it makes their work more directly usable for pushing the field forward more quickly overall.