I am not a git fan. After many years of RCS, SCCS, CVS, and SVN, I tried it and found its whole mental model weird and awkward. I can get around in it, but any complicated merge is just painful.
Anyway, the comment I really wanted to make is that I tried git lfs for the first time. I downloaded 44TB (https://huggingface.co/datasets/HuggingFaceFW/fineweb/tree/m...) over 3-4 days, which was pretty impressive until I noticed that it seems to double the disk space used (90TB total). I did a little reading to confirm it, and even learned a new term, "git smudge". Doubled disk space usually isn't an issue, except when you're using git to download terabytes.
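For anyone who wants to check this on their own clone, here is a rough Python sketch. It assumes the usual Git LFS layout: downloaded objects are cached under .git/lfs/objects, and the smudge filter writes a second full copy into the working tree. The function names (tree_size, lfs_usage) are made up for illustration, and the numbers are apparent file sizes, not actual blocks on disk.

```python
import os
from pathlib import Path
from typing import Optional, Tuple


def tree_size(root: Path, skip: Optional[Path] = None) -> int:
    """Sum the apparent size (bytes) of regular files under root.

    The directory `skip`, if given, is not descended into.
    """
    total = 0
    for dirpath, dirnames, filenames in os.walk(root):
        current = Path(dirpath)
        if skip is not None:
            # Prune in place so os.walk never enters the skipped directory.
            dirnames[:] = [d for d in dirnames if current / d != skip]
        for name in filenames:
            path = current / name
            if not path.is_symlink():
                total += path.stat().st_size
    return total


def lfs_usage(repo: Path) -> Tuple[int, int]:
    """Return (working tree bytes, .git/lfs/objects bytes) for a clone.

    Assumes the default layout where the LFS cache lives inside .git.
    """
    git_dir = repo / ".git"
    lfs_store = git_dir / "lfs" / "objects"
    working = tree_size(repo, skip=git_dir)
    cached = tree_size(lfs_store) if lfs_store.is_dir() else 0
    return working, cached
```

Calling lfs_usage(Path("fineweb")) on a clone (the directory name here is only a placeholder) gives the two totals. If they come out roughly equal, that would match the doubling I saw.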