
1062 points mixto | 2 comments
1. dekhn No.42943425
I am not a git fan. After many years (following use of RCS, SCCS, CVS, SVN) I tried it and found that its whole mental model was weird and awkward. I can get around in it but any complicated merge is just painful.

Anyway, the comment I really wanted to make was that I tried git lfs for the first time. I downloaded 44TB (https://huggingface.co/datasets/HuggingFaceFW/fineweb/tree/m...) over 3-4 days, which was pretty impressive until I noticed that it seems to double disk usage (90TB total). I did a little reading to confirm it, and even learned a new term, "git smudge". Doubled disk usage isn't normally an issue, except when you're using git to download terabytes.
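For anyone wondering where the doubling comes from: as far as I understand it, git itself only stores small LFS pointer files (a `version` line, an `oid`, and a `size`), and the smudge filter swaps each pointer for the real file in the working tree while another copy stays in the local LFS object cache. Here is a rough Python sketch of that arithmetic. The function names and the `copies=2` default are my own assumptions for illustration, not part of git lfs, and the pointer in the usage note is just a sample.

```python
# Rough sketch: estimate local disk usage of a Git LFS checkout from its
# pointer files. Helper names and copies=2 are illustrative assumptions.

LFS_SPEC_URL = "https://git-lfs.github.com/spec/v1"


def parse_lfs_pointer(text: str) -> dict:
    """Parse the key/value lines of a Git LFS pointer file."""
    fields = {}
    for line in text.strip().splitlines():
        key, _, value = line.partition(" ")
        fields[key] = value
    if fields.get("version") != LFS_SPEC_URL:
        raise ValueError("not a Git LFS v1 pointer file")
    return {"oid": fields["oid"], "size": int(fields["size"])}


def estimated_disk_bytes(pointers: list, copies: int = 2) -> int:
    """Assume each object exists `copies` times on disk:
    once in the LFS object cache and once smudged into the working tree."""
    return copies * sum(p["size"] for p in pointers)
```

Usage would look something like this:

```python
sample = """version https://git-lfs.github.com/spec/v1
oid sha256:4d7a214614ab2935c943f9e0ff69d22eadbb8f32b1258daaa5e2ca24d17e2393
size 12345
"""
pointer = parse_lfs_pointer(sample)
total = estimated_disk_bytes([pointer])  # twice the pointer's size field
```

Scale the `size` fields up to 44TB and you land close to the 90TB I saw.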

replies(1): >>42947038 #
2. jeroenhd No.42947038
Git is absolutely terrible for large files, especially binary files. That's why git LFS rarely ever uses git as a storage mechanism.

I know programmers like everything to be in version control, but AI models and git just aren't compatible.

replies(1): >>42950647 #