1311 points by msoad
kccqzy ◴[] No.35395739[source]
I might be missing something, but I couldn't reproduce this. I deliberately chose a computer with 16GiB of RAM to run the 30B model. Performance was extremely slow, and the process was clearly not CPU-limited, unlike when running the 13B model; it was swapping heavily.
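For what it's worth, one way to check whether a run like this is swap-bound rather than CPU-bound (a sketch, assuming a Linux machine; the platform isn't stated here):

    # Sample memory and I/O counters once per second while the model generates.
    # High si/so (swap-in/out) plus high wa (% CPU time waiting on I/O) alongside
    # low us (user CPU) mean the run is limited by swapping, not by compute.
    vmstat 1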
replies(5): >>35396367 #>>35396552 #>>35396848 #>>35398023 #>>35398479 #
freehorse ◴[] No.35396552[source]
Same; performance of the quantised 30B model on my M1 16GB Air is absolutely terrible. A couple of things I noticed in Activity Monitor:

1. "memory used" + "cached files" == 16GB (while swap is zero)

2. disk reading is 500-600MB/s

3. every token is computed *after every ~20GB read from disk*, which suggests that for each token it re-reads the whole weights file (instead of caching it). I actually suspect swapping may have been more efficient.

The last part (3), that it re-reads the whole file for each token, is an assumption; it could just be a coincidence that a new token is computed at every ~20GB read from disk. But it makes sense, as I don't think swapping would have been that inefficient.
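If the re-read hypothesis is right, the arithmetic lines up with the observed speed. A quick back-of-envelope check (a sketch; the ~20GB and ~550MB/s figures are just the observations above):

    # seconds per token if the full ~20GB weights file must be re-read
    # from disk at the observed ~550MB/s sequential read speed
    echo "scale=1; 20000 / 550" | bc    # => 36.3 seconds per token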

replies(1): >>35399836 #
muyuu ◴[] No.35399836[source]
Can you share the intermediate files? They're taking ages to process on my 16GB-RAM laptop
replies(1): >>35400111 #
freehorse ◴[] No.35400111{3}[source]
Which files are you referring to exactly?
replies(1): >>35400360 #
muyuu ◴[] No.35400360{4}[source]
ggml-model-f16.bin and ggml-model-q4_0.bin

those are the outputs of convert-pth-to-ggml.py and quantize, respectively
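(For anyone following along, the whole pipeline at the time looked roughly like this — a sketch based on the llama.cpp README of that era; the paths and the 13B size are illustrative:)

    # convert the PyTorch checkpoint to ggml f16, then quantise it to 4-bit q4_0
    python3 convert-pth-to-ggml.py models/13B/ 1
    ./quantize models/13B/ggml-model-f16.bin models/13B/ggml-model-q4_0.bin 2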

I had to cancel the 30B run because I needed to use the computer after some 12 hours, and now I have to fix the ext4 filesystem of the drive where I was doing it. Fun times for the weekend.

Guess I'll settle for 13B. I was using 7B, but the results are pretty lousy compared to GPT4All's LoRA, let alone GPT-3.5-turbo or better.

I'll give quantising 13B a shot; I'm on 16GB of RAM locally.
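The q4_0 13B weights come out to roughly 8GB on disk, so they should fit in 16GB of RAM without the thrashing described above. A minimal invocation sketch (prompt and token count are illustrative):

    # run the quantised 13B model; -p is the prompt, -n caps generated tokens
    ./main -m models/13B/ggml-model-q4_0.bin -p "Building a website can be done in" -n 128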

replies(1): >>35400877 #
dekhn ◴[] No.35400877{5}[source]
Yeah, the first time I ran the 30B model, it crashed my machine and I had to reinstall from scratch (Linux).