GGML – AI at the Edge

(ggml.ai)


nivekney ◴[06 Jun 23 17:21 UTC] No.36216106[source]▶

>>36215651 (OP) #

On a similar thread, how does it compare to HippoML?

Context: https://news.ycombinator.com/item?id=36168666

replies(1): >>36216469 #

brucethemoose2 ◴[06 Jun 23 17:45 UTC] No.36216469[source]▶

>>36216106 #

We don't necessarily know... Hippo is closed source for now.

It's comparable to Apache TVM's Vulkan in speed on CUDA; see https://github.com/mlc-ai/mlc-llm

But honestly, the biggest advantage of llama.cpp for me is being able to split a model so performantly. My puny 16GB laptop can just barely, but very practically, run LLaMA 30B at almost 3 tokens/s, and do it right now. That is crazy!

replies(1): >>36217701 #

smiley1437 ◴[06 Jun 23 19:14 UTC] No.36217701[source]▶

>>36216469 #

>> run LLaMA 30B at almost 3 tokens/s

Please tell me your config! I have an i9-10900 with 32GB of RAM that only gets 0.7 tokens/s on a 30B model.

replies(3): >>36217877 #>>36217992 #>>36219745 #

LoganDark ◴[06 Jun 23 19:26 UTC] No.36217877[source]▶

>>36217701 #

> Please tell me your config! I have an i9-10900 with 32GB of RAM that only gets 0.7 tokens/s on a 30B model.

Have you quantized it?

replies(1): >>36218570 #

1. smiley1437 ◴[06 Jun 23 20:19 UTC] No.36218570[source]▶

>>36217877 #

The model I have is q4_0; I think that's 4-bit quantized.

I'm running on Windows using koboldcpp; maybe it's faster on Linux?

replies(2): >>36219174 #>>36219792 #

2. LoganDark ◴[06 Jun 23 21:11 UTC] No.36219174[source]▶

>>36218570 (TP) #

> The model I have is q4_0; I think that's 4-bit quantized.

That's correct, yeah. Q4_0 should be the smallest and fastest quantized model.

> I'm running on Windows using koboldcpp; maybe it's faster on Linux?

Possibly. You could try using WSL to test—I think both WSL1 and WSL2 are faster than Windows (but WSL1 should be faster than WSL2).

replies(1): >>36220358 #

3. brucethemoose2 ◴[06 Jun 23 22:14 UTC] No.36219792[source]▶

>>36218570 (TP) #

I am running Linux with cublast offload, and I am using the new 3-bit quant that was just pulled in a day or two ago.

replies(2): >>36220323 #>>36222560 #

4. smiley1437 ◴[06 Jun 23 23:06 UTC] No.36220323[source]▶

>>36219792 #

Thanks! I'll have to try the 3-bit quant to see if that helps.

5. smiley1437 ◴[06 Jun 23 23:08 UTC] No.36220358[source]▶

>>36219174 #

I didn't know what WSL was, but now I do, thanks for the tip!

6. LoganDark ◴[07 Jun 23 03:41 UTC] No.36222560[source]▶

>>36219792 #

cuBLAS or CLBlast? There is no such thing as cublast.
