I'm not sure how neutral the following link is or what benchmarks it uses, but T5 seems to sit a lot higher on this leaderboard?
In the end these are mathematical models, so what would prevent someone from loading T5 onto a machine with plenty of RAM (like a server)? Would the codebase truly require that much refactoring? How difficult would it be to rewrite the model architecture as a set of mathematical equations (Einstein summation) and reimplement inference for CPU?
Anyway, T5 being available for download from Huggingface only makes my question more pertinent...
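To make the "just a set of equations" point concrete, here is a minimal sketch of scaled dot-product attention written as Einstein summations in NumPy, which runs on any CPU. This is purely illustrative; the shapes and names are mine, not T5's actual code, and a real reimplementation would also need the relative position biases, layer norms, and feed-forward blocks.

```python
# Illustrative sketch: scaled dot-product attention via Einstein summation.
# Not T5's real code; shapes (batch, heads, seq, dim) are assumptions.
import numpy as np

def attention(q, k, v):
    # q, k, v: (batch, heads, seq, dim)
    scores = np.einsum("bhqd,bhkd->bhqk", q, k) / np.sqrt(q.shape[-1])
    # numerically stable softmax over the key dimension
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("bhqk,bhkd->bhqd", weights, v)

# Tiny random example just to show it runs on CPU
rng = np.random.default_rng(0)
q = rng.standard_normal((1, 2, 4, 8))
k = rng.standard_normal((1, 2, 4, 8))
v = rng.standard_normal((1, 2, 4, 8))
print(attention(q, k, v).shape)  # (1, 2, 4, 8)
```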
Most people don't have the hardware or budget to access these specialized high-VRAM GPUs.
Does it happen to run on CPU on a server with 96 GB of RAM?
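For what it's worth, here is a rough sketch of what CPU-only inference looks like with the standard Hugging Face transformers API. The checkpoint name below is just an example; the 11B variant uses the same calls but needs on the order of 44 GB of RAM in fp32, so 96 GB should be enough (generation will just be slow).

```python
# Rough sketch, assuming the standard Hugging Face transformers API.
# "t5-3b" is only an example checkpoint; t5-11b works the same way if RAM allows.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

name = "t5-3b"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForSeq2SeqLM.from_pretrained(name)  # loads onto CPU by default, no CUDA needed
model.eval()

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```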