←back to thread

1311 points msoad | 3 comments | source
DoctorOetker ◴[] No.35400653[source]
Is there a reason Llama is getting so much attention compared to say T5 11B?

Not sure how neutral the following link is or what benchmarks it uses, but T5 seems to sit a lot higher on its leaderboard:

https://accubits.com/large-language-models-leaderboard/

replies(2): >>35400879 #>>35400909 #
1. itake ◴[] No.35400879[source]
Llama can run on an M1. T5 still needs a specialized GPU.
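(A big part of what makes the M1 ports practical is weight quantization: storing weights in low precision so the whole model fits in laptop RAM. The sketch below is a toy symmetric 8-bit scheme with made-up values, not the actual format any particular port uses, just to show the idea.)

```python
import numpy as np

def quantize_q8(w):
    """Toy symmetric 8-bit quantization: store int8 values plus one
    fp32 scale per tensor, roughly quartering fp32 memory."""
    max_abs = float(np.abs(w).max())
    scale = max_abs / 127 if max_abs > 0 else 1.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_q8(q, scale):
    """Recover approximate fp32 weights for use during inference."""
    return q.astype(np.float32) * scale

w = np.array([0.5, -1.0, 0.25, 0.0], dtype=np.float32)
q, s = quantize_q8(w)
w2 = dequantize_q8(q, s)
```

The round-trip error is bounded by half the scale, which is why quantized models lose little quality while shrinking 2-4x in memory.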
replies(1): >>35401284 #
2. DoctorOetker ◴[] No.35401284[source]
What is the reason T5 needs a specialized GPU and Llama doesn't?

In the end they are mathematical models, so what would prevent someone from loading T5 into a machine with plenty of RAM (like a server)? Would the codebase truly require that much refactoring? How difficult would it be to rewrite the model architecture as a set of mathematical equations (Einstein summation) and reimplement inference for the CPU?
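(For what it's worth, the core of a transformer really can be written as a handful of Einstein summations that run on plain CPU NumPy. A minimal sketch of scaled dot-product attention, with toy shapes chosen for illustration:)

```python
import numpy as np

def attention(q, k, v):
    """Scaled dot-product attention expressed as einsums,
    runnable on any machine with enough RAM -- no GPU required."""
    d = q.shape[-1]
    # scores[b, i, j] = sum_d q[b, i, d] * k[b, j, d] / sqrt(d)
    scores = np.einsum("bid,bjd->bij", q, k) / np.sqrt(d)
    # numerically stable softmax over the last axis
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # out[b, i, d] = sum_j weights[b, i, j] * v[b, j, d]
    return np.einsum("bij,bjd->bid", weights, v)

rng = np.random.default_rng(0)
q = rng.standard_normal((1, 4, 8))
k = rng.standard_normal((1, 4, 8))
v = rng.standard_normal((1, 4, 8))
out = attention(q, k, v)
```

So the blocker is less the math than the engineering effort of a fast, memory-efficient CPU implementation (threading, cache-friendly matmuls, quantized kernels).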

replies(1): >>35404254 #
3. itake ◴[] No.35404254[source]
I'm far from an expert in this area, but Llama has been adapted so that anyone can hack on it on their M1 MacBook (which many developers have). If someone made T5 as easy to develop against, I'm sure it would see similar community interest.

Most people don't have the hardware or budget to access these specialized high-VRAM GPUs.
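(To put rough numbers on that: the back-of-envelope sketch below computes the memory needed just to hold the weights at different precisions, using the commonly cited parameter counts. It ignores activations and the KV cache, so real usage is somewhat higher.)

```python
GIB = 1024**3

def weight_bytes(n_params, bits_per_weight):
    """Bytes required to store n_params weights at the given precision."""
    return n_params * bits_per_weight / 8

for name, n_params in [("Llama 7B", 7e9), ("T5 11B", 11e9)]:
    for bits in (16, 8, 4):
        gib = weight_bytes(n_params, bits) / GIB
        print(f"{name} @ {bits}-bit: {gib:.1f} GiB")
```

A 7B model quantized to 4 bits fits in about 3.3 GiB, comfortably inside a base M1's unified memory, while an 11B model at fp16 needs over 20 GiB, which is why it reads as "needs a specialized GPU" until someone ships a quantized CPU port.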