
317 points laserduck | 1 comment
1. frizdny5 No.42158583
The bottleneck for LLM inference is fast, large memory, not compute power.

Whoever is recommending investing in better chip (ALU) design hasn't done even a basic analysis of the problem.

Tokens per second ≈ memory bandwidth / model size (in bytes): in single-stream decoding, every generated token has to stream the full set of weights through memory at least once.
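
A minimal back-of-the-envelope sketch of that formula in Python, assuming a hypothetical accelerator with ~3.35 TB/s of memory bandwidth (roughly H100-class HBM) and a 70B-parameter model stored in FP16; both numbers are illustrative, not from the comment:

    # Memory-bound decode ceiling at batch size 1: every generated token
    # must stream the full set of weights from memory at least once.
    bandwidth_gb_per_s = 3350                     # assumed HBM bandwidth (H100-class)
    params_billions = 70                          # assumed model size
    bytes_per_param = 2                           # FP16 weights
    model_gb = params_billions * bytes_per_param  # ~140 GB of weights
    tokens_per_s = bandwidth_gb_per_s / model_gb  # ~24 tokens/s upper bound
    print(f"~{tokens_per_s:.0f} tokens/s memory-bound ceiling")

At batch size 1 the ALUs sit mostly idle waiting on those weight reads; batching amortizes the reads across concurrent requests, so this is an upper bound for single-stream decoding rather than a general throughput limit.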