(cerebras.ai)

427 points benchmarkist | 3 comments | 19 Nov 24 00:15 UTC

1. gorkempacaci ◴[19 Nov 24 08:27 UTC] No.42181139[source]▶

Nvidia hates this one little trick

2. zurfer ◴[19 Nov 24 08:33 UTC] No.42181177[source]▶

I laughed and upvoted, but if anything I bet they put their best people on it to replicate this offering.

What I take away from this is: we are just getting started. I remember in 2023 begging OpenAI to give us more than 7 tokens/second on GPT-4.

replies(1): >>42191224 #

3. ryao ◴[20 Nov 24 06:12 UTC] No.42191224[source]▶

>>42181177 #

Nvidia’s target is performance across concurrent users, and they are likely already outperforming Cerebras there as far as costs are concerned. They have no reason to try to beat this single-user performance.

Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference