Given what you can already do with current-gen models combined with RAG, multi-agent setups, and code interpreters, the wall is now very much model latency, not accuracy.
There are so many interactive experiences that become possible at this level of token throughput from 405B-class models.