slacker news
Llama 3.1 405B now runs at 969 tokens/s on Cerebras Inference
(cerebras.ai)
426 points | benchmarkist | 5 comments | 19 Nov 24 00:15 UTC
brcmthrowaway ◴[19 Nov 24 03:03 UTC] No.42179727
>>42178761 (OP)
So out of all AI chip startups, Cerebras is probably the real deal
replies(2): >>42179835 >>42179935
1.
icelancer ◴[19 Nov 24 03:46 UTC] No.42179935
>>42179727
Groq is legitimate. Cerebras so far doesn't scale (wide) nearly as well as Groq. We'll see how it goes.
replies(2): >>42180141 >>42180942
2.
hendler ◴[19 Nov 24 04:41 UTC] No.42180141
>>42179935 (TP)
Google TPUs, Amazon, a YC-funded ASIC/FPGA company, and a Chinese company all have custom hardware too that might scale well.
3.
throwawaymaths ◴[19 Nov 24 07:55 UTC] No.42180942
>>42179935 (TP)
How exactly does Groq scale wide well? Last I heard it took 9 racks!! to run llama-2 70b, which is why they throttle your requests.
replies(1): >>42187154
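The rack count above is consistent with a back-of-envelope SRAM calculation: Groq's LPU serves weights entirely from on-chip SRAM, publicly cited at roughly 230 MB per chip. A rough sketch (the 230 MB and 64-chips-per-rack figures are assumptions from public reporting, not from this thread; precision may differ in practice):

```python
import math

# Back-of-envelope: chips needed to hold llama-2 70b weights in GroqChip SRAM.
params = 70e9          # llama-2 70b parameter count
bytes_per_param = 2    # assuming fp16/bf16 weights
sram_per_chip = 230e6  # ~230 MB SRAM per GroqChip (publicly cited, assumed here)

chips = math.ceil(params * bytes_per_param / sram_per_chip)
racks = math.ceil(chips / 64)  # assuming ~64 chips per rack

print(chips, racks)  # on the order of ~600 chips, ~10 racks just for weights
```

Lower-precision weights (e.g. int8) would roughly halve the count, but the order of magnitude matches the "multiple racks per model" complaint.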
4.
pama ◴[19 Nov 24 19:28 UTC] No.42187154
>>42180942
Well, Cerebras pretty much needs a data center to simply fit the 405B model for inference.
replies(1): >>42187359
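This claim also checks out on a napkin: at 16-bit precision, 405B parameters is roughly 810 GB of weights, far beyond the ~44 GB of on-chip SRAM Cerebras cites for a single WSE-3 wafer. A minimal sketch (the 44 GB figure is taken from Cerebras's public specs, not from this thread, and ignores KV cache and activations, which only make it worse):

```python
import math

# Back-of-envelope: wafer-scale systems needed to hold Llama 3.1 405B weights.
params = 405e9         # parameter count
bytes_per_param = 2    # assuming fp16/bf16 weights
weight_bytes = params * bytes_per_param  # ~810 GB of weights
sram_per_wafer = 44e9  # ~44 GB on-chip SRAM per WSE-3 (public spec, assumed here)

wafers = math.ceil(weight_bytes / sram_per_wafer)
print(weight_bytes / 1e9, wafers)  # ~810 GB, ~19 wafers just for the weights
```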
5.
throwawaymaths ◴[19 Nov 24 19:47 UTC] No.42187359
>>42187154
I guess this just shows the insanity of venture-led AI hardware hype and shady startup messaging practices.