426 points by benchmarkist | 3 comments
1. aurareturn ◴[] No.42180200[source]
Normally, I don't think 1000 tokens/s is that much more useful than 50 tokens/s.

However, given that chain-of-thought (CoT) makes models a lot smarter, I think Cerebras chips will be in huge demand from now on. You can fit a lot more CoT runs into the same latency budget when inference is 20x faster.
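
Rough back-of-envelope (all numbers here are made-up assumptions, not measured Cerebras figures) just to show the scaling:

    # How many chain-of-thought samples fit in a fixed latency budget
    # at different decode speeds. Numbers are illustrative assumptions.
    LATENCY_BUDGET_S = 10.0    # how long we're willing to wait for an answer
    TOKENS_PER_COT_RUN = 500   # assumed length of one reasoning trace

    for tokens_per_s in (50, 1000):
        seconds_per_run = TOKENS_PER_COT_RUN / tokens_per_s
        runs_in_budget = int(LATENCY_BUDGET_S // seconds_per_run)
        print(f"{tokens_per_s:>4} tok/s: {seconds_per_run:.1f} s per CoT run, "
              f"{runs_in_budget} runs in {LATENCY_BUDGET_S:.0f} s")

At 50 tok/s you get one 500-token trace inside 10 s; at 1000 tok/s you get twenty, which is where the extra CoT runs come from.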

Also, I assume financial firms such as hedge funds will be buying these things in bulk now.

replies(1): >>42180756 #
2. deadmutex ◴[] No.42180756[source]
> Also, I assume financial applications such as hedge funds would be buying these things in bulk now.

Please elaborate... why?

replies(1): >>42182264 #
3. aurareturn ◴[] No.42182264[source]
I'm assuming hedge funds are using LLMs to dissect information from company news and SEC filings as soon as it comes out and then make trading decisions. Having faster inference would be a huge advantage there.
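
To make that concrete, here is a minimal sketch (not any fund's actual pipeline; the token counts and prefill speed are assumptions for illustration) of how decode speed shows up in time-to-signal:

    # Estimate seconds from receiving a filing to a complete model output,
    # comparing decode speeds. All constants are hypothetical assumptions.
    PROMPT_TOKENS = 8_000       # assumed size of the filing/news excerpt fed in
    OUTPUT_TOKENS = 300         # assumed size of the model's structured verdict
    PREFILL_TOK_PER_S = 20_000  # assumed prompt-processing speed (held fixed)

    def time_to_signal(decode_tok_per_s: float) -> float:
        """Seconds from receiving the filing to a complete model output."""
        prefill = PROMPT_TOKENS / PREFILL_TOK_PER_S
        decode = OUTPUT_TOKENS / decode_tok_per_s
        return prefill + decode

    for speed in (50, 1000):
        print(f"{speed:>4} tok/s decode: {time_to_signal(speed):.2f} s to a signal")

Under those assumptions the faster decode cuts time-to-signal from roughly 6.4 s to 0.7 s, and shaving seconds off that loop is exactly the kind of edge latency-sensitive desks pay for.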