
504 points by Terretta | 2 comments
boole1854 ◴[] No.45064512[source]
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

replies(14): >>45064582 #>>45064587 #>>45064594 #>>45064616 #>>45064622 #>>45064630 #>>45064757 #>>45064772 #>>45064950 #>>45065131 #>>45065280 #>>45065539 #>>45067136 #>>45077061 #
1. ojosilva ◴[] No.45065539[source]
After trying the Cerebras free API (not affiliated), which serves Qwen Coder 480b and gpt-oss-120b at a mind-boggling ~3000 tokens/s, output speed is now the first thing I check when evaluating a model. I just wish Cerebras had a better overall offering on their cloud: usage is capped at 70M tokens/day, and people report that the cap is easily hit, which makes it crippling for daily coding.
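
For reference, here is a minimal sketch of how you can measure output tokens/sec yourself. It assumes Cerebras exposes an OpenAI-compatible endpoint at https://api.cerebras.ai/v1 and that "qwen-3-coder-480b" is a valid model id there; the CEREBRAS_API_KEY variable name is illustrative, not confirmed.

    # Rough tokens-per-second measurement against an OpenAI-compatible API.
    # Assumptions: base_url, model id, and env var name below are placeholders
    # that you may need to adjust for your provider/account.
    import os
    import time

    from openai import OpenAI  # pip install openai

    client = OpenAI(
        base_url="https://api.cerebras.ai/v1",   # assumed OpenAI-compatible endpoint
        api_key=os.environ["CEREBRAS_API_KEY"],  # illustrative env var name
    )

    start = time.perf_counter()
    resp = client.chat.completions.create(
        model="qwen-3-coder-480b",               # assumed model id
        messages=[{"role": "user", "content": "Write a quicksort in Python."}],
        max_tokens=1024,
    )
    elapsed = time.perf_counter() - start

    out_tokens = resp.usage.completion_tokens
    # Wall-clock tps includes network and queueing time, so it understates
    # the provider's raw decode speed.
    print(f"{out_tokens} tokens in {elapsed:.2f}s -> {out_tokens / elapsed:.0f} tok/s")
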
replies(1): >>45076951 #
2. scottyeager ◴[] No.45076951[source]
They have a "Max" plan with a 120M tokens/day limit for $200/month: https://www.cerebras.ai/blog/introducing-cerebras-code