
504 points by Terretta | 1 comment
boole1854 ◴[] No.45064512[source]
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

replies(14): >>45064582 #>>45064587 #>>45064594 #>>45064616 #>>45064622 #>>45064630 #>>45064757 #>>45064772 #>>45064950 #>>45065131 #>>45065280 #>>45065539 #>>45067136 #>>45077061 #
1. scottyeager ◴[] No.45077061[source]
Fast inference can change the entire dynamic of working with these tools. At typical speeds, I usually try to do something else while the model works. When the model works really fast, I can just wait for it to finish.

So the total difference includes the cost of context switching, which is big.
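A back-of-envelope sketch of what I mean; every number here is a made-up assumption for illustration, not a measurement of any real model:

    # Hypothetical numbers: response length, attention span, and
    # refocus cost are all assumptions, not benchmarks.
    RESPONSE_TOKENS = 1000     # assumed length of a typical response
    ATTENTION_LIMIT_S = 10     # assumed longest wait before you switch tasks
    SWITCH_PENALTY_S = 120     # assumed cost of refocusing afterwards

    def effective_wait(tokens_per_sec: float) -> float:
        """Generation time, plus a context-switch penalty if the wait is long."""
        gen = RESPONSE_TOKENS / tokens_per_sec
        return gen + (SWITCH_PENALTY_S if gen > ATTENTION_LIMIT_S else 0.0)

    for speed in (50, 500):    # "typical" vs "fast" output, tokens/sec
        print(f"{speed} tok/s: {effective_wait(speed):.0f}s effective wait")
    # 50 tok/s: 140s effective wait
    # 500 tok/s: 2s effective wait

Under these assumptions, a 10x speedup in raw token output turns into roughly a 70x reduction in effective wait, because the fast response stays under the threshold where you'd switch away.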

Speed potentially matters less in scenarios focused on autonomous agents running in the background. However, I think most usage is still highly interactive these days.