(x.ai)

504 points Terretta | 1 comments | 29 Aug 25 13:01 UTC

boole1854 ◴[29 Aug 25 14:21 UTC] No.45064512[source]▶

It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

replies(14): >>45064582 #>>45064587 #>>45064594 #>>45064616 #>>45064622 #>>45064630 #>>45064757 #>>45064772 #>>45064950 #>>45065131 #>>45065280 #>>45065539 #>>45067136 #>>45077061 #

jsheard ◴[29 Aug 25 14:28 UTC] No.45064594[source]▶

>>45064512 #

That's far from the worst metric that xAI has come up with...

https://xcancel.com/elonmusk/status/1958854561579638960

replies(1): >>45066065 #

Rover222 ◴[29 Aug 25 16:20 UTC] No.45066065[source]▶

>>45064594 #

what's wrong with rapid updates to an app?

replies(5): >>45067028 #>>45067061 #>>45068102 #>>45069218 #>>45070365 #

ori_b ◴[29 Aug 25 17:35 UTC] No.45067061{3}[source]▶

>>45066065 #

It's like measuring how fast your car can go by counting how often you clean the upholstery.

There's nothing wrong with doing it, but it's entirely unrelated to performance.

replies(1): >>45068200 #

Rover222 ◴[29 Aug 25 19:14 UTC] No.45068200{4}[source]▶

>>45067061 #

I don't think he was saying their release cadence is a direct metric of their model performance. Just that the team iterates and improves the app user experience much more quickly than other teams.

replies(3): >>45068606 #>>45068692 #>>45070385 #

1. ori_b ◴[29 Aug 25 19:59 UTC] No.45068692{5}[source]▶

>>45068200 #

It's a fucking chat. How many times a day do you need to ship an update?

Grok Code Fast 1