
504 points Terretta | 1 comment | source
boole1854 ◴[] No.45064512[source]
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

replies(14): >>45064582 #>>45064587 #>>45064594 #>>45064616 #>>45064622 #>>45064630 #>>45064757 #>>45064772 #>>45064950 #>>45065131 #>>45065280 #>>45065539 #>>45067136 #>>45077061 #
jsheard ◴[] No.45064594[source]
That's far from the worst metric that xAI has come up with...

https://xcancel.com/elonmusk/status/1958854561579638960

replies(1): >>45066065 #
Rover222 ◴[] No.45066065[source]
what's wrong with rapid updates to an app?
replies(5): >>45067028 #>>45067061 #>>45068102 #>>45069218 #>>45070365 #
ori_b ◴[] No.45067061{3}[source]
It's like measuring how fast your car can go by counting how often you clean the upholstery.

There's nothing wrong with doing it, but it's entirely unrelated to performance.

replies(1): >>45068200 #
Rover222 ◴[] No.45068200{4}[source]
I don't think he was saying their release cadence is a direct metric of their model performance. Just that the team iterates on and improves the app user experience much more quickly than other teams do.
replies(3): >>45068606 #>>45068692 #>>45070385 #
1. ori_b ◴[] No.45068692{5}[source]
It's a fucking chat. How many times a day do you need to ship an update?