504 points Terretta | 4 comments
boole1854 ◴[] No.45064512[source]
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

replies(14): >>45064582 #>>45064587 #>>45064594 #>>45064616 #>>45064622 #>>45064630 #>>45064757 #>>45064772 #>>45064950 #>>45065131 #>>45065280 #>>45065539 #>>45067136 #>>45077061 #
jsheard ◴[] No.45064594[source]
That's far from the worst metric that xAI has come up with...

https://xcancel.com/elonmusk/status/1958854561579638960

replies(1): >>45066065 #
Rover222 ◴[] No.45066065[source]
what's wrong with rapid updates to an app?
replies(5): >>45067028 #>>45067061 #>>45068102 #>>45069218 #>>45070365 #
ori_b ◴[] No.45067061[source]
It's like measuring how fast your car can go by counting how often you clean the upholstery.

There's nothing wrong with doing it, but it's entirely unrelated to performance.

replies(1): >>45068200 #
1. Rover222 ◴[] No.45068200[source]
I don't think he was saying their release cadence is a direct metric of their model performance. Just that the team iterates and improves the app user experience much more quickly than other teams do.
replies(3): >>45068606 #>>45068692 #>>45070385 #
2. jdiff ◴[] No.45068606[source]
He seems to be stating that app release cadence correlates with internal upgrades that correlate with model performance. There is no reason for this to be true. He does not seem to be talking about user experience.
3. ori_b ◴[] No.45068692[source]
It's a fucking chat. How many times a day do you need to ship an update?
4. kelnos ◴[] No.45070385[source]
Oh c'mon, I know it's usually best to try to interpret things in the most charitable way possible, but clearly Musk was implying the actual meat of things, the model itself, is what's being constantly improved.

But even if your interpretation is correct, frequency of releases still is not a good metric. That could just mean that you have a lot to fix, and/or you keep breaking and fixing things along the way.