504 points | Terretta | 12 comments
boole1854 ◴[] No.45064512[source]
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

replies(14): >>45064582 #>>45064587 #>>45064594 #>>45064616 #>>45064622 #>>45064630 #>>45064757 #>>45064772 #>>45064950 #>>45065131 #>>45065280 #>>45065539 #>>45067136 #>>45077061 #
1. jsheard ◴[] No.45064594[source]
That's far from the worst metric that xAI has come up with...

https://xcancel.com/elonmusk/status/1958854561579638960

replies(1): >>45066065 #
2. Rover222 ◴[] No.45066065[source]
what's wrong with rapid updates to an app?
replies(5): >>45067028 #>>45067061 #>>45068102 #>>45069218 #>>45070365 #
3. cosmicgadget ◴[] No.45067028[source]
They aren't a metric for showing you are better than the competition.
replies(1): >>45068209 #
4. ori_b ◴[] No.45067061[source]
It's like measuring how fast your car can go by counting how often you clean the upholstery.

There's nothing wrong with doing it, but it's entirely unrelated to performance.

replies(1): >>45068200 #
5. tzs ◴[] No.45068102[source]
See the reply, currently at #2 on that Twitter thread, from Jamie Voynow.
6. Rover222 ◴[] No.45068200{3}[source]
I don't think he was saying their release cadence is a direct metric of their model performance. Just that the team iterates on and improves the app user experience much more quickly than other teams do.
replies(3): >>45068606 #>>45068692 #>>45070385 #
7. Rover222 ◴[] No.45068209{3}[source]
It's a metric for showing you can move more quickly on product improvements. Anyone who has worked on a product team at a large tech company knows how much things get slowed down by process bloat.
8. jdiff ◴[] No.45068606{4}[source]
He seems to be stating that app release cadence correlates with internal upgrades that correlate with model performance. There is no reason for this to be true. He does not seem to be talking about user experience.
9. ori_b ◴[] No.45068692{4}[source]
It's a fucking chat. How many times a day do you need to ship an update?
10. LeafItAlone ◴[] No.45069218[source]
I have a coworker who outshines everybody else in number of commits and pushes in any given time period. It’s pretty amazing the number they can accomplish!

Of course, 95% of them are fixing things they broke in earlier commits and their overall quality is the worst on the team. But, holy cow, they can output crap faster than anyone I’ve seen.

11. kelnos ◴[] No.45070365[source]
That metric doesn't really tell you anything. Maybe I'm making rapid updates to my app because I'm a terrible coder and I keep having to push out fixes to critical bugs. Maybe I'm bored and keep making little tweaks to the UI, and for some reason think that's worth people's time to upgrade. (And that's another thing: frequent upgrades can be annoying!)

But sure, ok, maybe it could mean making much faster progress than competitors. But then again, it could also mean that competitors have a much more mature platform, and you're only releasing new things so often because you're playing catch-up.

(And note that I'm not specifically talking about LLMs here. This metric is useless for pretty much any kind of app or service.)

12. kelnos ◴[] No.45070385{4}[source]
Oh c'mon, I know it's usually best to try to interpret things in the most charitable way possible, but clearly Musk was implying the actual meat of things, the model itself, is what's being constantly improved.

But even if your interpretation is correct, frequency of releases still is not a good metric. That could just mean that you have a lot to fix, and/or you keep breaking and fixing things along the way.