Grok Code Fast 1

(x.ai)

boole1854 ◴[29 Aug 25 14:21 UTC] No.45064512

>>45063559 (OP) #

It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

1. jsheard ◴[29 Aug 25 14:28 UTC] No.45064594

>>45064512 #

That's far from the worst metric that xAI has come up with...

https://xcancel.com/elonmusk/status/1958854561579638960

2. Rover222 ◴[29 Aug 25 16:20 UTC] No.45066065

>>45064594 (TP) #

what's wrong with rapid updates to an app?

3. cosmicgadget ◴[29 Aug 25 17:32 UTC] No.45067028

>>45066065 #

They aren't a metric for showing you are better than the competition.

4. ori_b ◴[29 Aug 25 17:35 UTC] No.45067061

>>45066065 #

It's like measuring how fast your car can go by counting how often you clean the upholstery.

There's nothing wrong with doing it, but it's entirely unrelated to performance.

5. tzs ◴[29 Aug 25 19:05 UTC] No.45068102

>>45066065 #

See the reply, currently at #2 on that Twitter thread, from Jamie Voynow.

6. Rover222 ◴[29 Aug 25 19:14 UTC] No.45068200{3}

>>45067061 #

I don't think he was saying their release cadence is a direct metric of their model performance. Just that the team iterates on and improves the app's user experience much more quickly than other teams do.

7. Rover222 ◴[29 Aug 25 19:15 UTC] No.45068209{3}

>>45067028 #

It's a metric for showing you can move more quickly on product improvements. Anyone who has worked on a product team at a large tech company knows how much things get slowed down by process bloat.

8. jdiff ◴[29 Aug 25 19:50 UTC] No.45068606{4}

>>45068200 #

He seems to be stating that app release cadence correlates with internal upgrades that correlate with model performance. There is no reason for this to be true. He does not seem to be talking about user experience.

9. ori_b ◴[29 Aug 25 19:59 UTC] No.45068692{4}

>>45068200 #

It's a fucking chat. How many times a day do you need to ship an update?

10. LeafItAlone ◴[29 Aug 25 20:47 UTC] No.45069218

>>45066065 #

I have a coworker who outshines everybody else in number of commits and pushes in any given time period. It’s pretty amazing the number they can accomplish!

Of course, 95% of them are fixing things they broke in earlier commits and their overall quality is the worst on the team. But, holy cow, they can output crap faster than anyone I’ve seen.

11. kelnos ◴[29 Aug 25 22:59 UTC] No.45070365

>>45066065 #

That metric doesn't really tell you anything. Maybe I'm making rapid updates to my app because I'm a terrible coder and I keep having to push out fixes to critical bugs. Maybe I'm bored and keep making little tweaks to the UI, and for some reason think that's worth people's time to upgrade. (And that's another thing: frequent upgrades can be annoying!)

But sure, ok, maybe it could mean making much faster progress than competitors. But then again, it could also mean that competitors have a much more mature platform, and you're only releasing new things so often because you're playing catch-up.

(And note that I'm not specifically talking about LLMs here. This metric is useless for pretty much any kind of app or service.)

12. kelnos ◴[29 Aug 25 23:01 UTC] No.45070385{4}

>>45068200 #

Oh c'mon, I know it's usually best to try to interpret things in the most charitable way possible, but clearly Musk was implying the actual meat of things, the model itself, is what's being constantly improved.

But even if your interpretation is correct, frequency of releases still is not a good metric. That could just mean that you have a lot to fix, and/or you keep breaking and fixing things along the way.
