I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.
If an LLM is often going to be wrong anyway, then being able to try prompts quickly and iterate on them could well be more valuable than a slow, higher-quality output.
Ad absurdum, if it could ingest and work on an entire project in milliseconds, then it has much greater value to me than a process which might take a day to do the same, even if the likelihood of success is also strongly affected.
It simply enables a different method of interactive working.
Or it could supply 3 different suggestions in-line while working on something, rather than a process which needs to be explicitly prompted and waited on.
Latency can have a critical impact on not just the user experience but the very way tools are used.
Now, will I try Grok? Absolutely not, but that's a personal decision due to not wanting anything to do with X, rather than a purely rational decision.
They reduce the costs though!
For autocompleting simple functions (string manipulation, function definitions, etc), the quality bar is pretty easy to hit, and speed is important.
If you're just vibe coding, then yeah, you want quality. But if you know what you're doing, I find having a dumber fast model is often nicer than a slow smart model that you still need to correct a bit, because it's easier to stay in flow state.
With the slow reasoning models, the workflow is more like working with another engineer, where you have to review their code in a PR
I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds. I will usually have eyeballed the code somewhere in the middle here but I'm not fully reviewing until this whole dance is done.
I mean, I obviously agree with you in that I've chosen the slowest models available at every turn here, but my point is I would be very excited if they also got faster because I am using a lot of extra inference to buy more quality before I'm touching the code myself.
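To make the shape of that concrete: this isn't the actual zen-mcp wiring, just the cross-review pattern, with a hypothetical ask() helper standing in for however you reach each model.

```python
# Sketch of the cross-review pattern: one model writes, two others review,
# and the first model responds to the reviews. ask() is hypothetical -- a
# stand-in for however your setup routes prompts to each model.

def ask(model: str, prompt: str) -> str:
    """Hypothetical helper: send `prompt` to `model`, return its reply."""
    raise NotImplementedError("plug in your own model routing here")

task = "Implement feature X in module Y."

draft = ask("opus-4.1", task)
reviews = [
    ask(m, f"Review this code for bugs and design issues:\n\n{draft}")
    for m in ("gpt-5", "gemini-2.5-pro")
]
final = ask(
    "opus-4.1",
    "Here is your code and two independent reviews. "
    "Respond to the reviews and produce a revised version.\n\n"
    + draft + "\n\n" + "\n\n".join(reviews),
)
print(final)  # eyeball this before doing the full human review
```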
Asking any model to do things in steps is usually better too, as opposed to feeding it three essays.
Before MoE was a thing, I built what I called the Dictator, which was one strong model working with many weaker ones to achieve a similar result as MoE, but all the Dictator ever got was Garbage In, so guess what came out?
> I use Opus 4.1 exclusively in Claude Code but then I also use zen-mcp server to get both gpt5 and gemini-2.5-pro to review the code and then Opus 4.1 responds.
I'd love to hear how you have this set up.

It's not long enough for you to context switch to something else, but long enough to be annoying, and these wait times add up during the whole day.
It also discourages experimentation if you know that every prompt will potentially take multiple minutes to finish. If it instead finished in seconds then you could iterate faster. This would be especially valuable in the frontend world where you often tweak your UI code many times until you're satisfied with it.
* Scaffolding
* Ask it what's wrong with the code (see the sketch after this list)
* Ask it for improvements I could make
* Ask it what the code does (amazing for old code you've never seen)
* Ask it to provide architect level insights into best practices
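Most of these boil down to a prompt plus a file's contents. A minimal sketch of the "what's wrong with this code?" question using the OpenAI Python SDK; the file path and model name are placeholders, and any chat-capable model or IDE assistant works the same way.

```python
# Minimal sketch: ask a model what's wrong with a file you've never seen.
# File path and model name are placeholders, not a recommendation.
import pathlib
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

code = pathlib.Path("legacy_module.py").read_text()
resp = client.chat.completions.create(
    model="gpt-5",  # placeholder; use whatever model you actually have
    messages=[
        {
            "role": "user",
            "content": "What's wrong with this code, and what would you "
                       "improve?\n\n```python\n" + code + "\n```",
        },
    ],
)
print(resp.choices[0].message.content)
```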
One area where they all seem to fail is lesser-known packages: they tend to either reference old functionality that is not there anymore, or that never was; they hallucinate. Which is part of why I don't ask it for too much.
Junie did impress me, but it was very slow, so I would love to see a version of Junie using this version of Grok; it might be worthwhile.
We already know that in most software domains, fast (as in, getting it done faster) is better than 100% correct.
Different models for different things.
Not everyone is solving complicated things every time they hit cmd-k in Cursor or use autocomplete, and they can easily switch to a different model when working harder stuff out via longer form chat.
this site is the fucking worst
The IP risks taken may be well worth the productivity boosts.
That's phase 1; ask it to "think deeply" (a Claude keyword, it only works with the Anthropic models) while doing that. Then ask it to make a detailed plan for solving the issue, write that into current-fix.md, and add clearly testable criteria for when the issue is solved.
Now you manually check whether the criteria sound plausible; if not, its analysis failed and its output was worthless.
But if it sounds good, you can then start a new session and ask it to read the-markdown-file and implement the change.
Now you can plausibility-check the diff and are likely done.
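If you wanted to script that two-phase flow outside of Claude Code, a rough sketch with the Anthropic Python SDK might look like this; the model name, file names, and prompts are placeholders, not the exact setup described above.

```python
# Sketch of the plan-then-implement flow via the Anthropic Python SDK.
# Assumes ANTHROPIC_API_KEY is set; "claude-opus-4-1", issue.md and
# current-fix.md are placeholders for whatever model/files you actually use.
import pathlib
import anthropic

client = anthropic.Anthropic()

def ask(prompt: str) -> str:
    msg = client.messages.create(
        model="claude-opus-4-1",  # placeholder model id
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    return msg.content[0].text

# Phase 1: analyze the issue and write a plan with testable criteria.
issue = pathlib.Path("issue.md").read_text()
plan = ask(
    "Analyze this issue and think deeply about the root cause.\n"
    "Write a detailed plan for fixing it, including clearly testable "
    "criteria for when the issue is solved.\n\n" + issue
)
pathlib.Path("current-fix.md").write_text(plan)

# Human checkpoint: read current-fix.md and sanity-check the criteria.
input("Review current-fix.md, then press Enter to implement...")

# Phase 2: a fresh request with no prior context implements the plan.
diff = ask(
    "Read the following plan and produce the code changes as a unified "
    "diff:\n\n" + pathlib.Path("current-fix.md").read_text()
)
print(diff)  # plausibility-check the diff before applying it
```

The human checkpoint in the middle is the whole point: rejecting a bad plan is much cheaper than rejecting a bad diff.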
But as the sister comment pointed out, agentic coding really breaks apart with large files like you usually have in brownfield projects.
Often all it takes is to reset to a checkpoint or undo and adjust the prompt a bit with additional context and even dumber models can get things right.
I've used grok code fast plenty this week alongside gpt 5 when I need to pull out the big guns and it's refreshing using a fast model for smaller changes or for tasks that are tedious but repetitive during things like refactoring.
Do you use them successfully in cases where you just had to re-run them 5 times to get a good answer, and was that a better experience than going straight to GPT 5?
I think the biggest thing for offline LLMs will have to be consistency in having them search the web with an API like Google's or some other search engine's API; maybe Kagi could provide an API for people who self-host LLMs (not necessarily for free, but it would still be useful).
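The local side of that is already easy to wire up; the open question is the search provider. A rough sketch against Ollama's /api/generate endpoint, with the search call left as a hypothetical web_search() stub you'd point at whichever API (Google, Kagi, ...) ends up being available:

```python
# Sketch: feed web search results to a self-hosted model via Ollama's
# /api/generate endpoint. web_search() is hypothetical -- swap in whatever
# search API you end up using. The model name is a placeholder.
import requests

def web_search(query: str) -> list[str]:
    # Placeholder: return a list of result snippets from your search API.
    raise NotImplementedError("plug in your search provider here")

def answer_with_search(question: str, model: str = "llama3.1") -> str:
    snippets = "\n".join(web_search(question))
    prompt = (
        "Use the following search results to answer the question.\n\n"
        f"Search results:\n{snippets}\n\nQuestion: {question}"
    )
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": prompt, "stream": False},
        timeout=300,
    )
    resp.raise_for_status()
    return resp.json()["response"]
```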
Of course, 95% of them are fixing things they broke in earlier commits and their overall quality is the worst on the team. But, holy cow, they can output crap faster than anyone I’ve seen.
Not sure who was taking SamA seriously about that; personally I think he's a ridiculous blowhard, and statements like that just reinforce that view for me.
Please don't make generalizations about HN's visitors'/commenters' attitudes on things. They're never generally correct.
But sure, ok, maybe it could mean making much faster progress than competitors. But then again, it could also mean that competitors have a much more mature platform, and you're only releasing new things so often because you're playing catch-up.
(And note that I'm not specifically talking about LLMs here. This metric is useless for pretty much any kind of app or service.)
But even if your interpretation is correct, frequency of releases still is not a good metric. That could just mean that you have a lot to fix, and/or you keep breaking and fixing things along the way.
So the total difference includes the cost of context switching, which is big.
Potentially, speed matters less in a scenario focused on more autonomous agents running in the background. However, I think most usage is still highly interactive these days.