504 points Terretta | 19 comments
boole1854 ◴[] No.45064512[source]
It's interesting that the benchmark they are choosing to emphasize (in the one chart they show and even in the "fast" name of the model) is token output speed.

I would have thought it an uncontroversial view among software engineers that token quality is much more important than token output speed.

replies(14): >>45064582 #>>45064587 #>>45064594 #>>45064616 #>>45064622 #>>45064630 #>>45064757 #>>45064772 #>>45064950 #>>45065131 #>>45065280 #>>45065539 #>>45067136 #>>45077061 #
1. eterm ◴[] No.45064582[source]
It depends how fast.

If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts could possibly be more valuable than a slower, higher-quality output.

Ad absurdum, if it could ingest and work on an entire project in milliseconds, then it has much greater value to me than a process which might take a day to do the same, even if the likelihood of success is also strongly affected.

It simply enables a different method of interactive working.

Or it could supply 3 different suggestions in-line while working on something, rather than a process which needs to be explicitly prompted and waited on.

Latency can have a critical impact not just on user experience but on the very way tools are used.

Now, will I try Grok? Absolutely not, but that's a personal decision due to not wanting anything to do with X, rather than a purely rational decision.

replies(3): >>45064736 #>>45064784 #>>45064870 #
2. postalcoder ◴[] No.45064736[source]
Besides being a faster slot machine, to the extent that they're any good, a fast agentic LLM would be very nice to have for codebase analysis.
replies(1): >>45067357 #
3. giancarlostoro ◴[] No.45064784[source]
> If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts could possibly be more valuable than a slower, higher-quality output.

Asking any model to do things in steps is usually better too, as opposed to feeding it three essays.

replies(1): >>45064995 #
4. 34679 ◴[] No.45064870[source]
>If an LLM is often going to be wrong anyway, then being able to try prompts quickly and then iterate on those prompts could possibly be more valuable than a slower, higher-quality output.

Before MoE was a thing, I built what I called the Dictator, which was one strong model working with many weaker ones to achieve a result similar to MoE, but all the Dictator ever got was Garbage In, so guess what came out?
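
For illustration, that kind of dictator/worker orchestration looks roughly like this; a minimal sketch, where call_model() is a hypothetical helper (model name and prompt in, completion text out), not the original code:

    # Minimal sketch of the "Dictator" pattern described above: several weak
    # models draft answers, one stronger model reconciles them. call_model()
    # is a hypothetical helper, not any specific API.
    def dictator_answer(call_model, question, worker_models, dictator_model):
        drafts = [call_model(worker, question) for worker in worker_models]
        numbered = "\n\n".join(
            "Draft %d:\n%s" % (i + 1, d) for i, d in enumerate(drafts)
        )
        prompt = (
            "Question:\n" + question + "\n\n"
            "Here are several draft answers:\n" + numbered + "\n\n"
            "Pick the best parts and produce one corrected answer."
        )
        return call_model(dictator_model, prompt)

    # Garbage in, garbage out: if every draft is wrong, the dictator only
    # gets a nicer-looking pile of garbage to arbitrate.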

replies(3): >>45065169 #>>45068763 #>>45073448 #
5. ffsm8 ◴[] No.45064995[source]
I thought the current vibe was doing the former to produce the latter, then using the output as the task plan?
replies(1): >>45065164 #
6. giancarlostoro ◴[] No.45065164{3}[source]
I don't know what other people are doing; I mostly use LLMs for:

* Scaffolding

* Ask it what's wrong with the code

* Ask it for improvements I could make

* Ask it what the code does (amazing for old code you've never seen)

* Ask it to provide architect level insights into best practices

One area where they all seem to fail is lesser-known packages: they tend to reference old functionality that either is not there anymore or never was; they hallucinate. Which is part of why I don't ask them for too much.

Junie did impress me, but it was very slow, so I would love to see a version of Junie using this version of Grok; it might be worthwhile.

replies(3): >>45067042 #>>45067401 #>>45067478 #
7. _kb ◴[] No.45065169[source]
You just need to scale out more. As you approach infinite monkeys, sorry - models, you'll surely get the result you need.
replies(1): >>45067012 #
8. dingnuts ◴[] No.45067012{3}[source]
why's this guy getting downvoted? SamA says we need a Dyson Sphere made of GPUs surrounding the solar system and people take it seriously but this guy takes a little piss out of that attitude and he's downvoted?

this site is the fucking worst

replies(1): >>45070318 #
9. dingnuts ◴[] No.45067042{4}[source]
> amazing for old code you've never seen

not if you have too much! a few hundred thousand lines of code and you can't ask shit!

plus, you just handed over your company's entire IP to whoever hosts your model

replies(2): >>45067425 #>>45068396 #
10. fmbb ◴[] No.45067357[source]
For 10% less time you can get 10% worse analysis? I don’t understand the tradeoff.
replies(1): >>45070324 #
11. miohtama ◴[] No.45067401{4}[source]
I hope in the future tooling and MCP will be better, so agents can directly check what functionality exists in the installed package version instead of hallucinating.
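
Even today, a small tool can ground the agent in what is actually installed; a minimal sketch (plain Python, not MCP; "requests" is only an example package name):

    # Minimal sketch: report a package's installed version and its public
    # names, so an agent can check reality before suggesting an API call.
    import importlib
    from importlib.metadata import version

    def describe_package(name):
        mod = importlib.import_module(name)
        public = sorted(n for n in dir(mod) if not n.startswith("_"))
        return {"version": version(name), "exports": public}

    print(describe_package("requests"))  # any installed package works
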
12. miohtama ◴[] No.45067425{5}[source]
It's a fair trade-off for smaller companies where IP or the software is a necessary evil, not the main unique value added. It's hard to see what evil anyone would do with crappy legacy code.

The IP risks taken may well be worth the productivity boost.

13. ffsm8 ◴[] No.45067478{4}[source]
> Ask it what's wrong with the code

That's phase 1; ask it to "think deeply" (a Claude keyword, it only works with the Anthropic models) while doing that. Then ask it to make a detailed plan for solving the issue, write that into current-fix.md, and add clearly testable criteria for when the issue is solved.

Now you manually check whether the criteria sound plausible; if not, its analysis failed and its output was worthless.

But if it sounds good, you can then start a new session and ask it to read the-markdown-file and implement the change.

Now you can plausibility-check the diff and are likely done.

But as the sister comment pointed out, agentic coding really breaks apart with large files like you usually have in brownfield projects.
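
To make that concrete, the resulting current-fix.md might look something like this (the structure and the bug are purely illustrative):

    ## Problem
    Config loader crashes when the config file is missing.

    ## Plan
    1. Add a guard in the loader before reading the file
    2. Fall back to defaults and log a warning

    ## Testable criteria
    - App starts cleanly with no config file present
    - The fallback warning is logged exactly once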

14. giancarlostoro ◴[] No.45068396{5}[source]
If Apple keeps improving things, you can run the model locally. I'm able to run models on my MacBook with an M4 that I can't even run on my 3080 GPU (mostly due to VRAM constraints), and they run reasonably fast. Would the 3080 be faster? Sure, but the M4 is also plenty fast, to the point where I'm not sitting there waiting longer than I wait for a cloud model to "reason" and look things up.

I think the biggest thing for offline LLMs will have to be a consistent way to have them search the web with an API like Google's or some other search engine's API; maybe Kagi could provide an API for people who self-host LLMs (not necessarily for free, but it would still be useful).
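
As a sketch of how the local setup plugs into existing tooling (this assumes an Ollama server on its default port and an example model pulled beforehand; details will vary by runner):

    # Minimal sketch: talk to a locally hosted model through an
    # OpenAI-compatible endpoint (Ollama's default port assumed).
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:11434/v1", api_key="unused")
    resp = client.chat.completions.create(
        model="llama3.1",  # example model, pulled locally beforehand
        messages=[{"role": "user", "content": "Summarize what this function does: ..."}],
    )
    print(resp.choices[0].message.content)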

15. charcircuit ◴[] No.45068763[source]
That doesn't seem similar to MoE at all.
replies(1): >>45073958 #
16. kelnos ◴[] No.45070318{4}[source]
Maybe because this site is full of people with differing opinions and stances on things, who react differently to what people say and do?

Not sure who was taking SamA seriously about that; personally I think he's a ridiculous blowhard, and statements like that just reinforce that view for me.

Please don't make generalizations about HN's visitors'/commenters' attitudes on things. They're never generally correct.

17. kelnos ◴[] No.45070324{3}[source]
I mean, if that's literally what the numbers are, sure, maybe that's not great. But what if it's 10% less time and 3% worse analysis? Maybe that's valuable.
18. LinXitoW ◴[] No.45073448[source]
Sounds more like a Mixture of Idiots.
19. 34679 ◴[] No.45073958{3}[source]
Well, I really didn't provide sufficient detail to make that determination either way.