627 points cratermoon | 36 comments
1. tptacek ◴[] No.44461381[source]
> LLM output is crap. It's just crap. It sucks, and is bad.

Still don't get it. LLM outputs are nondeterministic. LLMs invent APIs that don't exist. That's why you filter those outputs through agent constructions, which actually compile code. The nondeterminism of LLMs doesn't make your compiler nondeterministic.

All sorts of ways to knock LLM-generated code. Most I disagree with, all colorable. But this article is based on a model of LLM code generation from 6 months ago which is simply no longer true, and you can't gaslight your way back to Q1 2024.

replies(7): >>44461418 #>>44461426 #>>44461474 #>>44461544 #>>44461933 #>>44461994 #>>44463037 #
2. beckthompson ◴[] No.44461418[source]
AIs still frequently make stuff up - there isn't really a way to get out of that. Have they improved a lot in the last six months? 100%! But they still make mistakes, and it's quite common.
replies(2): >>44461516 #>>44461572 #
3. 62702b077f3 ◴[] No.44461426[source]
> The garbage generator generates garbage, but if you run it enough times it gets something slightly-less-garbage that can satisfy a compiler! You're stupid if you don't think this is awesome!
replies(4): >>44461513 #>>44461517 #>>44461643 #>>44462226 #
4. ipdashc ◴[] No.44461474[source]
There was an article that made the rounds a few weeks ago that still rings true. Basically, it feels like one is going crazy reading either "extreme" of the whole LLM conversation, with one extreme (obviously) being the "AI can do anything" Twitter techbro types, but the other extreme being articles like this that claim it can't do anything.

I know the author already addressed this, literally calling out HN by name, but I just don't get it. You don't even need agents (though I'm sure they help), I still just use regular ChatGPT or Copilot or whatever and it's still occasionally useful. You type in what you want it to do, it gives you code, and usually the code works. Can we appreciate how insane this would have been, what, half a decade ago? Are our standards literally "the magic English-to-code machine doesn't work 100% of the time, so it's total crap, utterly useless"?

I absolutely agree with the general thrust of the article, the overall sense of disillusionment, the impact LLM abuse is going to have on education, etc. I don't even particularly like LLMs. But it really does feel like gaslighting to the extent that when these essays make this sort of argument (LLMs being entirely useless for coding) it just makes me take them less seriously.

replies(1): >>44461503 #
5. paulddraper ◴[] No.44461503[source]
> it just makes me take them less seriously

Indeed. This is how to spot an ideologue with an axe to grind, not someone whose beliefs are shaped by dispassionate observation.

replies(1): >>44461749 #
6. KPGv2 ◴[] No.44461513[source]
I had Copilot for a hot minute. When I wrote things like serializers and deserializers, it was incredible. So much time saved. But I didn't use it enough to make the personal cost worth it, so I cancelled.

It's annoying to have to hand-code that stuff. But without Copilot I have to. Or I can write some arcane regex and run it on existing code to get 90% of the way there. But writing the regex also takes a while.

Copilot was literally just suggesting the whole deserialization function after I'd finished the serializer, 100% correct code.

replies(1): >>44462291 #
7. tptacek ◴[] No.44461516[source]
LLM calls make stuff up. Your compiler can't make things up. An agent iterates LLM calls. When your LLM call makes an API up, your compiler will generate errors. The errors get fed back into the iterative loop. In pretty much every real case, the LLM corrects, but either way the result is clear: the code may be wrong, but it shouldn't hallucinate entire APIs.
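
A minimal sketch of that loop, with a hypothetical llm() standing in for a real model call and Python's built-in compile() standing in for a real build step (a real agent would run your actual compiler and tests):

    def llm(prompt: str) -> str:
        # Hypothetical stand-in for a real model call.
        raise NotImplementedError

    def compile_check(code: str) -> str:
        # Return an error message, or "" if the code at least compiles.
        try:
            compile(code, "<llm-output>", "exec")
            return ""
        except SyntaxError as e:
            return str(e)

    def agent_loop(prompt: str, max_iters: int = 5):
        code = llm(prompt)  # first draft; may invent APIs
        for _ in range(max_iters):
            errors = compile_check(code)
            if not errors:
                return code  # passed the check (a real build step would also flag invented APIs)
            # feed the checker's complaints back into the next call
            code = llm(prompt + "\n\nFix these errors:\n" + errors)
        return None  # didn't converge; a human takes over
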
replies(2): >>44461543 #>>44461587 #
8. ipdashc ◴[] No.44461517[source]
I don't understand the point of this style of argument.

There are oh-so-many issues with LLMs - plagiarism/IP rights, worsening education, unmaintainable code - this should be obvious to anyone. But painting them as totally useless just doesn't make sense. Of course they work. I've had a task I want to do, I ask the LLM in plain English, it gives me code, the code works, I get the task done faster than I would have figuring out the code myself. This process has happened plenty of times.

Which part of this do you disagree with, under your argument? Am I and all the other millions of people who have experienced this all collectively hallucinating (pun intended) that we got working solutions to our problems? Are we just unskilled for not being able to write the code quickly enough ourselves, and should go sod off? I'm joking a bit, but it's a genuine question.

9. beckthompson ◴[] No.44461543{3}[source]
But just compiling doesn't mean that much and doesn't really solve the core issue of AIs making stuff up. I could hook a random word generator up to a compiler and it would also pass that test!

For example, just yesterday I asked an AI a question about how to approach a specific problem. It gave an answer that "worked" (it compiled!) but in reality it didn't really make any sense and would have added a very nasty bug. What it wrote (it used a FrameUpdate instead of a normal Update) just didn't make sense at a basic level, given how the framework works.

replies(1): >>44461564 #
10. csomar ◴[] No.44461544[source]
> LLM outputs are nondeterministic.

LLM outputs are deterministic. There is no intrinsic source of randomness. Users can add randomness (temperature) to the output and modify it.
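
In principle the randomness enters only at the sampling step; a toy sketch with made-up logits:

    import numpy as np

    logits = np.array([2.0, 1.0, 0.5])  # toy next-token scores, not a real model

    # Greedy decoding (the temperature-0 limit): same input, same token, every time.
    greedy_token = int(np.argmax(logits))

    # Temperature sampling: the randomness the user adds.
    T = 0.8
    probs = np.exp(logits / T) / np.exp(logits / T).sum()
    sampled_token = int(np.random.default_rng().choice(len(logits), p=probs))
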

> But this article is based on a model of LLM code generation from 6 months ago

There hasn't been much change in models from 6 months ago. What happened is that we have better tooling to sift through the randomly generated outputs.

I don't disagree with your message. You are being downvoted because a lot of software developers are butt-hurt by it. It is going to force a change in the labor market for developers. In the same way the author is butt-hurt that they didn't buy Bitcoin in the very early days (as they were aware of it) and missed the boat on that.

replies(2): >>44461557 #>>44461746 #
11. tptacek ◴[] No.44461557[source]
> There hasn't been much change in models from 6 months ago.

I made the same claim in a widely-circulated piece a month or so back, and have come to believe it was wildly false, the dumbest thing I said in that piece.

replies(1): >>44461604 #
12. tptacek ◴[] No.44461564{4}[source]
I'm not interested in this Calvinball argument. The post we're commenting on makes a clear claim: an LLM hallucinating entire APIs. Not surreptitiously sneaking subtly shitty stuff past a compiler.

This is my problem: not that people are cynical about LLM-assisted coding, but that they themselves are hallucinating arguments about it, expecting their readers to nod along. Not happening here.

replies(1): >>44461916 #
13. csomar ◴[] No.44461572[source]
You can improve on that with:

1. A type-strict compiler.

2. https://github.com/isaacphi/mcp-language-server

LLMs will always make stuff up because they are lossy. In the same way, if I asked you to list the methods of some random library object, you wouldn't be able to; you'd pull up the documentation or lean on your code-completion companion. LLMs are just now getting the same tools.
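
A toy version of that lookup, with Python's dir() standing in for what a language server feeds the model:

    import json

    # Look the method list up instead of recalling it from lossy memory.
    methods = [name for name in dir(json) if not name.startswith("_")]
    print(methods)  # the ground truth: 'dump', 'dumps', 'load', 'loads', ...
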

replies(1): >>44461611 #
14. loire280 ◴[] No.44461587{3}[source]
A great solution to this problem, but it doesn't seem like this approach will generalize to problems in other fields, or even to more subtle coding confabulations that can't be detected by the compiler or static analysis.
replies(1): >>44461608 #
15. csomar ◴[] No.44461604{3}[source]
I have my own test to measure performance: https://omarabid.com/gpt3-now

So far the only model that showed significant advancement and differentiation was GPT-4.5. I'd advise looking at the problem and reading GPT-4.5's answer. It shows the difference from the other "normal" models (including GPT-3.5), with a considerably deeper level of understanding.

Other normal models are now more chatty and have a bit more data. But they do not show increased intelligence.

replies(1): >>44462105 #
16. tptacek ◴[] No.44461608{4}[source]
I vehemently agree with this. But it doesn't change the falsity of the claim in the article.
17. beckthompson ◴[] No.44461611{3}[source]
Oh for sure, I agree 100%! I was just saying that they will always make stuff up no matter what. Those are both good fixes, but at their core they can only "make stuff up".
18. literalAardvark ◴[] No.44461643[source]
Counterpoint: the brain also generates mostly garbage, just slower.
19. reasonableklout ◴[] No.44461746[source]
Nit: in practice, even at temperature 0, production LLM implementations have some nondeterminism. One reason is that many floating-point operations are non-associative even though the corresponding mathematical operations are, so the result can depend on the order in which the GPU carries them out in parallel. For example, see: https://www.twosigma.com/articles/a-workaround-for-non-deter...
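
A two-line demonstration of the underlying issue:

    a, b, c = 1e20, -1e20, 1.0
    print((a + b) + c)  # 1.0
    print(a + (b + c))  # 0.0 -- same operands, different grouping
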
replies(1): >>44461959 #
20. happytoexplain ◴[] No.44461749{3}[source]
Aside from trivial facts, beliefs cannot be, and should not be, shaped by dispassionate observation alone. Even yours are not. And framing it the way you have is simply the same, but oppositely-positioned, fallacy as the one the author is accused of.
21. kgwgk ◴[] No.44461916{5}[source]
> The post we're commenting on makes a clear claim: an LLM hallucinating entire APIs

You made a similar claim: "LLMs invent APIs that don't exist"

https://news.ycombinator.com/item?id=44461381

replies(1): >>44461932 #
22. tptacek ◴[] No.44461932{6}[source]
The AES block cipher core: also grievously insecure if used naively, without understanding what a block cipher can and can't do, by itself. Thus also an LLM call.
23. lovich ◴[] No.44461933[source]
> agent constructions

> But this article is based on a model of LLM code generation from 6 months ago which is simply no longer true, and you can't gaslight your way back to Q1 2024.

You’re ahead of the curve and wondering why others don’t know what you do. If you’re not an AI company, a faang, or an AI evangelist you likely haven’t heard of those solutions.

I’ve been trying to keep up with AI developments, and only learned about MCP and agentic workflows 1-2 months ago and I consider myself failing at keeping up with cutting edge AI development

Edit:

Also, six months ago is Q1 2025, not 2024. Not sure if that was a typo or a needed reminder of how rapidly this technology is iterating

24. jkhdigital ◴[] No.44461959{3}[source]
I ran into this a bit while working on my PhD research that used LLMs for steganography. The output had to be deterministic to reverse the encoding, and it was—as long as you used the same hardware. Encoding a message on a PC and then trying to decode on a phone broke everything.
25. eviks ◴[] No.44461994[source]
> or they suggested an elaborate and tedious workaround that would technically solve the problem (but introduce new ones).

There is no value in randomly choosing an API that exists. There is value in choosing an API that works.

When an LLM makes up an API that doesn't even exist, it indicates that the model isn't tied to the reality of the task of finding a working API, so filtering out the nonexistent APIs won't make the surviving results match any better. But yes, they'll compile.

replies(1): >>44462082 #
26. tptacek ◴[] No.44462082[source]
Give me a break. First, that's not the claim the article makes. Second, that's not the experience of anybody who actually uses Claude Code or Gemini Desktop or whatever people are using this week. This is what I'm talking about: people just gaslighting.

LLMs can write truly shitty code. I have a <50% hit rate on stuff I don't have to rewrite, with Sketch.dev, an agent I like a lot. But they don't fail the way you or this article claim they do. Enough.

replies(1): >>44462228 #
27. Karrot_Kream ◴[] No.44462105{4}[source]
I was able to have Opus 4 one-shot it. Happy to share a screenshot if that wasn't your experience.
replies(1): >>44463520 #
28. Shorel ◴[] No.44462226[source]
You are right about this.

Also, someone mathematically proved that's enough. And then someone else proved it empirically.

There was an experiment where they trained 16 pigeons to detect cancerous or benign tumours in photographs.

Individually, each pigeon had an average 85% accuracy. But all pigeons (except for one outlier) together had an accuracy of 99%.

If you add enough silly brains, you get one super smart brain.
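
The arithmetic behind that jump, assuming independent errors (an assumption real voters, pigeon or model, only approximate):

    from math import comb

    n, p = 15, 0.85  # 15 pigeons, each 85% accurate
    majority_correct = sum(
        comb(n, k) * p**k * (1 - p)**(n - k)
        for k in range(n // 2 + 1, n + 1)  # at least 8 of 15 vote correctly
    )
    print(f"{majority_correct:.4f}")  # ~0.9994
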

replies(1): >>44462264 #
29. eviks ◴[] No.44462228{3}[source]
First, it is, you've just reduced the article's forest of claims down to a single tree to make it appear like your "solution" cuts it.

Second, speak for yourself, you have no clue about everybody's experience to make such a universal claim.

Lastly, the article talks about the author's experience, not yours, so you're the only one who can gaslight the author, not the other way around.

replies(1): >>44462236 #
30. tptacek ◴[] No.44462236{4}[source]
I'm comfortable with the assertions I'm making here and stand by them.
31. Lariscus ◴[] No.44462264{3}[source]
It's also mathematically proven that infinite monkeys typing on typewriters for eternity will recreate all the works of Shakespeare. It still takes someone with an actual brain to recognize the correct output.
replies(1): >>44462302 #
32. Shorel ◴[] No.44462291{3}[source]
I remember writing LISP code that created the serialisers and deserialisers for me.

Now that everything is containerised and managed by Docker-style environments, I am thinking about giving SBCL another try; the end users only need to access the same JSON REST APIs anyway.

Everything old is new again =)

33. Shorel ◴[] No.44462302{4}[source]
Yep, there's some positive feedback loop missing in all this LLM stuff.
34. d4rkn0d3z ◴[] No.44463037[source]
Isn't this saying: "We get it that our nondeterministic bullshit machine writes crap, but we are wrapping it in a deterministic finite-state machine in order to bring back determinism. We call it 'Agentic'"?

Seems like 40 years of effort making deterministic computing work in a non-deterministic universe is being cast aside because we thought nondeterminism might work better. Turns out, we need determinism after all.

Following this out, we might end up with alternating layers of determinism and nondeterminism each trying to correct the output of the layer below.

I would argue AI is a harder problem than any humans have ever tried to solve, so how does it benefit me to turn every mundane problem into the hardest problem ever? As they say on the internet: ...and now you have two problems, the second of which is always the hardest one ever.

35. csomar ◴[] No.44463520{5}[source]
Interested to see your Opus 4 one-shot. I tried it very recently on Opus 4 and it burbled nonsense.
replies(1): >>44478883 #
36. Karrot_Kream ◴[] No.44478883{6}[source]
Sorry for the delay, I'm out for the weekend. I'll get it to you tomorrow!