628 points cratermoon | 10 comments
tptacek ◴[] No.44461381[source]
> LLM output is crap. It's just crap. It sucks, and is bad.

Still don't get it. LLM outputs are nondeterministic. LLMs invent APIs that don't exist. That's why you filter those outputs through agent constructions, which actually compile code. The nondeterminism of LLMs doesn't make your compiler nondeterministic.

All sorts of ways to knock LLM-generated code. Most I disagree with, all colorable. But this article is based on a model of LLM code generation from 6 months ago which is simply no longer true, and you can't gaslight your way back to Q1 2024.

replies(7): >>44461418 #>>44461426 #>>44461474 #>>44461544 #>>44461933 #>>44461994 #>>44463037 #
1. beckthompson ◴[] No.44461418[source]
AIs still frequently make stuff up - there isn't really a way to get out of that. Have they improved a lot in the last six months? 100%! But they still make mistakes, and it's quite common.
replies(2): >>44461516 #>>44461572 #
2. tptacek ◴[] No.44461516[source]
LLM calls make stuff up. Your compiler can't make things up. An agent iterates LLM calls. When your LLM call makes up an API, your compiler will generate errors. The errors get fed back into the iterative loop. In pretty much every real case, the LLM corrects itself, but either way the result is clear: the code may be wrong, but it shouldn't hallucinate entire APIs.
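
A minimal sketch of that loop in Python (call_llm is a hypothetical stand-in for whatever model API the agent wraps, and "go build" is just an example compiler):

    import subprocess
    import tempfile

    def try_compile(source: str) -> str | None:
        """Build the candidate; return compiler errors, or None on success.
        Uses "go build" as the example toolchain; any compiler works here."""
        with tempfile.NamedTemporaryFile("w", suffix=".go", delete=False) as f:
            f.write(source)
            path = f.name
        result = subprocess.run(["go", "build", path],
                                capture_output=True, text=True)
        return None if result.returncode == 0 else result.stderr

    def agent_loop(prompt: str, call_llm, max_iters: int = 5) -> str:
        """Iterate LLM calls, feeding build errors back until the code compiles."""
        source = call_llm(prompt)
        for _ in range(max_iters):
            errors = try_compile(source)
            if errors is None:
                return source  # builds cleanly: no invented API survived
            # A hallucinated function or import surfaces as an
            # "undefined: ..." error; the model sees it and revises.
            source = call_llm(prompt + "\n\nYour code failed to compile:\n"
                              + errors + "\nFix it.")
        raise RuntimeError("no compiling candidate after max_iters attempts")

The compiler is the deterministic filter: the model's nondeterminism changes how many iterations the loop takes, not whether the final artifact builds.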
replies(2): >>44461543 #>>44461587 #
3. beckthompson ◴[] No.44461543[source]
But just compiling doesn't mean that much and doesn't really solve the core issue of AIs making stuff up. I could hook a random word generator up to a compiler and it would also pass that test!

For example, just yesterday I asked an AI how to approach a specific problem. It gave an answer that "worked" (it compiled!), but in reality it didn't make any sense and would have introduced a very nasty bug. What it wrote (it used a FrameUpdate instead of a normal Update) just didn't make sense at a basic level of how the framework works.
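
The framework isn't named here, but the shape of the bug is familiar. A hypothetical Python sketch (the per-frame and fixed-step hooks are invented for illustration) of code that compiles and runs yet is subtly wrong:

    class Player:
        def __init__(self):
            self.x = 0.0
            self.velocity = 10.0  # units per second

        def update(self, dt: float):
            # Correct hook: called on a fixed timestep, so motion is
            # deterministic regardless of rendering speed.
            self.x += self.velocity * dt

        def frame_update(self):
            # The kind of thing the LLM suggested (hypothetical): called once
            # per rendered frame. It "works" -- the player moves -- but speed
            # now depends on framerate, and no compiler will flag it.
            self.x += self.velocity * (1 / 60)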

replies(1): >>44461564 #
4. tptacek ◴[] No.44461564{3}[source]
I'm not interested in this Calvinball argument. The post we're commenting on makes a clear claim: an LLM hallucinating entire APIs. Not surreptitiously sneaking subtly shitty stuff past a compiler.

This is my problem: not that people are cynical about LLM-assisted coding, but that they themselves are hallucinating arguments about it, expecting their readers to nod along. Not happening here.

replies(1): >>44461916 #
5. csomar ◴[] No.44461572[source]
You can improve on that with:

1. A type-strict compiler.

2. https://github.com/isaacphi/mcp-language-server

LLMs will always make stuff up because they are lossy. In the same way, if I asked you to list the methods of some random library object, you wouldn't be able to do it; you'd pull up the documentation or lean on your code-completion companion. LLMs are just now getting the same tools.
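
A toy illustration of the idea in Python (the MCP language server linked above does this properly over LSP; this list_methods helper is just a stand-in): give the model a tool that reads the ground truth instead of asking it to recall an API from its weights.

    import inspect

    def list_methods(obj) -> list[str]:
        """Tool exposed to the LLM: the real, public methods of an object,
        looked up rather than reconstructed from the model's memory."""
        return sorted(
            name for name, member in inspect.getmembers(obj, callable)
            if not name.startswith("_")
        )

    # The model calls the tool instead of guessing:
    print(list_methods([]))
    # ['append', 'clear', 'copy', 'count', 'extend', 'index',
    #  'insert', 'pop', 'remove', 'reverse', 'sort']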

replies(1): >>44461611 #
6. loire280 ◴[] No.44461587[source]
This is a great solution to this problem, but it doesn't seem like the approach will generalize to problems in other fields, or even to more subtle coding confabulations that can't be detected by the compiler or static analysis.
replies(1): >>44461608 #
7. tptacek ◴[] No.44461608{3}[source]
I vehemently agree with this. But it doesn't change the falsity of the claim in the article.
8. beckthompson ◴[] No.44461611[source]
Oh for sure, I agree 100%! I was just saying that they will always make stuff up, no matter what. Those are both good fixes, but at its core an LLM can only "make stuff up".
9. kgwgk ◴[] No.44461916{4}[source]
> The post we're commenting on makes a clear claim: an LLM hallucinating entire APIs

You made a similar claim: "LLMs invent APIs that don't exist"

https://news.ycombinator.com/item?id=44461381

replies(1): >>44461932 #
10. tptacek ◴[] No.44461932{5}[source]
The AES block cipher core is also grievously insecure if used naively, by itself, without understanding what a block cipher can and can't do. The same is true of a bare LLM call.