
628 points cratermoon | 5 comments
tptacek ◴[] No.44461381[source]
> LLM output is crap. It’s just crap. It sucks, and is bad.

Still don't get it. LLM outputs are nondeterministic. LLMs invent APIs that don't exist. That's why you filter those outputs through agent constructions, which actually compile the code. The nondeterminism of LLMs doesn't make your compiler nondeterministic.

All sorts of ways to knock LLM-generated code. Most I disagree with, all colorable. But this article is based on a model of LLM code generation from 6 months ago which is simply no longer true, and you can't gaslight your way back to Q1 2024.
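A minimal sketch of that filtering idea, using Python's built-in `compile()` as a stand-in for a real compiler and a hypothetical list of model generations (real agents also run tests and feed errors back to the model):

```python
def first_that_compiles(candidates):
    """Filter nondeterministic generations through a compiler:
    return the first candidate that is syntactically valid Python,
    or None if none pass."""
    for source in candidates:
        try:
            compile(source, "<llm-output>", "exec")  # compiler as the oracle
            return source
        except SyntaxError:
            continue
    return None

# One hallucinated/broken generation, one valid one; the broken
# generation never survives the filter.
generations = ["def f(:", "def f(x):\n    return x"]
print(first_that_compiles(generations))
```

The point of the sketch: even if the generator is random, the accept/reject step is a deterministic check, so garbage outputs don't reach the codebase.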

replies(7): >>44461418 #>>44461426 #>>44461474 #>>44461544 #>>44461933 #>>44461994 #>>44463037 #
csomar ◴[] No.44461544[source]
> LLM outputs are nondeterministic.

LLM outputs are deterministic. There is no intrinsic source of randomness. Users can add randomness to the output (via temperature sampling) and tune it.
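A toy sketch of where that randomness enters, with made-up logits standing in for a model's output: temperature 0 is a pure argmax and fully deterministic, while temperature > 0 samples from the softmax.

```python
import math
import random

def sample_token(logits, temperature=0.0, rng=None):
    """Pick a token index from raw logits.
    temperature == 0: greedy decoding, deterministic argmax.
    temperature > 0: sample from the temperature-scaled softmax,
    which is where user-visible randomness comes from."""
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [0.1, 2.0, 0.5]
print(sample_token(logits))                                   # greedy, always the same
print(sample_token(logits, temperature=1.0))                  # sampled, can vary
print(sample_token(logits, temperature=1.0, rng=random.Random(0)))  # seeded, reproducible
```

(Caveat: hosted models can still show run-to-run variation from floating-point and batching effects on the serving side; the sampling step above is just where temperature randomness is injected.)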

> But this article is based on a model of LLM code generation from 6 months ago

There hasn't been much change in models from 6 months ago. What happened is that we have better tooling to sift through the randomly generated outputs.

I don't disagree with your message. You are being downvoted because a lot of software developers are butt-hurt by it. It is going to force a change in the labor market for developers. In the same way, the author is butt-hurt that they were aware of Bitcoin in its very early days, didn't buy, and missed the boat.

replies(2): >>44461557 #>>44461746 #
1. tptacek ◴[] No.44461557[source]
> There hasn't been much change in models from 6 months ago.

I made the same claim in a widely-circulated piece a month or so back, and have come to believe it was wildly false, the dumbest thing I said in that piece.

replies(1): >>44461604 #
2. csomar ◴[] No.44461604[source]
I have my own test to measure performance: https://omarabid.com/gpt3-now

So far the only model that has shown significant advancement and differentiation is GPT-4.5. I advise looking at the problem and reading GPT-4.5's answer. It shows the difference from the other "normal" models (including GPT-3.5), as it demonstrates a considerably deeper level of understanding.

The other normal models are now chattier and have a bit more data, but they do not show increased intelligence.

replies(1): >>44462105 #
3. Karrot_Kream ◴[] No.44462105[source]
I was able to have Opus 4 one-shot it. Happy to share a screenshot if that wasn't your experience.
replies(1): >>44463520 #
4. csomar ◴[] No.44463520{3}[source]
Interested to see your Opus 4 one-shot. I tried it very recently on Opus 4 and it burbled nonsense.
replies(1): >>44478883 #
5. Karrot_Kream ◴[] No.44478883{4}[source]
Sorry for the delay, I'm out for the weekend. I'll get it to you tomorrow!