
628 points cratermoon | 5 comments
tptacek ◴[] No.44461381[source]
> LLM output is crap. It’s just crap. It sucks, and is bad.

Still don't get it. LLM outputs are nondeterministic. LLMs invent APIs that don't exist. That's why you filter those outputs through agent constructions, which actually compile the code. The nondeterminism of LLMs doesn't make your compiler nondeterministic.

All sorts of ways to knock LLM-generated code. Most I disagree with, all colorable. But this article is based on a model of LLM code generation from 6 months ago which is simply no longer true, and you can't gaslight your way back to Q1 2024.
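A minimal sketch of that filtering idea, using Python's built-in `compile()` as a stand-in for a real compiler and a hypothetical list of model generations (real agents also run tests and feed errors back to the model):

```python
def first_that_compiles(candidates):
    """Filter nondeterministic generations through a compiler:
    return the first candidate that is syntactically valid Python,
    or None if none pass."""
    for source in candidates:
        try:
            compile(source, "<llm-output>", "exec")  # compiler as the oracle
            return source
        except SyntaxError:
            continue
    return None

# One hallucinated/broken generation, one valid one; the broken
# generation never survives the filter.
generations = ["def f(:", "def f(x):\n    return x"]
print(first_that_compiles(generations))
```

The point of the sketch: even if the generator is random, the accept/reject step is a deterministic check, so garbage outputs don't reach the codebase.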

replies(7): >>44461418 #>>44461426 #>>44461474 #>>44461544 #>>44461933 #>>44461994 #>>44463037 #
csomar ◴[] No.44461544[source]
> LLM outputs are nondeterministic.

LLM outputs are deterministic. There is no intrinsic source of randomness. Users can add randomness to the output (via temperature sampling) and tune it.
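A toy sketch of where that randomness enters, with made-up logits standing in for a model's output: temperature 0 is a pure argmax and fully deterministic, while temperature > 0 samples from the softmax.

```python
import math
import random

def sample_token(logits, temperature=0.0, rng=None):
    """Pick a token index from raw logits.
    temperature == 0: greedy decoding, deterministic argmax.
    temperature > 0: sample from the temperature-scaled softmax,
    which is where user-visible randomness comes from."""
    if temperature == 0.0:
        return max(range(len(logits)), key=lambda i: logits[i])
    rng = rng or random.Random()
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract max for numerical stability
    weights = [math.exp(s - m) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

logits = [0.1, 2.0, 0.5]
print(sample_token(logits))                                   # greedy, always the same
print(sample_token(logits, temperature=1.0))                  # sampled, can vary
print(sample_token(logits, temperature=1.0, rng=random.Random(0)))  # seeded, reproducible
```

(Caveat: hosted models can still show run-to-run variation from floating-point and batching effects on the serving side; the sampling step above is just where temperature randomness is injected.)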

> But this article is based on a model of LLM code generation from 6 months ago

There hasn't been much change in models from 6 months ago. What happened is that we have better tooling to sift through the randomly generated outputs.

I don't disagree with your message. You are being downvoted because a lot of software developers are butt-hurt by it. It is going to force a change in the labor market for developers. In the same way, the author is butt-hurt that they were aware of Bitcoin in its very early days, didn't buy, and missed the boat.

replies(2): >>44461557 #>>44461746 #
1. tptacek ◴[] No.44461557[source]
> There hasn't been much change in models from 6 months ago.

I made the same claim in a widely-circulated piece a month or so back, and have come to believe it was wildly false, the dumbest thing I said in that piece.

replies(1): >>44461604 #
2. csomar ◴[] No.44461604[source]
I have my own test to measure performance: https://omarabid.com/gpt3-now

So far the only model that has shown significant advancement and differentiation is GPT-4.5. I advise looking at the problem and reading GPT-4.5's answer. It shows the difference from the other "normal" models (including GPT-3.5), as it demonstrates a considerably deeper level of understanding.

The other normal models are now chattier and have a bit more data, but they do not show increased intelligence.

replies(1): >>44462105 #
3. Karrot_Kream ◴[] No.44462105[source]
I was able to have Opus 4 one-shot it. Happy to share a screenshot if that wasn't your experience.
replies(1): >>44463520 #
4. csomar ◴[] No.44463520{3}[source]
Interested to see your Opus 4 one-shot. I tried it very recently on Opus 4 and it burbled nonsense.
replies(1): >>44478883 #
5. Karrot_Kream ◴[] No.44478883{4}[source]
Sorry for the delay, I'm out for the weekend. I'll get it to you tomorrow!