
196 points yuedongze | 32 comments
1. gradus_ad ◴[] No.46195373[source]
The proliferation of nondeterministically generated code is here to stay. Part of our response must be more dynamic, more comprehensive and more realistic workload simulation and testing frameworks.
replies(5): >>46195431 #>>46195733 #>>46197437 #>>46197956 #>>46199307 #
2. yuedongze ◴[] No.46195431[source]
i've seen a lot of startups that use AI to QA human work. how about the idea of using humans to QA AI work? a lot of interesting things might follow
replies(6): >>46195474 #>>46195546 #>>46195718 #>>46195741 #>>46195828 #>>46199496 #
3. Aldipower ◴[] No.46195474[source]
Sounds inhuman.
replies(2): >>46195538 #>>46195561 #
4. A4ET8a8uTh0_v2 ◴[] No.46195538{3}[source]
Nah, sounds like management, but I am repeating myself. In all seriousness, I have found myself having to carefully rein in some similar decisions. I don't want to get into details, but there are times I wonder if they understand how things really work, or if people need some 'floor' level exposure before they just decree stuff.
5. __loam ◴[] No.46195546[source]
No thanks.
6. quantummagic ◴[] No.46195561{3}[source]
As an industry, we've been doing the same thing to people in almost every other sector of the workforce, since we began. Automation is just starting to come for us now, and a lot of us are really pissed off about it. All of a sudden, we're humanitarians.
replies(1): >>46196483 #
7. adventured ◴[] No.46195718[source]
A large percentage (at least 50%) of the market for software developers will shift to lower paid jobs focused on managing, inspecting and testing the work that AI does. If a median software developer job paid $125k before, it'll shift to $65k-$85k type AI babysitting work after.
replies(1): >>46196176 #
8. OptionOfT ◴[] No.46195733[source]
I disagree. I think we're testing it, and we haven't seen the worst of it yet.

And I think it's less about non-deterministic code (the code is actually still deterministic) but more about this new-fangled tool out there that finally allows non-coders to generate something that looks like it works. And in many cases it does.

Like a movie set. Viewed from the right angle it looks just right. Peek behind the curtain and it's all wood, thinly painted, and it's usually easier to rebuild from scratch than to add a layer on top.

replies(2): >>46197950 #>>46199508 #
9. ◴[] No.46195741[source]
10. colechristensen ◴[] No.46195828[source]
Yes, but not like what you think. Programmers are going to look more like product managers with extra technical context.

AI is also great at looking for its own quality problems.

Yesterday on an entirely LLM generated codebase

Prompt: > SEARCH FOR ANTIPATTERNS

Found 17 antipatterns across the codebase:

And then what followed was a detailed list: about a third of them I thought were pretty important, a third were arguably issues or not, and the rest were either not important or amounted to "this project isn't fully functional".

As an engineer, I didn't have to find code errors or fix code errors, I had to pick which errors were important and then give instructions to have them fixed.

replies(2): >>46196151 #>>46197630 #
11. mjr00 ◴[] No.46196151{3}[source]
> Programmers are going to look more like product managers with extra technical context.

The limit of "product manager with extra technical context", as the technical context approaches infinity, is "programmer". Because the best, most specific way to convey extra technical context is just plain old code.

replies(1): >>46197222 #
12. mjr00 ◴[] No.46196176{3}[source]
It's funny that I heard exactly this when I graduated university in the late 2000s:

> A large percentage (at least 50%) of the market for software developers will shift to lower paid jobs focused on managing, inspecting and testing the work that outsourced developers do. If a median software developer job paid $125k before, it'll shift to $65k-$85k type outsourced developer babysitting work after.

13. Terr_ ◴[] No.46196483{4}[source]
> Automation is just starting to come for us now

This argument is common and facile: Software development has always been about "automating ourselves out of a job", whether in the broad sense of creating compilers and IDEs, or in the individual sense that you write some code and say: "Hey, I don't want to rewrite this again later, not even if I was being paid for my time, I'll make it into a reusable library."

> the same thing

The reverse: What pisses me off is how what's coming is not the same thing.

Customers are being sold a snake-oil product, and its adoption may well ruin things we've spent careers de-crappifying by making them consistent and repeatable and understandable. In the aftermath, some portion of my (continued) career will be diverted to cleaning up the lingering damage from it.

14. LPisGood ◴[] No.46197222{4}[source]
This is exactly why no code / low code solutions don’t really work. At the end of the day, there is irreducible technical complexity.
15. wasmainiac ◴[] No.46197437[source]
Code has always been nondeterministic. Which engineer wrote it? What was their past experience? This just feels like we are accepting subpar quality because we have no good way to ensure the code we generate is reasonable and won't mayyyybe rm -rf our server as a fun easter egg.
replies(1): >>46198281 #
16. manmal ◴[] No.46197630{3}[source]
Yeah, don't rely on the LLM finding all the issues. Complex code like Swift concurrency tooling is just riddled with issues. I usually need to push line coverage to 100% and then let it loop on hanging tests until everything _seems_ to work.

(It’s been said that Swift concurrency is too hard for humans as well though)

replies(1): >>46199237 #
17. Angostura ◴[] No.46197950[source]
I just wanted to say how much I like that simile - I'm going to nick it for sure
18. glitchc ◴[] No.46197956[source]
Agreed. It's a new programming paradigm that will put more pressure on API and framework design, to protect vibe developers from themselves.
19. mort96 ◴[] No.46198281[source]
Code written by humans has always been nondeterministic, but generated code has always been deterministic before now. Dealing with nondeterministically generated code is new.
replies(2): >>46199734 #>>46221999 #
20. colechristensen ◴[] No.46199237{4}[source]
I don't trust programmers to find all the issues either and in several shops I've been in "we should have tests" was a controversial argument.

A good software engineering system built around the top LLMs today is definitely competitive in quality to a mediocre software shop and 100x faster and 1000x cheaper.

21. energy123 ◴[] No.46199307[source]
Nondeterministic isn't the right word because LLM outputs are deterministic and the tokens created from those outputs can also be deterministic.
replies(1): >>46199468 #
22. Yoric ◴[] No.46199468[source]
I agree that non-deterministic isn't the right word, because that's not the property we care about, but unless I'm strongly missing something LLM outputs are very much non-deterministic, both during the inference itself and when projecting the embeddings back into tokens.
replies(1): >>46199738 #
23. hn_acc1 ◴[] No.46199496[source]
This feels a lot like the "humans must be ready at any time to take over from FSD" that Tesla is trying to push. With presumably similar results.

If it works 85% of the time, how soon do you catch that it is moving in the wrong direction? Are you having a standup every few minutes for it to review (edit) its work with you? Are you reviewing hundreds of thousands of lines of code every day?

It feels a bit like pouring cement or molten steel really fast: at best, it works, and you get things done way faster. Get it just a bit wrong, and your work is all messed up, as well as a lot of collateral damage. But I guess if you haven't shipped yet, it's ok to start over? How many different respins can you keep in your head before it all blends?

24. Yoric ◴[] No.46199508[source]
Exactly that.

I suspect that we're going to witness a (further) fork within developers. Let's call them the PM-style developers on one side and the system-style developers on the other.

The PM-style developers will be using popular loosely/dynamically-typed languages because they're easy to generate and they'll give you prototypes quickly.

The system-style developers will be using stricter languages and type systems and/or lots of TDD because this will make it easier to catch the generated code's blind spots.

One can imagine that these will be two clearly distinct professions with distinct toolsets.

replies(1): >>46199761 #
25. nowittyusername ◴[] No.46199734{3}[source]
Determinism vs. nondeterminism is not, and has never been, the issue. Also, all LLMs are 100% deterministic; what is non-deterministic is the sampling performed by the inference engine, which, by the way, can easily be made 100% deterministic by simply turning off things like batching. This is a matter for cloud-based API providers, as you as the end user don't have access to the inference engine; if you run any of your models locally in llama.cpp, turning off some server startup flags will get you deterministic results. Cloud-based API providers have no choice but to keep batching on, as they are serving millions of users, and wasting precious VRAM slots on a single user is wasteful and stupid. See my code and video as evidence if you want to run any local LLM 100% deterministically: https://youtu.be/EyE5BrUut2o?t=1
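For reference, a minimal sketch of what "turning off batching" looks like with llama.cpp's server. The model path is a placeholder, and flag names can vary between llama.cpp versions, so check `llama-server --help` before relying on these:

```shell
# Serve one request at a time so cross-request batching cannot
# perturb the floating-point reduction order, and pin the
# sampler's RNG seed so sampling itself is reproducible.
llama-server -m model.gguf --parallel 1 --seed 42
```

With settings like these, repeating the same request on the same hardware and build should return the same completion each time.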
replies(1): >>46201156 #
26. energy123 ◴[] No.46199738{3}[source]
I agree it isn't the main property we care about, we care about reliability.

But at least in its theoretical construction the LLM should be deterministic. It outputs a fixed probability distribution across tokens with no rng involvement.

We then sample from that fixed distribution non-deterministically for better performance or we use greedy decoding and get slightly worse performance in exchange for full determinism.

Happy to be corrected if I am wrong about something.
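To illustrate the construction above with a toy next-token distribution (hypothetical logits, not a real LLM): greedy decoding takes the argmax and involves no RNG at all, while temperature sampling is only reproducible if you fix the seed.

```python
import math
import random

def softmax(logits, temperature=1.0):
    """Convert logits to a probability distribution; temperature rescales them."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def greedy_decode(logits):
    """Deterministic: always returns the index of the highest logit."""
    return max(range(len(logits)), key=lambda i: logits[i])

def sample_decode(logits, temperature, rng):
    """Stochastic unless the RNG is seeded identically on each run."""
    probs = softmax(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

logits = [2.0, 1.0, 0.5, -1.0]  # toy "next-token" logits

# Greedy decoding: the same answer every time, no randomness involved.
assert all(greedy_decode(logits) == 0 for _ in range(100))

# Seeded sampling: two runs with the same seed agree token-for-token.
rng1, rng2 = random.Random(42), random.Random(42)
run_a = [sample_decode(logits, 0.8, rng1) for _ in range(10)]
run_b = [sample_decode(logits, 0.8, rng2) for _ in range(10)]
assert run_a == run_b
```

The same distinction carries over to real inference engines: the forward pass fixes the distribution, and whether decoding is deterministic depends entirely on how you sample from it.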

27. OptionOfT ◴[] No.46199761{3}[source]
I actually think that the direct usage of AI will reduce in the system-style group (if it was ever large there).

There is a non-trivial cost in taking apart the AI code to ensure it's correct, even with tests. And I think it's easy to become slower than writing it from scratch.

28. nazgul17 ◴[] No.46201156{4}[source]
That's not an interesting difference, from my point of view. The black box we all use is non-deterministic, period. It doesn't matter where on the inside the system stops being deterministic: if I hit the black box twice, I get two different replies. And that doesn't even matter, which you also said.

The more important property is that, unlike compilers, type checkers, linters, verifiers and tests, the output is unreliable. It comes with no guarantees.

One could be pedantic and argue that bugs affect all of the above. Or that cosmic rays make everything unreliable. Or that people are non deterministic. All true, but the rate of failure, measured in orders of magnitude, is vastly different.

replies(1): >>46201303 #
29. nowittyusername ◴[] No.46201303{5}[source]
My man, did you even check my video, did you even try the app? This is not bug-related; nowhere did I say it was a bug. Batch processing is a FEATURE that is intentionally turned on in the inference engine by large-scale providers. That does not mean it has to be on. If they turn off batch processing, all LLM API calls will be 100% deterministic, but it will cost them more money to provide the service, as now you are stuck providing 1 API call per GPU. "if I hit the black box twice, I get two different replies" - what you are saying here is 100% verifiably wrong. Just because someone chose to turn on a feature in the inference engine to save money does not mean LLMs are non-deterministic. LLMs are stateless: their weights are frozen; you never "run" an LLM, you can only sample it, just like a hologram. And the inference sampling settings you use are what determine the outcome.
replies(1): >>46202407 #
30. pegasus ◴[] No.46202407{6}[source]
Correct me if I'm wrong, but even with batch processing turned off, they are still only deterministic as long as you set the temperature to zero? Which also has the side-effect of decreasing creativity. But maybe there's a way to pass in a seed for the pseudo-random generator and restore determinism in this case as well. Determinism, in the sense of reproducible. But even if so, "determinism" means more than just mechanical reproducibility for most people - including parent, if you read their comment carefully. What they mean is: in some important way predictable for us humans. I.e. no completely WTF surprises, as LLMs are prone to produce once in a while, regardless of batch processing and temperature settings.
replies(1): >>46204834 #
31. nowittyusername ◴[] No.46204834{7}[source]
You can change ANY sampling parameter once batch processing is off and you will keep the deterministic behavior: temperature, repetition penalty, etc. I've got to say I'm a bit disappointed to see this on Hacker News, as I expect it from Reddit. You were handed the whole matter on a silver platter: the video describes in detail how any sampling parameter can be used, and I provide the whole code open source so anyone can try it themselves without taking my claims as hearsay. Well, you can lead a horse to water, as they say...
32. wasmainiac ◴[] No.46221999{3}[source]
> generated code has always been deterministic

Technically you are right… but in practice, no. Ask an LLM any reasonably complex task and you will get different results. This is because the model changes periodically and we have no control over the host system's source of entropy. It's effectively non-deterministic.