
358 points by andrewstetsenko | 1 comment
sysmax No.44360302
AI can very efficiently apply common patterns to vast amounts of code, but it has no inherent "idea" of what it's doing.

Here's a fresh example that I stumbled upon just a few hours ago. I needed to refactor some code that first computes the size of a popup, and then separately, the top left corner.

For brevity, one part used an "if", while the other one had a "switch":

    if (orientation == Dock.Left || orientation == Dock.Right)
        size = /* horizontal placement */;
    else
        size = /* vertical placement */;

    var point = orientation switch
    {
        Dock.Left => ...,
        Dock.Right => ...,
        Dock.Top => ...,
        Dock.Bottom => ...,
    };
I wanted the LLM to refactor this so that the position is stored rather than applied immediately. It turned out the model simply could not handle two different constructs (an if and a switch) doing similar things. I tried several variations of the prompt, but it leaned very strongly toward producing either two ifs or two switches, despite rather explicit instructions not to do so.
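For concreteness, this is roughly the shape I was after: keep the existing if/switch mix, but store the result instead of applying it. (A sketch only; the helper names and the m_PendingPlacement field are placeholders, not the real code.)

    // Sketch: same mixed structure as before, but the result is stored, not applied.
    Size size;
    if (orientation == Dock.Left || orientation == Dock.Right)
        size = ComputeHorizontalSize();   // placeholder helper
    else
        size = ComputeVerticalSize();     // placeholder helper

    var point = orientation switch
    {
        Dock.Left => ComputeLeftCorner(size),
        Dock.Right => ComputeRightCorner(size),
        Dock.Top => ComputeTopCorner(size),
        Dock.Bottom => ComputeBottomCorner(size),
        _ => throw new ArgumentOutOfRangeException(nameof(orientation)),
    };

    m_PendingPlacement = (point, size);   // applied later, during render

The models kept "normalizing" this into two ifs or two switches instead of leaving the structure alone.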

It sort of makes sense: once the model has "completed" an if, and then encounters the need for a similar thing, it will pick an "if" again, because, well, it is completing the previous tokens.

Harmless here, but in many slightly less trivial examples, it would just steamroll over nuance and produce code that appears good, but fails in weird ways.

That said, splitting tasks into smaller parts devoid of such ambiguities works really well. It's way easier to say "store size in m_StateStorage and apply on render" than to manually edit 5 different points in the code. Especially with stuff like Cerebras, which can chew through complex code at several kilobytes per second, expanding simple thoughts faster than you could physically type them.
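To sketch what that one-line instruction expands to (the type and method names beyond m_StateStorage are made up for illustration):

    // Sketch: measurement writes into m_StateStorage; the render pass
    // reads it back, so placement is applied in exactly one place.
    private PlacementState m_StateStorage;

    void Measure(Dock orientation)
    {
        m_StateStorage = new PlacementState
        {
            Size = ComputeSize(orientation),    // made-up helpers
            Point = ComputePoint(orientation),
        };
    }

    void OnRender()
    {
        ApplyPlacement(m_StateStorage.Point, m_StateStorage.Size);
    }

A simple, unambiguous instruction like that touches several call sites at once, and the model expands it reliably.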

replies(2): >>44360561 #>>44360985 #
gametorch [dead post] No.44360561
[flagged]
sysmax No.44360830
I am working on a GUI for delegating coding tasks to LLMs, so I routinely experiment with a bunch of models doing all kinds of things. In this case, Claude Sonnet 3.7 handled it just fine, while Llama-3.3-70B just couldn't get it. But that is literally the simplest example that illustrates the problem.

When I tried giving top-notch LLMs harder tasks (scanning an abstract syntax tree coming from a parser in a particular way and generating nodes for particular things), they completely blew it. The output didn't even compile, to say nothing of logic errors and missed requirements. But once I broke the problem down into making lists of relevant parsing contexts and generating one wrapper class at a time, it saved me a whole ton of work. It took me a day to accomplish what would normally take a week.

Maybe they will figure it out eventually, maybe not. The point is, right now the technology has fundamental limitations, and you are better off knowing how to work around them, rather than blindly trusting the black box.

replies(1): >>44360860 #
gametorch No.44360860
Yeah exactly.

I think it's a combination of

1) wrong level of granularity in prompting

2) lack of engineering experience

3) autistic rigidity regarding a single hallucination throwing the whole experience off

4) subconscious anxiety over the threat to their jerbs

5) unnecessary guilt over going against the tide; anything pro AI gets heavily downvoted on Reddit and is, at best, controversial as hell here

I, for one, have shipped like literally a product per day for the last month and it's amazing. Literally 2,000,000+ impressions, paying users, almost 100 sign ups across the various products. I am fucking flying. Hit the front page of Reddit and HN countless times in the last month.

Idk if I break down the prompts better or what. But this is production grade shit and I don't even remember the last time I wrote more than two consecutive lines of code.

replies(2): >>44360937 #>>44364148 #
sysmax No.44360937
If you are launching one product per day, you are using LLMs to convert unrefined ideas into proof-of-concept prototypes. That works really well, that's the kind of work that nobody should be doing by hand anymore.

Except not all work is like that. Fast-forward to product version 2.34, where a particular customer needs a change that could break 5000 other customers because of non-trivial dependencies between different parts of the design, and you will either be rewriting the entire thing by hand or watching it collapse under its own weight.

But out of 100 products launched on the market, only 1 or 2 will ever reach that stage, and having 100 LLM prototypes followed by 2 thoughtful redesigns is way better than seeing 98 human-made products die.