
559 points by Gricha
xnorswap No.46233056
Claude is really good at specific analysis, but really terrible at open-ended problems.

"Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.

"Hey claude, anything I could do to improve Y?", and it'll struggle beyond the basics that a linter might suggest.

It enthusiastically suggested a library for <work domain> and was all "Recommended" about it, but when I pointed out that the library had been considered and rejected because <issue>, it understood and wrote up why that library suffered from that issue and why it was therefore unsuitable.

There's a significant blind spot in current LLMs around blue-sky thinking and creative problem solving. They can do structured problems very well, and they can transform unstructured data very well, but they can't deal with unstructured problems.

That may well change, so I don't want to embed that thought too deeply into my own priors; the LLM space evolves rapidly, and I wouldn't want to find myself blind to progress because I'd written LLMs off for a whole class of problems.

But right now, the best way to help an LLM is to have a deep understanding of the problem domain yourself and leverage it to do the grunt work you'd find boring.

replies(21): >>46233156 #>>46233163 #>>46233206 #>>46233362 #>>46233365 #>>46233406 #>>46233506 #>>46233529 #>>46233686 #>>46233981 #>>46234313 #>>46234696 #>>46234916 #>>46235210 #>>46235385 #>>46236239 #>>46236306 #>>46236829 #>>46238500 #>>46238819 #>>46240191 #
james_marks No.46233206
This is a key part of the AI love/hate flame war.

It's very easy to write it off when it spins out on open-ended problems, without seeing just how effective it can be once you zoom in.

Of course, zooming in that far gives back some of the promised gains.

Edit: typo

replies(2): >>46233357 #>>46233380 #
thewebguyd No.46233357
> without seeing just how effective it can be once you zoom in.

The love/hate flame war continues because the LLM companies aren't selling you on this. The hype is all "this tech will enable non-experts to do things they couldn't do before," not "this tech will help existing experts with their specific niche," hence the disconnect between the sales hype and reality.

If OpenAI, Anthropic, Google, etc. were all honest and tempered their own hype and misleading marketing, I doubt there would even be a flame war. The marketing pitch is "this will replace employees" without the required fine print of "this tool still needs to be operated by an expert in the field, not your average non-technical manager."

replies(1): >>46233533 #
hombre_fatal No.46233533
The number of GUIs I've vibe-coded works against your claim.

As we speak, my macOS menubar has an iStat Menus replacement, a Wispr Flow replacement (global hotkey for speech-to-text), and a log visualizer for the `blocky` DNS filtering program -- all of which I built without reading code aside from where I was curious.

It was so vibe-coded that there was no reason to use SwiftUI or set them up in Xcode -- just AppKit Swift files compiled into macOS apps whenever I nix rebuild.
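
For a sense of what "just AppKit Swift files" can mean, here's a minimal sketch of a single-file status-bar app in that shape. It's a hypothetical stand-in, not one of the apps described above; the placeholder title and menu are illustrative, and it builds with plain `swiftc` rather than an Xcode project:

```swift
import AppKit

// Minimal AppKit menubar app: single file, no Xcode project, no SwiftUI.
// Illustrative sketch -- build with: swiftc main.swift -o menubar-demo
final class AppDelegate: NSObject, NSApplicationDelegate {
    private var statusItem: NSStatusItem!

    func applicationDidFinishLaunching(_ notification: Notification) {
        // Put an item in the system status bar with a placeholder title.
        statusItem = NSStatusBar.system.statusItem(withLength: NSStatusItem.variableLength)
        statusItem.button?.title = "CPU: --%"

        // Give it a menu with a Quit entry so the app can be closed.
        let menu = NSMenu()
        menu.addItem(NSMenuItem(title: "Quit",
                                action: #selector(NSApplication.terminate(_:)),
                                keyEquivalent: "q"))
        statusItem.menu = menu
    }
}

let app = NSApplication.shared
let delegate = AppDelegate()
app.delegate = delegate
app.setActivationPolicy(.accessory)  // menubar-only; no Dock icon
app.run()
```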

The only effort it required was the energy to QA the LLM's progress and tell it where to improve -- maybe dragging a screenshot into the Claude Code chat if I'm feeling excessive.

Where do my 20 years of software dev experience fit into this, beyond imparting my aesthetic preferences?

In fact, insisting on writing code yourself is becoming a liability in an interesting way: you'll make trade-offs for DX that the LLM doesn't have to make, like reaching for Python or Electron when the LLM can bypass abstractions that only exist for human brains.

replies(2): >>46233670 #>>46233772 #
onethought No.46233670
Love that you're disagreeing with the parent by saying you built software all on your own, with only 20 years of software experience.

Isn't that the point they are making?

replies(1): >>46233681 #
hombre_fatal No.46233681
Maybe I didn't make it clear, but I didn't build the software in my comment. A clanker did.

Vibe-coding is a Claude Code <-> QA loop on the end result that anyone can do (the non-experts in his claim).

An example of a cycle looks like "now add an Options tab that lets me customize the global hotkey", where I'm only an end user.

Once again, where do my 20 years of software experience come up in a process where I don't even read code?

replies(2): >>46233729 #>>46234303 #
thewebguyd No.46234303
> An example of a cycle looks like "now add an Options tab that lets me customize the global hotkey", where I'm only an end user

Which is a prompt that someone with experience would write. Your average, non-technical person isn't going to prompt something like that; they're going to say "make it so I can change the settings" or something else super vague, and struggle. We all know how difficult it is to define software requirements.

Just because an LLM wrote the actual code doesn't mean your prompts weren't more effective because of your experience and expertise in building software.

Sit someone down in front of an LLM with zero development or UI experience at all and they will get very different results. Chances are they won't even specify "macOS menu bar app" in the prompt and the LLM will end up trying to make them a webapp.

Your vibe-coding experience just proves my initial point: these tools are useful for those who already have experience and can lean on it to craft effective prompts. Someone non-technical isn't going to make effective use of an LLM to build software.

replies(2): >>46234532 #>>46234962 #
ModernMech No.46234532
Here's how I look at it as a roboticist:

The LLM prompt space is an N-dimensional space where you can start at any point; the LLM then carves a path through the space for so many tokens using the instructions you provided, until it stops and asks for another direction. This frames LLM prompt coding as a sort of navigation task.

The problem is difficult because at every decision point, there's an infinite number of things you could say that could lead to better or worse results in the future.

Think of a robot going down the sidewalk. It controls itself autonomously, but it stops at every intersection and asks "where to next, boss?" You can tell it either to cross the street, or to drive directly into traffic, or to do any number of other things that could bring it closer to its destination, take it further away, or even obliterate it.

In the concrete world, it's easy to direct this robot, and to direct it such that it avoids bad outcomes, and to see that it's achieving good outcomes -- it's physically getting closer to the destination.

But when prompting in an abstract sense, it's hard to see where the robot is going unless you're an expert in that abstract field. As an expert, you know the right way to go is across the street. As a novice, you might tell the LLM to just drive into traffic, and it will happily oblige.

The other problem is feedback. When you direct the physical robot to drive into traffic, you witness its demise; its fate is catastrophic, and if you didn't realize the danger before, you see it then. The robot also becomes incapacitated, so it can't report falsely about its continued progress.

But in the abstract case, the LLM isn't obliterated; it continues to report on progress that isn't real, and as a non-expert, you can't tell it's been flattened into a pancake. The whole output chain is now completely and thoroughly off the rails, but you can't see the smoldering ruins of your navigation instructions, because it's told you "Exactly, you're absolutely right!"