←back to thread

576 points Gricha | 2 comments | | HN request time: 0.434s | source
Show context
xnorswap ◴[] No.46233056[source]
Claude is really good at specific analysis, but really terrible at open-ended problems.

"Hey claude, I get this error message: <X>", and it'll often find the root cause quicker than I could.

"Hey claude, anything I could do to improve Y?", and it'll struggle beyond the basics that a linter might suggest.

It suggested enthusiastically a library for <work domain> and it was all "Recommended" about it, but when I pointed out that the library had been considered and rejected because <issue>, it understood and wrote up why that library suffered from that issue and why it was therefore unsuitable.

There's a significant blind-spot in current LLMs related to blue-sky thinking and creative problem solving. It can do structured problems very well, and it can transform unstructured data very well, but it can't deal with unstructured problems very well.

That may well change, so I don't want to embed that thought too deeply into my own priors, because the LLM space seems to evolve rapidly. I wouldn't want to find myself blind to the progress because I write it off from a class of problems.

But right now, the best way to help an LLM is have a deep understanding of the problem domain yourself, and just leverage it to do the grunt-work that you'd find boring.

replies(23): >>46233156 #>>46233163 #>>46233206 #>>46233362 #>>46233365 #>>46233406 #>>46233506 #>>46233529 #>>46233686 #>>46233981 #>>46234313 #>>46234696 #>>46234916 #>>46235210 #>>46235385 #>>46236239 #>>46236306 #>>46236829 #>>46238500 #>>46238819 #>>46240191 #>>46243246 #>>46243719 #
pdntspa ◴[] No.46233365[source]
That's why you treat it like a junior dev. You do the fun stuff of supervising the product, overseeing design and implementation, breaking up the work, and reviewing the outputs. It does the boring stuff of actually writing the code.

I am phenomenally productive this way, I am happier at my job, and its quality of work is extremely high as long as I occasionally have it stop and self-review it's progress against the style principles articulated in its AGENTS.md file. (As it tends to forget a lot of rules like DRY)

replies(12): >>46233446 #>>46233448 #>>46233642 #>>46233652 #>>46233782 #>>46234010 #>>46234898 #>>46235480 #>>46238997 #>>46241434 #>>46241981 #>>46242791 #
FeteCommuniste ◴[] No.46233448[source]
Maybe I'm weird but I enjoy "actually writing the code."
replies(6): >>46233475 #>>46233559 #>>46233598 #>>46233879 #>>46234180 #>>46236874 #
nyadesu ◴[] No.46233559[source]
In my case, I enjoy writing code too, but it's helpful to have an assistant I can ask to handle small tasks so I can focus on a specific part that requires attention to detail
replies(1): >>46233762 #
FeteCommuniste ◴[] No.46233762[source]
Yeah, I sometimes use AI for questions like "is it possible to do [x] using library [y] and if so, how?" and have received mostly solid answers.
replies(3): >>46233841 #>>46233909 #>>46234102 #
1. georgemcbay ◴[] No.46234102[source]
> Yeah, I sometimes use AI for questions like "is it possible to do [x] using library [y] and if so, how?" and have received mostly solid answers.

In my experience most LLMs are going to answer this with some form of "Absolutely!" and then propose a square-peg-into-a-round-hole way to do it that is likely suboptimal vs using a different library that is far more suited to your problem if you didn't guess the right fit library to begin with.

The sycophancy problem is still very real even when the topic is entirely technical.

Gemini is (in my experience) the least likely to lead you astray in these situations but its still a significant problem even there.

replies(1): >>46236567 #
2. jessoteric ◴[] No.46236567[source]
IME this has been significantly reduced in newer models like 4.5 Opus and to a lesser extent Sonnet, but agree it's still sort of bad- mainly because the question you're posing is bad.

if you ask a human this the answer can also often be "yes [if we torture the library]", because software development is magic and magic is the realm of imagination.

much better prompt: "is this library designed to solve this problem" or "how can we solve this problem? i am considering using this library to do so, is that realistic?"