1479 points sandslash | 1 comments
mentalgear ◴[] No.44316934[source]
Meanwhile, this morning I asked Claude 4 to write a simple EXIF normalizer. After two rounds of prompting it to double-check its code, I still had to point out that it makes no sense to load the entire image for reorientation if the EXIF orientation is fine in the first place.
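
For illustration, here is a minimal sketch of the check described above: read only the EXIF orientation tag from the JPEG header and decide from that whether decoding the image is needed at all. This is not the code from the original exchange. The function names (`read_exif_orientation`, `needs_reorientation`) are made up for this example, it uses only the Python standard library, and it assumes a well-formed JPEG whose orientation tag sits in IFD0 of the first Exif APP1 segment. A real normalizer would need to handle more edge cases.

```python
import struct
from typing import Optional

ORIENTATION_TAG = 0x0112  # EXIF tag ID for image orientation


def _orientation_from_tiff(tiff: bytes) -> Optional[int]:
    """Scan IFD0 of a TIFF-structured EXIF payload for the orientation tag."""
    if len(tiff) < 8:
        return None
    if tiff[:2] == b"II":
        endian = "<"  # little-endian (Intel byte order)
    elif tiff[:2] == b"MM":
        endian = ">"  # big-endian (Motorola byte order)
    else:
        return None
    (ifd_offset,) = struct.unpack(endian + "I", tiff[4:8])
    if ifd_offset + 2 > len(tiff):
        return None
    (count,) = struct.unpack(endian + "H", tiff[ifd_offset:ifd_offset + 2])
    for i in range(count):
        start = ifd_offset + 2 + 12 * i
        entry = tiff[start:start + 12]  # tag(2) type(2) count(4) value(4)
        if len(entry) < 12:
            return None
        (tag,) = struct.unpack(endian + "H", entry[:2])
        if tag == ORIENTATION_TAG:
            (value,) = struct.unpack(endian + "H", entry[8:10])
            return value
    return None


def read_exif_orientation(data: bytes) -> Optional[int]:
    """Return the EXIF orientation from JPEG bytes, or None if not found.

    Only the metadata segments before the image data are inspected, so
    passing just the first chunk of the file is usually enough.
    """
    if data[:2] != b"\xff\xd8":  # not a JPEG (missing SOI marker)
        return None
    pos = 2
    while pos + 4 <= len(data):
        if data[pos] != 0xFF:
            return None
        marker = data[pos + 1]
        if marker == 0xDA:  # start of scan: compressed image data follows
            return None
        (seg_len,) = struct.unpack(">H", data[pos + 2:pos + 4])
        body = data[pos + 4:pos + 2 + seg_len]
        if marker == 0xE1 and body[:6] == b"Exif\x00\x00":
            return _orientation_from_tiff(body[6:])
        pos += 2 + seg_len
    return None


def needs_reorientation(data: bytes) -> bool:
    """True only when an orientation tag exists and is not 1 (normal)."""
    orientation = read_exif_orientation(data)
    return orientation is not None and orientation != 1
```

A caller would decode and rewrite the image only when `needs_reorientation` returns True, and otherwise leave the file alone.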

Vibe vs. reality: anyone actually working in the space daily can attest to how brittle these systems are.

Maybe this changes in SWE with more automated tests in verifiable simulators, but the real world is far too complex and vast to simulate.

replies(7): >>44317104 #>>44317116 #>>44317136 #>>44317214 #>>44317305 #>>44317622 #>>44317741 #
ramon156 ◴[] No.44317136[source]
The real question is how long it'll take until they're no longer brittle.
replies(3): >>44317160 #>>44317197 #>>44317483 #
kubb ◴[] No.44317160[source]
Or whether they will ever be reliable. Your question already makes an assumption.
replies(3): >>44317316 #>>44317424 #>>44317731 #
1. vFunct ◴[] No.44317424[source]
It's perfectly reliable for the things you know it to be reliable at, such as operations that fit within its context window.

Don't ask LLMs to "Write me Microsoft Excel".

Instead, ask it to "Write a directory tree view for the Open File dialog box in Excel".

Break your projects down into the smallest chunks you can for the LLMs. The more specific you are, the more reliable it's going to be.

The rest of this year is going to be companies figuring out how to break down large tasks into smaller tasks for LLM consumption.
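
As a rough sketch of the decomposition idea above, the snippet below models a project as a tree of tasks and emits one narrowly scoped prompt per leaf, then flags any prompt that exceeds an assumed size budget. Everything here is hypothetical: the `Task` class, the helper names, the "Formula bar" subtask, and the 4-characters-per-token estimate are illustrative assumptions, not how any particular LLM or tool counts tokens.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Task:
    """A unit of work that may be split into smaller subtasks."""
    description: str
    subtasks: List["Task"] = field(default_factory=list)


def leaf_prompts(task: Task, path: Tuple[str, ...] = ()) -> Iterator[str]:
    """Yield one prompt per leaf task, prefixed with its parent context."""
    here = path + (task.description,)
    if not task.subtasks:
        yield " > ".join(here)
        return
    for sub in task.subtasks:
        yield from leaf_prompts(sub, here)


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly 4 characters per token.
    return max(1, len(text) // 4)


def oversized(prompts: List[str], budget: int) -> List[str]:
    """Return prompts whose estimated size exceeds the budget."""
    return [p for p in prompts if estimate_tokens(p) > budget]


# Hypothetical breakdown of the example from the comment.
excel = Task("Microsoft Excel", [
    Task("Open File dialog box", [Task("Directory tree view")]),
    Task("Formula bar"),
])
prompts = list(leaf_prompts(excel))
```

Any prompt returned by `oversized` is a candidate for further splitting before it is sent to a model.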