Vibe vs reality: anyone actually working in the space daily can attest to how brittle these systems are.
Maybe this changes in SWE with more automated tests in verifiable simulators, but the real world is far too complex to simulate in its vastness.
Don't ask LLMs to "Write me Microsoft Excel".
Instead, ask it to "Write a directory tree view for the Open File dialog box in Excel".
Break your projects down into the smallest chunks you can for the LLMs. The more specific you are, the more reliable it's going to be.
The rest of this year is going to be companies figuring out how to break down large tasks into smaller tasks for LLM consumption.
"Write a Python script that adds three numbers together".
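A prompt that narrow maps to code any reviewer can verify at a glance, which is the whole point of the decomposition. A hypothetical sketch of what you'd expect back (function name and structure are assumptions, not anything the model is guaranteed to produce):

```python
def add_three(a: float, b: float, c: float) -> float:
    """Return the sum of three numbers."""
    return a + b + c

# Trivial to eyeball and trivial to test:
print(add_three(1, 2, 3))  # 6
```

Contrast that with "Write me Microsoft Excel": there's no equivalent one-glance check, so reliability collapses as the chunk grows.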
Is that bar going up? I think it probably is, although not as fast/far as some believe. I also think that "unreliable" can still be "useful".
gpt-4.5-preview-2025-02-27 replied with "Hi!"
I got "hi", as expected. What is the full system prompt + user message you're using?
https://i.imgur.com/Y923KXB.png
> gpt-4.5-preview-2025-02-27
Same "hi": https://i.imgur.com/VxiIrIy.png
Say just 'hi'
The "without any extra words or explanations" part was for the readers of your comment. Perhaps kubb also made a similar mistake. I used an empty system prompt.