Vibe vs reality, and anyone actually working in the space daily can attest how brittle these systems are.
Maybe this changes in SWE with more automated tests in verifiable simulators, but the real world is far to complex to simulate in its vastness.
Vibe vs reality, and anyone actually working in the space daily can attest how brittle these systems are.
Maybe this changes in SWE with more automated tests in verifiable simulators, but the real world is far to complex to simulate in its vastness.
"Write a Python script that adds three numbers together".
Is that bar going up? I think it probably is, although not as fast/far as some believe. I also think that "unreliable" can still be "useful".
gpt-4.5-preview-2025-02-27 replied with "Hi!"
I got "hi", as expected. What is the full system prompt + user message you're using?
https://i.imgur.com/Y923KXB.png
> gpt-4.5-preview-2025-02-27
Same "hi": https://i.imgur.com/VxiIrIy.png
Say just 'hi'
while the "without any extra words or explanations" part was for the readers of your comment. Perhaps kubb also made a similar mistake.I used empty system prompt.