OpenAI o3 and o4-mini

(openai.com)

555 points maheshrijal | 1 comments | 16 Apr 25 17:01 UTC


M4v3R ◴[16 Apr 25 20:50 UTC] No.43710247[source]▶

Ok, I’m a bit underwhelmed. I’ve asked it a fairly technical question, about a very niche topic (Final Fantasy VII reverse engineering): https://chatgpt.com/share/68001766-92c8-8004-908f-fb185b7549...

With the right knowledge and web searches, one can answer this question in a matter of minutes at most. The model fumbled around modding forums and other sites and did manage to find some good information, but then it started to hallucinate some details and used them in its further research. The end result it gave me was incorrect, and the steps it described to get the value were totally fabricated.

What’s even worse, in the thinking trace it looks like it is aware it does not have an answer and that the 399 is just an estimate. But in the answer itself it confidently states it found the correct value.

Essentially, it lied to me: it didn’t really know the answer, and it gave me an estimate without telling me it was one.

Now, I’m perfectly aware that this is a very niche topic, but at this point I expect the AI to either find me a good answer or tell me it couldn’t do it. Not to lie to my face.

Edit: Turns out it’s not just me: https://x.com/transluceai/status/1912552046269771985?s=46

replies(13): >>43710318 #>>43711672 #>>43711775 #>>43711851 #>>43712139 #>>43712425 #>>43713176 #>>43713582 #>>43713694 #>>43714110 #>>43714235 #>>43722041 #>>43727880 #

werdnapk ◴[17 Apr 25 00:02 UTC] No.43711672[source]▶

>>43710247 #

I've used AI with "niche" programming questions and it's always a total letdown. I truly don't understand this "vibe coding" movement unless everyone is building todo apps.

replies(7): >>43711916 #>>43712520 #>>43712538 #>>43712869 #>>43712926 #>>43713572 #>>43718231 #

SkyPuncher ◴[17 Apr 25 02:32 UTC] No.43712538[source]▶

>>43711672 #

There's a bit of a skill to it.

Good architecture plans help. Telling it where in an existing code base it can find things to pattern match against is also fantastic.

I'll often end up with a task that looks something like this:

* Implement Foo with a relation to FooBar.

* Foo should have X, Y, Z features

* We have an existing pattern for Fidget in BigFidget. Look at that for implementation

* Make sure you account for A, B, C. Check Widget for something similar.

It works surprisingly well.

replies(2): >>43713450 #>>43713636 #

1. extr ◴[17 Apr 25 05:39 UTC] No.43713450{3}[source]▶

>>43712538 #

Yeah, this is a great summary of what I do as well, and I find it very effective. I think of hands-off AI coding as directing a movie. You have a rough image of what "good" looks like in your head, and you're trying to articulate it in enough detail to all the stagehands and actors that they can realize the vision. The models can always get there with enough coaching; traditionally the question is whether that's worth the trouble versus just doing it yourself.

Increasingly, I find that AI at this point is good enough that I am rarely stepping in to "do it myself".
