555 points maheshrijal | 2 comments
M4v3R ◴[] No.43710247[source]
Ok, I’m a bit underwhelmed. I asked it a fairly technical question about a very niche topic (Final Fantasy VII reverse engineering): https://chatgpt.com/share/68001766-92c8-8004-908f-fb185b7549...

With the right knowledge and a few web searches, one can answer this question in a matter of minutes at most. The model fumbled around modding forums and other sites and did manage to find some good information, but then it started to hallucinate some details and used them in its further research. The end result it gave me was incorrect, and the steps it described for getting the value were totally fabricated.

What’s even worse, in the thinking trace it looks like it is aware that it does not have an answer and that the 399 is just an estimate. But in the answer itself it confidently states that it found the correct value.

Essentially, it lied to me: it doesn’t really know, and it gave me an estimate without telling me so.

Now, I’m perfectly aware that this is a very niche topic, but at this point I expect the AI to either find me a good answer or tell me it couldn’t do it. Not to lie to my face.

Edit: Turns out it’s not just me: https://x.com/transluceai/status/1912552046269771985?s=46

replies(13): >>43710318 #>>43711672 #>>43711775 #>>43711851 #>>43712139 #>>43712425 #>>43713176 #>>43713582 #>>43713694 #>>43714110 #>>43714235 #>>43722041 #>>43727880 #
1. int_19h ◴[] No.43711775[source]
Compare to Gemini Pro 2.5:

https://g.co/gemini/share/c8fb1c9795e4

Of note, the final step in the CoT is:

> Formulate Conclusion: Since a definitive list or count isn't readily available through standard web searches, the best approach is to: state that an exact count is difficult to ascertain from readily available online sources without direct analysis of game files ... avoid giving a specific number, as none was reliably found across multiple sources.

and then the response is in line with that.

replies(1): >>43713369 #
2. M4v3R ◴[] No.43713369[source]
I like this answer. It does mention the correct, definitive way of getting the information I want (extracting the char.lgp data file), so even though it gave up, it pushes you in the right direction, whereas o3/o4 just make stuff up.