GPT-5.2

(openai.com)
1019 points atgctg | 30 comments
breakingcups ◴[] No.46235173[source]
Is it me, or did it still get at least three placements of components (RAM and PCIe slots, plus it's DisplayPort and not HDMI) in the motherboard image[0] completely wrong? Why would they use that as a promotional image?

0: https://images.ctfassets.net/kftzwdyauwt9/6lyujQxhZDnOMruN3f...

replies(10): >>46235244 #>>46235267 #>>46236405 #>>46236591 #>>46237241 #>>46239493 #>>46240735 #>>46241534 #>>46241550 #>>46241781 #
1. tedsanders ◴[] No.46235267[source]
Yep, the point we wanted to make here is that GPT-5.2's vision is better, not perfect. Cherrypicking a perfect output would actually mislead readers, and that wasn't our intent.
replies(9): >>46235823 #>>46236007 #>>46236072 #>>46236155 #>>46236158 #>>46236250 #>>46236355 #>>46238538 #>>46241716 #
2. wilg ◴[] No.46235860[source]
What did Sam Altman say? Or is this more of a vague impression thing?
replies(1): >>46235976 #
3. honeycrispy ◴[] No.46235882[source]
Not sure what you mean, Altman does that fake-humility thing all the time.

It's a marketing trick; show honesty in areas that don't have much business impact so the public will trust you when you stretch the truth in areas that do (AGI cough).

replies(1): >>46235940 #
4. d--b ◴[] No.46235940{3}[source]
I'm confident that GP is acting in good faith though. Maybe I am falling for it. Who knows? It doesn't really matter, I just wanted to be nice to the guy. It takes some balls to post as an OpenAI employee here, and I wish we heard from them more often, as I am pretty sure all of them lurk around.
replies(1): >>46236335 #
5. BoppreH ◴[] No.46236007[source]
That would be a laudable goal, but I feel like it's contradicted by the text:

> Even on a low-quality image, GPT‑5.2 identifies the main regions and places boxes that roughly match the true locations of each component

I would not consider it to have "identified the main regions" or to have "roughly matched the true locations" when ~1/3 of the boxes have incorrect labels. The remark "even on a low-quality image" is not helping either.

Edit: credit where credit is due, the recently-added disclaimer is nice:

> Both models make clear mistakes, but GPT‑5.2 shows better comprehension of the image.

replies(4): >>46236196 #>>46236246 #>>46236990 #>>46242585 #
6. arscan ◴[] No.46236072[source]
I think you may have inadvertently misled readers in a different way. I feel misled after not catching the errors myself, assuming it was broadly correct, and then coming across this observation here. Might be worth mentioning this is better but still inaccurate. Just a bit of feedback, I appreciate you are willing to show non-cherry-picked examples and are engaging with this question here.

Edit: As mentioned by @tedsanders below, the post was edited to include clarifying language such as: “Both models make clear mistakes, but GPT‑5.2 shows better comprehension of the image.”

replies(1): >>46236436 #
7. minimaxir ◴[] No.46236074{4}[source]
Using ChatGPT to ironically post AI-generated comments is still posting of AI-generated comments.
8. g947o ◴[] No.46236155[source]
When I saw that it labeled DP ports as HDMI I immediately decided that I am not going to touch this until it is at least 5x better, with 95% accuracy on basic things.

I don't see any advantage in using the tool.

replies(1): >>46236486 #
9. iamdanieljohns ◴[] No.46236158[source]
Is Adaptive Reasoning gone from GPT-5.2? It was a big part of the release of 5.1 and Codex-Max. Really felt like the future.
replies(1): >>46236393 #
10. hnuser123456 ◴[] No.46236196[source]
Yeah, what it's calling RAM slots is the CMOS battery. What it's calling the PCIE slot is the interior side of the DB-9 connector. RAM slots and PCIE slots are not even visible in the image.
replies(1): >>46238203 #
11. ◴[] No.46236246[source]
12. layer8 ◴[] No.46236250[source]
You know what would be great? If it had added some boxes with “might be X or Y, but not sure”.
13. rvnx ◴[] No.46236335{4}[source]
It's the only reasonable choice you can make. As an employee with stock options you do not want to get trashed on Hackernews because this affects your income directly if you try to conduct a secondary share sale or plan to hold until IPO.

Once the IPO is done, and the lockup period is expired, then a lot of employees are planning to sell their shares. But until that, even if the product is behind competitors there is no way you can admit it without putting your money at risk.

replies(1): >>46237400 #
14. iwontberude ◴[] No.46236355[source]
But it’s completely wrong.
15. tedsanders ◴[] No.46236393[source]
Yes, GPT-5.2 still has adaptive reasoning - we just didn't call it out by name this time. Like 5.1 and codex-max, it should do a better job at answering quickly on easy queries and taking its time on harder queries.
16. tedsanders ◴[] No.46236436[source]
Thanks for the feedback - I agree our text doesn't make the models' mistakes clear enough. I'll make some small edits now, though it might take a few minutes to appear.
17. jacquesm ◴[] No.46236486[source]
That's a far more dangerous territory. A machine that is obviously broken will not get used. A machine that is subtly broken will propagate errors because it will have achieved a high enough trust level that it will actually get used.

Think 'Therac-25': it worked 99.5% of the time. In fact it worked so well that reports of malfunctions were routinely discarded.

replies(1): >>46242353 #
18. furyofantares ◴[] No.46236990[source]
They also changed "roughly match" to "sometimes match".
replies(1): >>46237477 #
19. Esophagus4 ◴[] No.46237400{5}[source]
I know HN commenters like to see themselves as contrarians, as do I sometimes, but man… this seems like a serious stretch to assume such malicious intent that an employee of the world’s top AI name would astroturf a random HN thread about a picture on a blog.

I’m fairly comfortable taking this OpenAI employee’s comment at face value.

Frankly, I don’t think a HN thread will make a difference to his financial situation, anyway…

replies(1): >>46238628 #
20. MichaelZuo ◴[] No.46237477{3}[source]
Did they really change a meaningful word like that after publication without an edit note…?
replies(2): >>46237734 #>>46237877 #
21. piker ◴[] No.46237734{4}[source]
Eh, I'm no shill but their marketing copy isn't exactly the New York Times. They're given some license to respond to critical feedback in a manner that makes the statements more accurate without the same expectations of being objective journalism of record.
replies(1): >>46241558 #
22. dwohnitmok ◴[] No.46237877{4}[source]
This has definitely happened before with e.g. the o1 release. I will sometimes use the Wayback Machine to verify changes that have been made.
23. hexaga ◴[] No.46238203{3}[source]
It just overlaid a typical ATX pattern across the motherboard-like parts of the image, even if that's not really what the image is showing. I don't think it's worthwhile to consider this a 'local recognition failure', as if it just happened to mistake CMOS for RAM slots.

Imagine it as a markdown response:

# Why this is an ATX layout motherboard (Honest assessment, straight to the point, *NO* hallucinations)

1. *RAM* as you can clearly see, the RAM slots are to the right of the CPU, so it's obviously ATX

2. *PCIE* the clearly visible PCIE slots are right there at the bottom of the image, so this definitely cannot be anything except an ATX motherboard

3. ... etc more stuff that is supported only by force of preconception

--

It's just meta signaling gone off the rails. Something in their post-training pipeline is obviously vulnerable given how absolutely saturated with it their model outputs are.

Troubling that the behavior generalizes to image labeling, but not particularly surprising. This has been a visible problem at least since o1, and the lack of change tells me they do not have a real solution.

24. johnwheeler ◴[] No.46238538[source]
Oh and you guys don't mislead people ever. Your management is just completely trustworthy, and I'm sure all you guys are too. Give me a break, man. If I were you, I would jump ship or you're going to be like a Theranos employee on LinkedIn.
replies(1): >>46241994 #
25. rvnx ◴[] No.46238628{6}[source]
Malicious? No, and this is far from astroturfing; he even speaks as "we". It's just a logical move to defend your company when people claim your product is buggy.

There is no other logical move, this is what I am saying, contrary to the people above who say this requires a lot of courage. It's not about courage, it's just normal and logical (and yes, Hackernews matters a lot; this place is a very strong source of signal for investors).

Not bad at all, just observing it.

26. mkesper ◴[] No.46241558{5}[source]
Yes, but they should clearly mark updates. That would be professional.
27. ◴[] No.46241716[source]
28. yard2010 ◴[] No.46241994[source]
Hey, no need to personally attack anyone. A bad organization can still consist of good people.
29. AdamN ◴[] No.46242353{3}[source]
There was a low-level Google internal service that worked so well that other teams took a hard dependency on it (against advice). So the internal team added a cron job to drop it every once in a while to get people to trust it less :-)
30. guerrilla ◴[] No.46242585[source]
Leave it to OpenAI to be dishonest about being dishonest. It seems they're also editing this post without notice as well.