GPT-5.2

(openai.com)

https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...

breakingcups ◴[11 Dec 25 18:37 UTC] No.46235173[source]▶

>>46234788 (OP) #

Is it me, or did it still get at least three placements of components (RAM and PCIe slots, plus it's DisplayPort and not HDMI) in the motherboard image[0] completely wrong? Why would they use that as a promotional image?

0: https://images.ctfassets.net/kftzwdyauwt9/6lyujQxhZDnOMruN3f...

replies(10): >>46235244 #>>46235267 #>>46236405 #>>46236591 #>>46237241 #>>46239493 #>>46240735 #>>46241534 #>>46241550 #>>46241781 #

1. tedsanders ◴[11 Dec 25 18:44 UTC] No.46235267[source]▶

>>46235173 #

Yep, the point we wanted to make here is that GPT-5.2's vision is better, not perfect. Cherrypicking a perfect output would actually mislead readers, and that wasn't our intent.

replies(9): >>46235823 #>>46236007 #>>46236072 #>>46236155 #>>46236158 #>>46236250 #>>46236355 #>>46238538 #>>46241716 #

2. wilg ◴[11 Dec 25 19:18 UTC] No.46235860[source]▶

>>46235823 #

What did Sam Altman say? Or is this more of a vague impression thing?

replies(1): >>46235976 #

3. honeycrispy ◴[11 Dec 25 19:20 UTC] No.46235882[source]▶

>>46235823 #

Not sure what you mean, Altman does that fake-humility thing all the time.

It's a marketing trick; show honesty in areas that don't have much business impact so the public will trust you when you stretch the truth in areas that do (AGI cough).

replies(1): >>46235940 #

4. d--b ◴[11 Dec 25 19:23 UTC] No.46235940{3}[source]▶

>>46235882 #

I'm confident that GP is acting in good faith, though. Maybe I am falling for it. Who knows? It doesn't really matter; I just wanted to be nice to the guy. It takes some balls to post as an OpenAI employee here, and I wish we heard from them more often, as I am pretty sure all of them lurk around.

replies(1): >>46236335 #

5. BoppreH ◴[11 Dec 25 19:28 UTC] No.46236007[source]▶

>>46235267 (TP) #

That would be a laudable goal, but I feel like it's contradicted by the text:

> Even on a low-quality image, GPT‑5.2 identifies the main regions and places boxes that roughly match the true locations of each component

I would not consider it to have "identified the main regions" or to have "roughly matched the true locations" when ~1/3 of the boxes have incorrect labels. The remark "even on a low-quality image" is not helping either.

Edit: credit where credit is due, the recently-added disclaimer is nice:

> Both models make clear mistakes, but GPT‑5.2 shows better comprehension of the image.

replies(4): >>46236196 #>>46236246 #>>46236990 #>>46242585 #

6. arscan ◴[11 Dec 25 19:34 UTC] No.46236072[source]▶

>>46235267 (TP) #

I think you may have inadvertently misled readers in a different way. I feel misled after not catching the errors myself, assuming it was broadly correct, and then coming across this observation here. Might be worth mentioning this is better but still inaccurate. Just a bit of feedback, I appreciate you are willing to show non-cherry-picked examples and are engaging with this question here.

Edit: As mentioned by @tedsanders below, the post was edited to include clarifying language such as: “Both models make clear mistakes, but GPT‑5.2 shows better comprehension of the image.”

replies(1): >>46236436 #

7. minimaxir ◴[11 Dec 25 19:34 UTC] No.46236074{4}[source]▶

>>46235976 #

Using ChatGPT to ironically post AI-generated comments is still posting of AI-generated comments.

8. g947o ◴[11 Dec 25 19:41 UTC] No.46236155[source]▶

>>46235267 (TP) #

When I saw that it labeled DP ports as HDMI, I immediately decided that I am not going to touch this until it is at least 5x better, with 95% accuracy on basic things.

I don't see any advantage in using the tool.

replies(1): >>46236486 #

9. iamdanieljohns ◴[11 Dec 25 19:42 UTC] No.46236158[source]▶

>>46235267 (TP) #

Is Adaptive Reasoning gone from GPT-5.2? It was a big part of the release of 5.1 and Codex-Max. Really felt like the future.

replies(1): >>46236393 #

10. hnuser123456 ◴[11 Dec 25 19:45 UTC] No.46236196[source]▶

>>46236007 #

Yeah, what it's calling RAM slots is the CMOS battery. What it's calling the PCIE slot is the interior side of the DB-9 connector. RAM slots and PCIE slots are not even visible in the image.

replies(1): >>46238203 #

11. ◴[11 Dec 25 19:50 UTC] No.46236246[source]▶

>>46236007 #

12. layer8 ◴[11 Dec 25 19:50 UTC] No.46236250[source]▶

>>46235267 (TP) #

You know what would be great? If it had added some boxes with “might be X or Y, but not sure”.

13. rvnx ◴[11 Dec 25 19:58 UTC] No.46236335{4}[source]▶

>>46235940 #

It's the only reasonable choice you can make. As an employee with stock options, you do not want to get trashed on Hacker News, because this affects your income directly if you try to conduct a secondary share sale or plan to hold until IPO.

Once the IPO is done and the lockup period has expired, a lot of employees are planning to sell their shares. But until then, even if the product is behind competitors, there is no way you can admit it without putting your money at risk.

replies(1): >>46237400 #

14. iwontberude ◴[11 Dec 25 20:00 UTC] No.46236355[source]▶

>>46235267 (TP) #

But it’s completely wrong.

15. tedsanders ◴[11 Dec 25 20:03 UTC] No.46236393[source]▶

>>46236158 #

Yes, GPT-5.2 still has adaptive reasoning - we just didn't call it out by name this time. Like 5.1 and codex-max, it should do a better job at answering quickly on easy queries and taking its time on harder queries.

16. tedsanders ◴[11 Dec 25 20:06 UTC] No.46236436[source]▶

>>46236072 #

Thanks for the feedback - I agree our text doesn't make the models' mistakes clear enough. I'll make some small edits now, though it might take a few minutes to appear.

17. jacquesm ◴[11 Dec 25 20:11 UTC] No.46236486[source]▶

>>46236155 #

That's a far more dangerous territory. A machine that is obviously broken will not get used. A machine that is subtly broken will propagate errors because it will have achieved a high enough trust level that it will actually get used.

Think 'Therac-25': it worked 99.5% of the time. In fact, it worked so well that reports of malfunctions were routinely discarded.

replies(1): >>46242353 #

18. furyofantares ◴[11 Dec 25 20:53 UTC] No.46236990[source]▶

>>46236007 #

They also changed "roughly match" to "sometimes match".

replies(1): >>46237477 #

19. Esophagus4 ◴[11 Dec 25 21:28 UTC] No.46237400{5}[source]▶

>>46236335 #

I know HN commenters like to see themselves as contrarians, as do I sometimes, but man… this seems like a serious stretch to assume such malicious intent that an employee of the world’s top AI name would astroturf a random HN thread about a picture on a blog.

I’m fairly comfortable taking this OpenAI employee’s comment at face value.

Frankly, I don’t think a HN thread will make a difference to his financial situation, anyway…

replies(1): >>46238628 #

20. MichaelZuo ◴[11 Dec 25 21:34 UTC] No.46237477{3}[source]▶

>>46236990 #

Did they really change a meaningful word like that after publication without an edit note…?

replies(2): >>46237734 #>>46237877 #

21. piker ◴[11 Dec 25 21:54 UTC] No.46237734{4}[source]▶

>>46237477 #

Eh, I'm no shill but their marketing copy isn't exactly the New York Times. They're given some license to respond to critical feedback in a manner that makes the statements more accurate without the same expectations of being objective journalism of record.

replies(1): >>46241558 #

22. dwohnitmok ◴[11 Dec 25 22:04 UTC] No.46237877{4}[source]▶

>>46237477 #

This has definitely happened before with e.g. the o1 release. I will sometimes use the Wayback Machine to verify changes that have been made.

23. hexaga ◴[11 Dec 25 22:30 UTC] No.46238203{3}[source]▶

>>46236196 #

It just overlaid a typical ATX pattern across the motherboard-like parts of the image, even if that's not really what the image is showing. I don't think it's worthwhile to consider this a 'local recognition failure', as if it just happened to mistake CMOS for RAM slots.

Imagine it as a markdown response:

# Why this is an ATX layout motherboard (Honest assessment, straight to the point, *NO* hallucinations)

1. *RAM* as you can clearly see, the RAM slots are to the right of the CPU, so it's obviously ATX

2. *PCIE* the clearly visible PCIE slots are right there at the bottom of the image, so this definitely cannot be anything except an ATX motherboard

3. ... etc more stuff that is supported only by force of preconception

It's just meta signaling gone off the rails. Something in their post-training pipeline is obviously vulnerable given how absolutely saturated with it their model outputs are.

Troubling that the behavior generalizes to image labeling, but not particularly surprising. This has been a visible problem at least since o1, and the lack of change tells me they do not have a real solution.

24. johnwheeler ◴[11 Dec 25 23:00 UTC] No.46238538[source]▶

>>46235267 (TP) #

Oh and you guys don't mislead people ever. Your management is just completely trustworthy, and I'm sure all you guys are too. Give me a break, man. If I were you, I would jump ship or you're going to be like a Theranos employee on LinkedIn.

replies(1): >>46241994 #

25. rvnx ◴[11 Dec 25 23:09 UTC] No.46238628{6}[source]▶

>>46237400 #

Malicious? No, and this is far from astroturfing; he even speaks as "we". It's just a logical move to defend your company when people claim your product is buggy.

There is no other logical move. This is what I am saying, contrary to the people above who say this requires a lot of courage. It's not about courage, it's just normal and logical (and yes, Hacker News matters a lot; this place is a very strong source of signal for investors).

Not bad at all, just observing it.

26. mkesper ◴[12 Dec 25 06:58 UTC] No.46241558{5}[source]▶

>>46237734 #

Yes, but they should clearly mark updates. That would be professional.

27. ◴[12 Dec 25 07:27 UTC] No.46241716[source]▶

>>46235267 (TP) #

28. yard2010 ◴[12 Dec 25 08:19 UTC] No.46241994[source]▶

>>46238538 #

Hey, no need to personally attack anyone. A bad organization can still consist of good people.

29. AdamN ◴[12 Dec 25 09:23 UTC] No.46242353{3}[source]▶

>>46236486 #

There was a low-level Google internal service that worked so well that other teams took a hard dependency on it (against advice). So the internal team added a cron job to drop it every once in a while to get people to trust it less :-)

30. guerrilla ◴[12 Dec 25 10:02 UTC] No.46242585[source]▶

>>46236007 #

Leave it to OpenAI to be dishonest about being dishonest. It seems they're also editing this post without notice.
