0: https://images.ctfassets.net/kftzwdyauwt9/6lyujQxhZDnOMruN3f...
0: https://images.ctfassets.net/kftzwdyauwt9/6lyujQxhZDnOMruN3f...
It's a marketing trick; show honesty in areas that don't have much business impact so the public will trust you when you stretch the truth in areas that do (AGI cough).
> Even on a low-quality image, GPT‑5.2 identifies the main regions and places boxes that roughly match the true locations of each component
I would not consider it to have "identified the main regions" or to have "roughly matched the true locations" when ~1/3 of the boxes have incorrect labels. The remark "even on a low-quality image" is not helping either.
Edit: credit where credit is due, the recently-added disclaimer is nice:
> Both models make clear mistakes, but GPT‑5.2 shows better comprehension of the image.
Edit: As mentioned by @tedsanders below, the post was edited to include clarifying language such as: “Both models make clear mistakes, but GPT‑5.2 shows better comprehension of the image.”
Once the IPO is done, and the lockup period is expired, then a lot of employees are planning to sell their shares. But until that, even if the product is behind competitors there is no way you can admit it without putting your money at risk.
Think 'Therac-25', it worked in 99.5% of the time. In fact it worked so well that reports of malfunctions were routinely discarded.
You can find it right next to the image you are talking about.
LLMs have always been very subhuman at vision, and GPT-5.2 continues in this tradition, but it's still a big step up over GPT-5.1.
One way to get a sense of how bad LLMs are at vision is to watch them play Pokemon. E.g.,: https://www.lesswrong.com/posts/u6Lacc7wx4yYkBQ3r/insights-i...
They still very much struggle with basic vision tasks that adults, kids, and even animals can ace with little trouble.
I’m fairly comfortable taking this OpenAI employee’s comment at face value.
Frankly, I don’t think a HN thread will make a difference to his financial situation, anyway…
I might not know exactly how many USB ports this motherboard has, but I wouldn't select a set of 4 and declare it to be a stacked pair.
Code needs to be checked
References need to be checked
Any facts or claims need to be checked
Imagine it as a markdown response:
# Why this is an ATX layout motherboard (Honest assessment, straight to the point, *NO* hallucinations)
1. *RAM* as you can clearly see, the RAM slots are to the right of the CPU, so it's obviously ATX
2. *PCIE* the clearly visible PCIE slots are right there at the bottom of the image, so this definitely cannot be anything except an ATX motherboard
3. ... etc more stuff that is supported only by force of preconception
--
It's just meta signaling gone off the rails. Something in their post-training pipeline is obviously vulnerable given how absolutely saturated with it their model outputs are.
Troubling that the behavior generalizes to image labeling, but not particularly surprising. This has been a visible problem at least since o1, and the lack of change tells me they do not have a real solution.
There is no other logical move, this is what I am saying, contrary to people above say this requires a lot of courage. It's not about courage, it's just normal and logic (and yes Hackernews matters a lot, this place is a very strong source of signal for investors).
Not bad at all, just observing it.
It's not okay if claims are totally made up 1/30 times
Of course people aren't always correct either, but we're able to operate on levels of confidence. We're also able to weight others' statements as more or less likely to be correct based on what we know about them
Makes you wonder what 97% is worth. Would we accept a different service with only 97% availability, and all downtime during lunch break?