(openai.com)

1019 points atgctg | 3 comments | 11 Dec 25 18:04 UTC | HN request time: 0.001s | source

https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...

Show context

breakingcups ◴[11 Dec 25 18:37 UTC] No.46235173[source]▶

Is it me, or did it still get at least three placements of components (RAM and PCIe slots, plus it's DisplayPort and not HDMI) in the motherboard image[0] completely wrong? Why would they use that as a promotional image?

0: https://images.ctfassets.net/kftzwdyauwt9/6lyujQxhZDnOMruN3f...

replies(10): >>46235244 #>>46235267 #>>46236405 #>>46236591 #>>46237241 #>>46239493 #>>46240735 #>>46241534 #>>46241550 #>>46241781 #

tedsanders ◴[11 Dec 25 18:44 UTC] No.46235267[source]▶

>>46235173 #

Yep, the point we wanted to make here is that GPT-5.2's vision is better, not perfect. Cherrypicking a perfect output would actually mislead readers, and that wasn't our intent.

replies(9): >>46235823 #>>46236007 #>>46236072 #>>46236155 #>>46236158 #>>46236250 #>>46236355 #>>46238538 #>>46241716 #

1. g947o ◴[11 Dec 25 19:41 UTC] No.46236155[source]▶

>>46235267 #

When I saw that it labeled DP ports as HDMI I immediately decided that I am not going to touch this until it is at least 5x better with 95% accuracy with basic things.

I don't see any advantage in using the tool.

replies(1): >>46236486 #

2. jacquesm ◴[11 Dec 25 20:11 UTC] No.46236486[source]▶

>>46236155 (TP) #

That's a far more dangerous territory. A machine that is obviously broken will not get used. A machine that is subtly broken will propagate errors because it will have achieved a high enough trust level that it will actually get used.

Think 'Therac-25', it worked in 99.5% of the time. In fact it worked so well that reports of malfunctions were routinely discarded.

replies(1): >>46242353 #

3. AdamN ◴[12 Dec 25 09:23 UTC] No.46242353[source]▶

>>46236486 #

There was a low-level Google internal service that worked so well that other teams took a hard dependency on it (against advice). So the internal team added a cron job to drop it every once in a while to get people to trust it less :-)

↑