(openai.com)

1019 points atgctg | 1 comments | 11 Dec 25 18:04 UTC

https://platform.openai.com/docs/guides/latest-model

System card: https://cdn.openai.com/pdf/3a4153c8-c748-4b71-8e31-aecbde944...

breakingcups ◴[11 Dec 25 18:37 UTC] No.46235173[source]▶

Is it just me, or did it still get at least three component placements completely wrong in the motherboard image[0] (the RAM slots, the PCIe slots, and a port labeled HDMI that is actually DisplayPort)? Why would they use that as a promotional image?

0: https://images.ctfassets.net/kftzwdyauwt9/6lyujQxhZDnOMruN3f...

replies(10): >>46235244 #>>46235267 #>>46236405 #>>46236591 #>>46237241 #>>46239493 #>>46240735 #>>46241534 #>>46241550 #>>46241781 #

whalesalad ◴[11 Dec 25 20:04 UTC] No.46236405[source]▶

>>46235173 #

to be fair that image has the resolution of a flip phone from 2003

replies(2): >>46237625 #>>46239141 #

malfist ◴[11 Dec 25 21:45 UTC] No.46237625[source]▶

>>46236405 #

If I ask you a question and you don't have enough information to answer, you don't confidently give me an answer, you say you don't know.

I might not know exactly how many USB ports this motherboard has, but I wouldn't select a set of 4 and declare it to be a stacked pair.

replies(1): >>46237813 #

AstroBen ◴[11 Dec 25 22:00 UTC] No.46237813[source]▶

>>46237625 #

No one should expect LLMs to give correct answers 100% of the time. Being confidently wrong is inherent to the tech.

Code needs to be checked

References need to be checked

Any facts or claims need to be checked

replies(2): >>46238498 #>>46241514 #

malfist ◴[11 Dec 25 22:56 UTC] No.46238498[source]▶

>>46237813 #

According to the benchmarks here, they're claiming up to 97% accuracy. That ought to be good enough to trust them, right?

Or maybe these benchmarks are all wrong

replies(3): >>46238863 #>>46242378 #>>46242867 #

refactor_master ◴[12 Dec 25 09:28 UTC] No.46242378[source]▶

>>46238498 #

Gemini routinely makes up stuff about BigQuery’s workings. “It’s poorly documented”. Well, read the open source code, reason it out.

Makes you wonder what 97% is worth. Would we accept a different service with only 97% availability, and all downtime during lunch break?
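To put a number on the availability comparison, here is a minimal sketch of the arithmetic. The function names are made up for illustration, and it assumes a 24-hour day and a 365-day year. At 97% availability the downtime budget works out to roughly 43 minutes per day, or about 11 days per year, which is about the length of a lunch break every single day.

```python
def downtime_minutes_per_day(availability: float) -> float:
    """Minutes of downtime in a 24-hour day for an availability fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * 24 * 60


def downtime_days_per_year(availability: float, days_in_year: int = 365) -> float:
    """Days of downtime per year for an availability fraction (0..1)."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return (1.0 - availability) * days_in_year


if __name__ == "__main__":
    # Compare the 97% figure from the thread with a common "three nines" target.
    for a in (0.97, 0.999):
        print(
            f"{a:.1%} available: "
            f"{downtime_minutes_per_day(a):.1f} min/day, "
            f"{downtime_days_per_year(a):.2f} days/year"
        )
```

Accuracy on a benchmark and service availability are not the same kind of measurement, so this only illustrates the scale of a 3% failure budget, not how often a model is actually wrong.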

GPT-5.2