GPT-5.2

(openai.com)
1053 points | atgctg | 4 comments
svara No.46241936
In my experience, the best models are already nearly as good as you can be for a large fraction of what I personally use them for, which is basically as a more efficient search engine.

The thing that would now make the biggest difference isn't "more intelligence", whatever that might mean, but better grounding.

It's still a big issue that the models will make up plausible sounding but wrong or misleading explanations for things, and verifying their claims ends up taking time. And if it's a topic you don't care about enough, you might just end up misinformed.

I think Google/Gemini realize this, since their "verify" feature is designed to address exactly this. Unfortunately it hasn't worked very well for me so far.

But to me it's very clear that the product that gets this right will be the one I use.

replies(12): >>46241987 #>>46242107 #>>46242173 #>>46242280 #>>46242317 #>>46242483 #>>46242537 #>>46242589 #>>46243494 #>>46243567 #>>46243680 #>>46244002 #
phorkyas82 No.46241987
Isn't that what no LLM can provide: being free of hallucinations?
replies(5): >>46242091 #>>46242093 #>>46242230 #>>46243681 #>>46244023 #
svara No.46242091
Yes, they'll probably not go away, but it's got to be possible to handle them better.

Gemini (the app) has a "mitigation" feature where it tries to do Google searches to support its statements. That doesn't currently work properly in my experience.

It also seems to be doing something where it adds references to statements (with a separate model? With a second pass over the output? Not sure how that works). That works well where it adds them, but it often doesn't add them at all.
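The second-pass idea speculated about here can be sketched in miniature. This is only an illustration of the general technique, not how Gemini actually works: the corpus, the overlap scoring, and the threshold below are all stand-in assumptions for a real search backend and a real entailment check.

```python
# Sketch of a "second pass" citation step: after an answer is generated,
# each sentence is checked against a source corpus, and a reference is
# attached only when some support is found. Token overlap stands in for
# a real retrieval/verification model.

def token_overlap(a, b):
    """Jaccard overlap between the word sets of two strings."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb) if wa | wb else 0.0

def add_citations(answer, corpus, threshold=0.2):
    """Return (sentence, source_id_or_None) pairs for each sentence."""
    cited = []
    for sentence in filter(None, (s.strip() for s in answer.split("."))):
        best_id, best_score = None, 0.0
        for src_id, text in corpus.items():
            score = token_overlap(sentence, text)
            if score > best_score:
                best_id, best_score = src_id, score
        # Unsupported sentences get no citation -- surfacing them is
        # exactly the grounding signal the thread is asking for.
        cited.append((sentence, best_id if best_score >= threshold else None))
    return cited

corpus = {
    "doc1": "the moon orbits the earth roughly every 27 days",
    "doc2": "water boils at 100 degrees celsius at sea level",
}
answer = "The moon orbits the earth. Cheese is made on the moon."
for sentence, source in add_citations(answer, corpus):
    print(sentence, "->", source)
```

The interesting design point is the behavior on a miss: a sentence with no supporting source is flagged rather than silently kept, which matches the complaint that the current feature "often doesn't add them."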

replies(2): >>46242582 #>>46242634 #
1. intended No.46242634
Doubt it. I suspect it’s fundamentally not possible in the spirit you intend it.

Reality is perfectly fine with deception and inaccuracy. For language to magically be self-constraining enough to only make verified statements is… impossible.

replies(1): >>46242803 #
2. svara No.46242803
Take a look at the new experimental AI mode in Google Scholar; it's going in the right direction.

It might be true that a fundamental solution to this issue is not possible without a major breakthrough, but I'm sure you can get pretty far with better tooling that surfaces relevant sources, and that would make a huge difference.

replies(1): >>46243115 #
3. intended No.46243115
So let's run it through the rubric test:

What’s your level of expertise in this domain or subject? How did you use it? What were your results?

It’s basically gauging expertise vs. usage to pin down the variance that seems endemic to LLM utility anecdotes/examples. For code examples I also ask which language was used, the submitter's familiarity with the language, their seniority/experience, and their familiarity with the domain.

replies(1): >>46243410 #
4. svara No.46243410
A lot of words to call me stupid ;) You seem to have put me in some convenient mental box of yours; I don't know which one.