    GPT-5.2 (openai.com)
    1019 points by atgctg | 12 comments
    svara ◴[] No.46241936[source]
    In my experience, the best models are already nearly as good as you can be for a large fraction of what I personally use them for, which is basically as a more efficient search engine.

    The thing that would now make the biggest difference isn't "more intelligence", whatever that might mean, but better grounding.

    It's still a big issue that the models will make up plausible sounding but wrong or misleading explanations for things, and verifying their claims ends up taking time. And if it's a topic you don't care about enough, you might just end up misinformed.

    I think Google/Gemini realize this, since their "verify" feature is designed to address exactly this. Unfortunately it hasn't worked very well for me so far.

    But to me it's very clear that the product that gets this right will be the one I use.

    replies(8): >>46241987 #>>46242107 #>>46242173 #>>46242280 #>>46242317 #>>46242483 #>>46242537 #>>46242589 #
    1. stacktrace ◴[] No.46242173[source]
    > It's still a big issue that the models will make up plausible sounding but wrong or misleading explanations for things, and verifying their claims ends up taking time. And if it's a topic you don't care about enough, you might just end up misinformed.

    Exactly! One important thing LLMs have made me realise deeply is that "no information" is better than false information. The way LLMs pull out completely incorrect explanations baffles me - I suppose that's expected, since in the end the model is generating tokens based on its training and it's bound to hallucinate some of them, but knowing this doesn't ease any of my frustration.

    IMO if LLMs need to focus on anything right now, they should focus on better grounding. Maybe even something like a probability/confidence score attached to answers might make the experience so much better for many users like me.
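
    One crude way to approximate such a probability/confidence score today is to inspect token log-probabilities. A minimal sketch, assuming the OpenAI Python SDK; the model name is a placeholder, and the averaged token probability is only a rough proxy, not a calibrated measure of factual accuracy:

        import math
        from openai import OpenAI

        client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user", "content": "In one sentence: when was the Eiffel Tower completed?"}],
            logprobs=True,
        )

        choice = resp.choices[0]
        print(choice.message.content)

        # Average token probability as a crude confidence proxy
        # (low values suggest the model was "unsure" while generating).
        token_logprobs = [t.logprob for t in choice.logprobs.content]
        avg_prob = math.exp(sum(token_logprobs) / len(token_logprobs))
        print(f"average token probability: {avg_prob:.2f}")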

    replies(4): >>46242430 #>>46242681 #>>46242794 #>>46242816 #
    2. robocat ◴[] No.46242430[source]
    > wrong or misleading explanations

    Exactly the same issue occurs with search.

    Unfortunately not everybody knows to mistrust AI responses, or has the skills to double-check information.

    replies(4): >>46242500 #>>46242653 #>>46242736 #>>46242992 #
    3. darkwater ◴[] No.46242500[source]
    No, it's not the same. Search results show you one or more specific pages/websites, and each website has a different trust factor. Yes, plenty of people repeat things they "read on the Internet" as truths, but it's easy to debunk some of them just based on the site's reputation. With AI responses, the hallucinated errors share the same reputation as the good answers, because the models do give good answers most of the time.
    replies(1): >>46242561 #
    4. SebastianSosa1 ◴[] No.46242561{3}[source]
    Community Notes on X seems to be one of the highest-profile recent experiments trying to address this issue.
    5. ◴[] No.46242653[source]
    6. actionfromafar ◴[] No.46242681[source]
    I wonder if the only way to fix this with current LLMs would be to generate a lot of synthetic data for a select number of topics you really don't want it to "go off the rails" with. That synthetic data would be lots of variations on "I don't know how to do X with Y".
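
    A minimal sketch of what generating that kind of synthetic refusal data could look like; the topics, templates, and chat-style JSONL fine-tuning format are illustrative assumptions, not anything proposed in the thread:

        import json
        import random

        # Hypothetical topics where a hallucinated answer would be costly.
        topics = [
            ("configure RAID", "this NAS model"),
            ("set up replication", "this database"),
            ("handle authentication", "this legacy API"),
        ]

        question_templates = [
            "How do I {x} with {y}?",
            "What's the right way to {x} using {y}?",
            "Can you walk me through how to {x} for {y}?",
        ]

        refusal_templates = [
            "I don't have reliable information on how to {x} with {y}, so I'd rather not guess.",
            "I don't know how to {x} with {y}; please check an authoritative source.",
        ]

        # Build (question, refusal) pairs in the chat-style JSONL format
        # commonly used for supervised fine-tuning.
        with open("refusal_samples.jsonl", "w") as f:
            for x, y in topics:
                for q in question_templates:
                    sample = {
                        "messages": [
                            {"role": "user", "content": q.format(x=x, y=y)},
                            {"role": "assistant", "content": random.choice(refusal_templates).format(x=x, y=y)},
                        ]
                    }
                    f.write(json.dumps(sample) + "\n")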
    7. incrudible ◴[] No.46242736[source]
    If somebody asks a question on Stack Overflow, it is unlikely that a human who does not know the answer will take time out of their day to completely fabricate a plausible-sounding answer.
    8. XCSme ◴[] No.46242794[source]
    But most benchmarks are not about that...

    Are there even any public "hallucination" benchmarks?

    replies(1): >>46243002 #
    9. biofox ◴[] No.46242816[source]
    I ask for confidence scores in my custom instructions / prompts, and LLMs do surprisingly well at estimating their own knowledge most of the time.
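
    A minimal sketch of that approach, assuming the OpenAI Python SDK; the instruction wording, model name, and 0-100 scale are illustrative, and the self-reported score is a heuristic rather than a calibrated probability:

        from openai import OpenAI

        client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

        # Hypothetical custom instruction asking the model to rate its own answer.
        SYSTEM = (
            "After answering, add a final line of the form 'Confidence: <0-100>' "
            "estimating how sure you are that the answer is factually correct. "
            "If you are below 50, say you are unsure instead of guessing."
        )

        resp = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": "Which HTTP status code means \"I'm a teapot\"?"},
            ],
        )

        answer = resp.choices[0].message.content
        print(answer)

        # Naive parse of the self-reported score; treat it as a hint, not ground truth.
        for line in answer.splitlines():
            if line.lower().startswith("confidence:"):
                print("self-reported confidence:", line.split(":", 1)[1].strip())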
    replies(1): >>46243213 #
    10. lins1909 ◴[] No.46242992[source]
    What is it about people making up lies to defend LLMs? In what world is it exactly the same as search? They're literally different things: with search you get information from multiple sources and can do your own filtering.
    11. andrepd ◴[] No.46243002[source]
    "Benchmarks" for LLMs are a total hoax, since you can train them on the benchmarks themselves.
    12. drclau ◴[] No.46243213[source]
    How do you know the confidence scores are not hallucinated as well?