423 points sohkamyung | 55 comments
scarmig ◴[] No.45669929[source]
If you dig into the actual report (I know, I know, how passé), you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a claim, or it (shocking) cites Wikipedia instead of the BBC.

Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], aside from saying it's the consumer tier. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report and the one on research conducted a year ago, leaving out critical context that... things have changed.

This article contains significant issues.

replies(7): >>45669943 #>>45670942 #>>45671401 #>>45672311 #>>45672577 #>>45675250 #>>45679322 #
1. afavour ◴[] No.45669943[source]
> or it (shocking) cites Wikipedia instead of the BBC.

No... the problem is that it cites Wikipedia articles that don't exist.

> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.

replies(6): >>45670006 #>>45670093 #>>45670094 #>>45670184 #>>45670903 #>>45672812 #
2. scarmig ◴[] No.45670006[source]
> Participating organizations raised concerns about responses that relied heavily or solely on Wikipedia content – Radio-Canada calculated that of 108 sources cited in responses from ChatGPT, 58% were from Wikipedia. CBC-Radio-Canada are amongst a number of Canadian media organisations suing ChatGPT’s creator, OpenAI, for copyright infringement. Although the impact of this on ChatGPT’s approach to sourcing is not explicitly known, it may explain the high use of Wikipedia sources.

Also, is attributing ChatGPT's preference for Wikipedia, without any citation, to reprisal for an active lawsuit a significant issue? Or do the authors get off scot-free because they couched it in "we don't know, but maybe it's the case"?

replies(3): >>45670451 #>>45670541 #>>45671199 #
3. kenjackson ◴[] No.45670093[source]
Actually there was a Wikipedia article of this name, but it was deleted in June -- because it was AI generated. Unfortunately AI falls for this much like humans do.

https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

replies(4): >>45670306 #>>45670779 #>>45671331 #>>45672567 #
4. menaerus ◴[] No.45670094[source]
> For the current research, a set of 30 “core” news questions was developed

Right. Let's talk about statistics for a bit. Or let's put it differently: they found in their report that 45% of the answers to the 30 questions they "developed" had a significant issue, e.g. a nonexistent reference.

I can pull 30 questions out of my sleeve where 95% of the answers won't have any significant issue.

replies(1): >>45670270 #
5. hnuser123456 ◴[] No.45670184[source]
Do we have any good research on how much less often larger, newer models will just make stuff up like this? As it is, it's pretty clear LLMs are categorically not a good idea for directly querying for information in any non-fiction-writing context. If you're using an LLM to research something that needs to be accurate, the LLM needs to be doing a tool call to a web search and only asked to summarize relevant facts from the existing information it can find, and have them be cited by hard-coding the UI to link the pages the LLM reviewed. The LLM itself cannot be trusted to generate its own citations. It will just generate something that looks like a relevant citation, along with whatever imaginary content it wants to attribute to this non-existent source.
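A minimal sketch of that pattern in Python, with hypothetical search_web() and complete() helpers standing in for whatever search API and LLM client is actually used; the point is that the citations shown to the user come from the retrieval step, never from the model's output:

    # Hypothetical helpers; substitute a real search API and LLM client.
    def search_web(query: str) -> list[dict]:
        """Return [{'url': ..., 'title': ..., 'text': ...}] from a real search index."""
        raise NotImplementedError

    def complete(prompt: str) -> str:
        """Call the LLM and return its text output."""
        raise NotImplementedError

    def answer_with_citations(question: str) -> dict:
        pages = search_web(question)[:5]  # only pages that were actually retrieved
        context = "\n\n".join(
            f"[{i}] {p['title']}\n{p['text'][:2000]}" for i, p in enumerate(pages)
        )
        summary = complete(
            "Answer the question using ONLY the numbered sources below. "
            "Cite them as [n]. If the sources don't cover it, say so.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )
        # The UI links these URLs directly, so every citation is a page that exists.
        return {"summary": summary, "citations": [p["url"] for p in pages]}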
replies(4): >>45670469 #>>45670908 #>>45672029 #>>45674716 #
6. matthewmacleod ◴[] No.45670270[source]
Yes, I'm sure you could hack together some bullshit questions to demonstrate whatever you want. Is there a specific reason that the reasonably straightforward methodology they did use is somehow flawed?
replies(1): >>45670445 #
7. Workaccount2 ◴[] No.45670306[source]
This is likely because of the knowledge cutoff.

I have seen a few cases before of "hallucinations" that turned out to be things that did exist, but no longer do.

replies(1): >>45670633 #
8. menaerus ◴[] No.45670445{3}[source]
Yes, and you answered it yourself.
replies(1): >>45670661 #
9. ◴[] No.45670451[source]
10. jacobolus ◴[] No.45670469[source]
A further problem is that Wikipedia is chock full of nonsense, with a large proportion of articles that were never fact checked by an expert, and many that were written to promote various biased points of view, inadvertently uncritically repeat claims from slanted sources, or mischaracterize claims made in good sources. Many if not most articles have poor choice of emphasis of subtopics, omit important basic topics, and make routine factual errors. (This problem is not unique to Wikipedia by any means, and despite its flaws Wikipedia is an amazing achievement.)

A critical human reader can go as deep as they like in examining claims there: can look at the source listed for a claim, can often click through to read the claim in the source, can examine the talk page and article history, can search through the research literature trying to figure out where the claim came from or how it mutated in passing from source to source, etc. But an AI "reader" is a predictive statistical model, not a critical consumer of information.

replies(4): >>45671016 #>>45671654 #>>45671893 #>>45673271 #
11. ffsm8 ◴[] No.45670541[source]
Literally constantly? It takes both careful prompting and thorough double-checking to really notice, however, because often the links do exist but just don't represent what the LLM made them sound like.

And the worst part about the people unironically thinking they can use it for "research" is, that it essentially supercharges confirmation bias.

The inefficient sidequests you do while researching are generally what actually give you the ability to really reason about a topic.

If you instead just laser focus on the tidbits you prompted with... Well, your opinion is a lot less grounded.

replies(1): >>45672257 #
12. 1980phipsi ◴[] No.45670633{3}[source]
The fix for this is for the AI to double-check all links before providing them to the user. I frequently ask ChatGPT to double-check that references actually exist when it gives them to me. It should be built in!
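A rough sketch of what that built-in check could look like (a HEAD request first, falling back to GET since some servers reject HEAD), so only links that actually resolve get shown:

    import requests

    def link_exists(url: str, timeout: float = 5.0) -> bool:
        """Best-effort check that a cited URL actually resolves."""
        try:
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code == 405:  # server rejects HEAD; retry with GET
                resp = requests.get(url, allow_redirects=True, timeout=timeout, stream=True)
            return resp.status_code < 400
        except requests.RequestException:
            return False

    def keep_live_citations(urls: list[str]) -> list[str]:
        return [u for u in urls if link_exists(u)]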
replies(4): >>45670762 #>>45670808 #>>45670935 #>>45673056 #
13. darkwater ◴[] No.45670661{4}[source]
Err, no? Being _possible_ does not necessarily imply that's what happened.
replies(1): >>45670934 #
14. rideontime ◴[] No.45670762{4}[source]
But that would mean OpenAI would lose even more money on every query.
replies(2): >>45672453 #>>45674673 #
15. bunderbunder ◴[] No.45670779[source]
The biggest problem with that citation isn't that the article has since been deleted. The biggest problem is that that particular Wikipedia article was never a good source in the first place.

That seems to be the real challenge with AI for this use case. It has no real critical thinking skills, so it's not really competent to choose reliable sources. So instead we're lowering the bar to just asking that the sources actually exist. I really hate that. We shouldn't be lowering intellectual standards to meet AI where it's at. These intellectual standards are important and hard-won, and we need to be demanding that AI be the one to rise to meet them.

replies(2): >>45670872 #>>45671358 #
16. blitzar ◴[] No.45670808{4}[source]
I have found myself doing the same "citation needed" loop - but with AI this is a dangerous game, as it will now double down on whatever it made up and go looking for citations to justify its answer.

Pre-prompting it to cite sources is obviously a better way of going about things.

replies(1): >>45671537 #
17. gamerDude ◴[] No.45670872{3}[source]
I think this is a real challenge for everyone. In many ways we potentially need to restart a Wikipedia-like site to document all the valid and good sources. This would also hopefully include things like source bias and whether something is a primary/secondary/tertiary source.
replies(5): >>45671575 #>>45671882 #>>45672162 #>>45673022 #>>45673869 #
18. shinycode ◴[] No.45670903[source]
I used Perplexity for searches and clicked on all the sources that were given. Depending on the model used, from 20% to 100% of the URLs I tested did not exist. I kept querying the LLM about it, and it finally told me that it generated "the most probable" URLs for the topic in question based on the ones it knows exist. Useless.
replies(1): >>45670989 #
19. bigbuppo ◴[] No.45670908[source]
The problem is that people are using it as a substitute for a web search, and the web search company has decided to kill off search as a product and pivot to video, err, I mean pivot to AI chatbots so hard they replaced one of the common ways to access emergency services on their mobile phones with an AI chatbot that can't help you in an emergency.

Not to mention, the AI companies have been extremely abusive to the rest of the internet so they are often blocked from accessing various web sites, so it's not like they're going to be able to access legitimate information anyways.

20. menaerus ◴[] No.45670934{5}[source]
A bucket of 30 questions is not a statistically significant sample size with which we can support the hypothesis that the AI assistants they tested are wrong 45% of the time. That's not how science works.

Neither is my bucket of 30 questions statistically significant, but it goes to show that I could disprove their hypothesis just by handing them my sample.

I think the report is being disingenuous, and I don't understand for what reasons. It's funny that they say "misrepresent" when that's exactly what they are doing.
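For a rough sense of the uncertainty in a sample like that, here is a back-of-the-envelope normal-approximation 95% confidence interval around a 45% issue rate observed on 30 graded answers (a simplification; the report aggregates more responses than that):

    import math

    n, p_hat = 30, 0.45                      # 30 graded answers, 45% with issues
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
    print(f"95% CI: {low:.0%} to {high:.0%}")  # roughly 27% to 63%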

replies(2): >>45672359 #>>45678070 #
21. janwl ◴[] No.45670935{4}[source]
I thought people here hated it when LLMs made http requests?
replies(2): >>45671214 #>>45671608 #
22. smrq ◴[] No.45670989[source]
I share your opinion on the results, but why would you trust the LLM explanation for why it does what it does?
replies(1): >>45671491 #
23. senderista ◴[] No.45671016{3}[source]
Just the other day, I clicked through to a Wikipedia reference (a news article) and discovered that the citing sentence grossly misrepresented the source. Probably not accidental since it was about a politically charged subject.
24. terminalshort ◴[] No.45671199[source]
It's a huge issue. No wonder AI hallucinates when it trains on this kind of crap.
25. macintux ◴[] No.45671214{5}[source]
I don't know for certain what you're referring to, but the "bulk downloads" of the Internet that AI companies are executing for training are the problem I've seen cited, and doesn't relate to LLMs checking their sources at query time.
26. CaptainOfCoit ◴[] No.45671331[source]
> Actually there was a Wikipedia article of this name, but it was deleted in June -- because it was AI generated. Unfortunately AI falls for this much like humans do.

A recent Kurzgesagt video goes into the dangers of this, and they found the same thing happening with a concrete example: they were researching a topic, tried using LLMs, found they weren't accurate enough and hallucinated, so they continued doing things the manual way. Then some weeks/months later, they noticed a bunch of YouTube videos containing the very hallucinations they had been avoiding, and now their own AI assistants started to use those as sources. Paraphrased/remembered by me, so it could have some inconsistencies/hallucinations.

https://www.youtube.com/watch?v=_zfN9wnPvU0

27. kenjackson ◴[] No.45671358{3}[source]
I get what you're saying. But you are now asking for a level of intelligence and critical thinking that I honestly believe is higher than the average person's. I think it's absolutely doable, but I also feel like we shouldn't make it sound like the current behavior is abhorrent or somehow indicative of a failure in the technology.
replies(2): >>45671503 #>>45676755 #
28. shinycode ◴[] No.45671491{3}[source]
I don't trust it at all. I wanted to know whether it would be able to explain its own results. Just the fact that it was displaying sources and links made me trust it, until I checked and was horrified. I wanted to know if they were old links that had broken or changed, but apparently not.
replies(1): >>45672119 #
29. exe34 ◴[] No.45671503{4}[source]
It's actually great from my point of view - it means we're edging our way into limited superintelligence.
30. ◴[] No.45671537{5}[source]
31. fullofideas ◴[] No.45671575{4}[source]
This is pushing the burden of proof onto society. Basically, it's asking everyone else to pitch in and improve sources so that AI companies can reference these trustworthy sources.
32. zahlman ◴[] No.45671608{5}[source]
It's bad when they indiscriminately crawl for training, and not ideal (but understandable) to use the Internet to communicate with them (and having online accounts associated with that etc.) rather than running them locally.

It's not bad when they use the Internet at generation time to verify the output.

replies(1): >>45677156 #
33. zahlman ◴[] No.45671654{3}[source]
> many that were written to promote various biased points of view, inadvertently uncritically repeat claims from slanted sources, or mischaracterize claims made in good sources.

Yep.

Including, if not especially, the ones actively worked on by the most active contributors.

The process for vetting sources (both in terms of suitability for a particular article, and general "reliable sources" status) is also seriously problematic. Especially when it comes to any topic which fundamentally relates to the reliability of journalism and the media in general.

34. bunderbunder ◴[] No.45671882{4}[source]
Outsourcing due diligence to a tool (or a single unified source) is the problem, not the solution.

For example, having a single central arbiter of source bias is inescapably the most biased thing you could possibly do. Bias has to be defined within an intellectual paradigm. So you'd have to choose a paradigm to use for that bias evaluation, and de facto declare it to be the one true paradigm for this purpose. But intellectual paradigms are inherently subjective, so doing that is pretty much the most intellectually biased thing you can possibly do.

35. LeifCarrotson ◴[] No.45671893{3}[source]
A future problem will be that the BBC and the rest of the Internet will soon be chock-full of nonsense, with a large proportion of articles that were never fact checked by a human, much less an AI.
36. ekidd ◴[] No.45672029[source]
"Truth" is often a very expensive commodity to obtain. There are plenty of awful sources and mistaken claims on the shelf of any town library. Lots of peer reviewed papers are crap, including a few in Nature. Newspapers are constantly wrong and misleading. Digging through even "reliable" sources can require significant expertise. (This is, in fact, a significant part of PhD training, according to the PhDs and professors I know: Learning to use the literature well.)

One way to successfully use LLMs is to do the initial research legwork. Run the 40 Google searches and follow links. Evaluate sources according to some criteria. Summarize. And then give the human a list of links to follow.

You quickly learn to see patterns. Sonnet will happily give a genuinely useful rule of thumb, phrasing it like it's widely accepted. But the source will turn out to be "one guy on a forum."

There are other tricks that work well. Have the LLM write an initial overview with sources. Tell it to strictly limit itself to information in the sources, etc. Then hand the report off to a fresh LLM and tell it to carefully check each citation in the report, removing unsourced information. Then have the human review the output, following links.
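A minimal sketch of that two-pass setup, again with a hypothetical complete() standing in for the actual LLM client; the second call starts fresh, so it has no stake in defending the first draft:

    # Hypothetical LLM wrapper; substitute a real client.
    def complete(prompt: str) -> str:
        raise NotImplementedError

    def draft_overview(question: str, sources: list[str]) -> str:
        return complete(
            "Write a short overview answering the question, strictly limited to "
            "information found in these sources, citing each claim by URL:\n"
            + "\n".join(sources) + f"\n\nQuestion: {question}"
        )

    def audit_overview(report: str, sources: list[str]) -> str:
        return complete(
            "You are reviewing someone else's report. For each claim, check that the "
            "cited URL is in the allowed list and supports the claim; remove anything "
            "uncited or unsupported. Allowed sources:\n"
            + "\n".join(sources) + f"\n\nReport:\n{report}"
        )

    # A human still reviews the surviving claims and follows the links.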

None of this will get you guaranteed truth. But if you know what you're doing, it can often give you a better starting point than Wikipedia or anything on the first two pages of Google Search results. Accurate information is genuinely hard to get, and it always has been.

37. macintux ◴[] No.45672119{4}[source]
You said:

>...it finally told me that it generated « the most probable » urls for the topic in question based on the ones he knows exists.

smrq is asking why you would believe that explanation. The LLM doesn't necessarily know why it's doing what it's doing, so that could be another hallucination.

Your answer:

> ...I wanted to know if it was old link that broke or changed but no apparently

Leads me to believe that you misunderstood smrq's question.

replies(1): >>45674436 #
38. ishtanbul ◴[] No.45672162{4}[source]
Maybe we can get AI to do this hard labor
39. edavison1 ◴[] No.45672257{3}[source]
Ran into this the other day researching a brewery. The Google AI summary referenced a glowing NYT profile of its beers. The linked article was not, in fact, about that brewery but an entirely different one. The brewery I was researching has never been mentioned in the NYT. It was a complete invention at that point: it 'stole' the good press from a different place and just fed the user what they wanted to see, namely a recommendation for the thing I was googling.
40. extrabajs ◴[] No.45672359{6}[source]
Statistically significant... sample size? Support the hypothesis?
41. mdhb ◴[] No.45672453{5}[source]
Almost as though it's not a sustainable business model and relies on tricking people in order to keep the lights on.
42. AlienRobot ◴[] No.45672567[source]
AI-powered citogenesis!
43. aflag ◴[] No.45672812[source]
Existing is just a point in time
44. dingnuts ◴[] No.45673022{4}[source]
I noticed that my local library has a new set of World Book. Maybe it's time to bring back traditional encyclopedias.
45. dingnuts ◴[] No.45673056{4}[source]
Gemini will lie to me when I ask it to cite things, either pull up relevant sources or just hallucinate them.

IDK how you people go through that experience more than a handful of times before you get pissed off and stop using these tools. I've wasted so much time because of believable lies from these bots.

Sorry, not even lies, just bullshit. The model has no conception of truth so it can't even lie. Just outputs bullshit that happens to be true sometimes.

46. hunterpayne ◴[] No.45673271{3}[source]
Wikipedia is pretty good for most topics. For anything even remotely political, however, it isn't just bad; it is one of the worst sources out there. And therein lies the problem: its wildly different levels of quality depending on the topic.
replies(1): >>45674728 #
47. cogman10 ◴[] No.45673869{4}[source]
An example of this.

I've seen a certain sensationalist news source write a story that went like this.

Site A: Bad thing is happening, cite: article on Site B

* follow the source *

Site B: Bad thing is happening, cite different article on Site A

* follow the source *

Site A: Bad thing is happening, no citation.

I fear that's the current state of a large news bubble that many people subscribe to. And when these sensationalist stories start circulating there's a natural human tendency to exaggerate.

I don't think AI has any sort of real good defense to this sort of thing. 1 level of citation is already hard enough. Recognizing that it is citing the same source is hard enough.

There was another example from the Kagi News stuff which exemplified this: a whole article was written making 3 citations that ultimately sprang from the same news briefing, published by different outlets.

I've even seen an example of a national political leader who fell for the same sort of sensationalization. One who should have known better. They repeated what was later found to be a lie by a well-known liar but added that "I've seen the photos in a classified debriefing". IDK that it was necessarily even malicious, I think people are just really bad at separating credible from uncredible information and that it ultimately blends together as one thing (certainly doesn't help with ancient politicians).

48. shinycode ◴[] No.45674436{5}[source]
No, I got the question; I said that I wanted to see what kind of explanation it would give me. Of course it can hallucinate that explanation as well. The bottom line is that I don't trust it, and the source links are fake (not broken or obsolete).
49. ModernMech ◴[] No.45674673{5}[source]
Better make each query count then.
50. ModernMech ◴[] No.45674716[source]
> and only asked to summarize relevant facts from the existing information it can find

Still not enough as I find the LLM will not summarize all the relevant facts, sometimes leaving out the most salient ones. Maybe you'll get a summary of some facts, maybe the ones you explicitly ask for, but you'll be left wondering if the LLM is leaving out important information.

51. mikkupikku ◴[] No.45674728{4}[source]
Wikipedia is bad even for topics that aren't particularly political, not even because the editor was trying to be misleading, but rather because they were being lazy, wrote up their own misconception, and either made up a source or pulled a source without bothering to actually read it. These kinds of errors can stay in place for years.

I have one example that I check periodically just to see if anybody else has noticed. I've been checking it for several years and it's still there; the SDI page claims that Brilliant Pebbles was designed to use "watermelon sized" tungsten projectiles. This is completely made up; whoever wrote it up was probably confusing "rods from god" proposals that commonly use tungsten and synthesizing that confusion with "pebbles". The sentence is cited but the sources don't back it up. It's been up like this for years. This error has been repeated on many websites now, all post-dating the change on wikipedia.

If you're reading this and are the sort to edit wikipedia.. Don't fix it. That would be cheating.

replies(1): >>45676455 #
52. wahern ◴[] No.45676455{5}[source]
> If you're reading this and are the sort to edit wikipedia.. Don't fix it. That would be cheating.

Imagine if this were the ethos regarding open source software projects. Imagine Microsoft saying 20 years ago, "Linux has this and that bug, but you're not allowed to go fix it because that detracts from our criticism of open source." (Actually, I wouldn't be surprised if Microsoft or similar detractors literally said this.)

Of course Wikipedia has wrong information. Most open source software projects, even the best, have buggy, shite code. But these things are better understood not as products but as processes, and in many (but not all) contexts the product at any point in time has generally proven, in a broad sense, to outperform its cathedral alternatives. But the process breaks down when pervasive cynicism and nihilism reduce the number of well-intentioned people who positively engage and contribute rather than complain from the sidelines. Then we land right back at square 0. And maybe you're too young to remember what the world was like at square 0, but it sucked in terms of knowledge accessibility, notwithstanding the small number of outstanding resources--which were often inaccessible because of cost or other barriers.

53. Paracompact ◴[] No.45676755{4}[source]
The bar for an industry should be the good-faith effort of the average industry professional, not the unconscionably minimal efforts of the average grifter trying to farm content.

These grifters simply were not attracted to these gigs in these quantities prior to AI, but now the market incentives have changed. Should we "blame" the technology for its abuse? I think AI is incredible, but market endorsement is different from intellectual admiration.

54. Dylan16807 ◴[] No.45677156{6}[source]
Also for the most part this verification can use a HEAD request.
55. frm88 ◴[] No.45678070{6}[source]
I don't follow your reasoning re. statistical sample size. The article under discussion claims that 45% of the answers were wrong. If - with a vastly greater sample size - the answers were "only" (let's say) 20% wrong, that's still a complete failure, and so is 5%. The article is not about a hypothesis; it's about news reporting.