423 points sohkamyung | 55 comments
scarmig ◴[] No.45669929[source]
If you dig into the actual report (I know, I know, how passé), you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a claim, or it (shocking) cites Wikipedia instead of the BBC.

Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], aside from saying it's the consumer tier. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report and the one on research conducted a year ago, leaving out critical context that... things have changed.

This article contains significant issues.

replies(7): >>45669943 #>>45670942 #>>45671401 #>>45672311 #>>45672577 #>>45675250 #>>45679322 #
1. afavour ◴[] No.45669943[source]
> or it (shocking) cites Wikipedia instead of the BBC.

No... the problem is that it cites Wikipedia articles that don't exist.

> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.

replies(6): >>45670006 #>>45670093 #>>45670094 #>>45670184 #>>45670903 #>>45672812 #
2. scarmig ◴[] No.45670006[source]
> Participating organizations raised concerns about responses that relied heavily or solely on Wikipedia content – Radio-Canada calculated that of 108 sources cited in responses from ChatGPT, 58% were from Wikipedia. CBC-Radio-Canada are amongst a number of Canadian media organisations suing ChatGPT’s creator, OpenAI, for copyright infringement. Although the impact of this on ChatGPT’s approach to sourcing is not explicitly known, it may explain the high use of Wikipedia sources.

Also, is attributing ChatGPT's preference for Wikipedia, without any citation, to reprisal for an active lawsuit a significant issue? Or do the authors get off scot-free because they couched it in "we don't know, but maybe it's the case"?

replies(3): >>45670451 #>>45670541 #>>45671199 #
3. kenjackson ◴[] No.45670093[source]
Actually there was a Wikipedia article of this name, but it was deleted in June -- because it was AI generated. Unfortunately AI falls for this much like humans do.

https://en.wikipedia.org/wiki/Wikipedia:Articles_for_deletio...

replies(4): >>45670306 #>>45670779 #>>45671331 #>>45672567 #
4. menaerus ◴[] No.45670094[source]
> For the current research, a set of 30 “core” news questions was developed

Right. Let's talk about statistics for a bit. Or let's put it differently: they found in their report that 45% of the answers to the 30 questions they "developed" had a significant issue, e.g. a nonexistent reference.

I can pull 30 questions out of my sleeve where 95% of the answers won't have any significant issue.

replies(1): >>45670270 #
5. hnuser123456 ◴[] No.45670184[source]
Do we have any good research on how much less often larger, newer models will just make stuff up like this? As it is, it's pretty clear LLMs are categorically not a good idea for directly querying for information in any non-fiction-writing context. If you're using an LLM to research something that needs to be accurate, the LLM needs to be doing a tool call to a web search and only asked to summarize relevant facts from the existing information it can find, and have them be cited by hard-coding the UI to link the pages the LLM reviewed. The LLM itself cannot be trusted to generate its own citations. It will just generate something that looks like a relevant citation, along with whatever imaginary content it wants to attribute to this non-existent source.
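A minimal sketch of that pattern in Python, with hypothetical search_web() and complete() helpers standing in for whatever search API and LLM client is actually used; the point is that the citations shown to the user come from the retrieval step, never from the model's output:

    # Hypothetical helpers; substitute a real search API and LLM client.
    def search_web(query: str) -> list[dict]:
        """Return [{'url': ..., 'title': ..., 'text': ...}] from a real search index."""
        raise NotImplementedError

    def complete(prompt: str) -> str:
        """Call the LLM and return its text output."""
        raise NotImplementedError

    def answer_with_citations(question: str) -> dict:
        pages = search_web(question)[:5]  # only pages that were actually retrieved
        context = "\n\n".join(
            f"[{i}] {p['title']}\n{p['text'][:2000]}" for i, p in enumerate(pages)
        )
        summary = complete(
            "Answer the question using ONLY the numbered sources below. "
            "Cite them as [n]. If the sources don't cover it, say so.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )
        # The UI links these URLs directly, so every citation is a page that exists.
        return {"summary": summary, "citations": [p["url"] for p in pages]}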
replies(4): >>45670469 #>>45670908 #>>45672029 #>>45674716 #
6. matthewmacleod ◴[] No.45670270[source]
Yes, I'm sure you could hack together some bullshit questions to demonstrate whatever you want. Is there a specific reason that the reasonably straightforward methodology they did use is somehow flawed?
replies(1): >>45670445 #
7. Workaccount2 ◴[] No.45670306[source]
This is likely because of the knowledge cutoff.

I have seen a few cases before of "hallucinations" that turned out to be things that did exist, but no longer do.

replies(1): >>45670633 #
8. menaerus ◴[] No.45670445{3}[source]
Yes, and you answered it yourself.
replies(1): >>45670661 #
9. ◴[] No.45670451[source]
10. jacobolus ◴[] No.45670469[source]
A further problem is that Wikipedia is chock full of nonsense, with a large proportion of articles that were never fact checked by an expert, and many that were written to promote various biased points of view, inadvertently uncritically repeat claims from slanted sources, or mischaracterize claims made in good sources. Many if not most articles have poor choice of emphasis of subtopics, omit important basic topics, and make routine factual errors. (This problem is not unique to Wikipedia by any means, and despite its flaws Wikipedia is an amazing achievement.)

A critical human reader can go as deep as they like in examining claims there: can look at the source listed for a claim, can often click through to read the claim in the source, can examine the talk page and article history, can search through the research literature trying to figure out where the claim came from or how it mutated in passing from source to source, etc. But an AI "reader" is a predictive statistical model, not a critical consumer of information.

replies(4): >>45671016 #>>45671654 #>>45671893 #>>45673271 #
11. ffsm8 ◴[] No.45670541[source]
Literally constantly? It takes both careful prompting and thorough double-checking to really notice, however, because often the links do exist but just don't represent what the LLM made them sound like.

And the worst part about the people unironically thinking they can use it for "research" is, that it essentially supercharges confirmation bias.

The inefficient sidequests you do while researching are generally what actually give you the ability to really reason about a topic.

If you instead just laser focus on the tidbits you prompted with... Well, your opinion is a lot less grounded.

replies(1): >>45672257 #
12. 1980phipsi ◴[] No.45670633{3}[source]
The fix for this is for the AI to double-check all links before providing them to the user. I frequently ask ChatGPT to double-check that references actually exist when it gives them to me. It should be built in!
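A rough sketch of what that built-in check could look like (a HEAD request first, falling back to GET since some servers reject HEAD), so only links that actually resolve get shown:

    import requests

    def link_exists(url: str, timeout: float = 5.0) -> bool:
        """Best-effort check that a cited URL actually resolves."""
        try:
            resp = requests.head(url, allow_redirects=True, timeout=timeout)
            if resp.status_code == 405:  # server rejects HEAD; retry with GET
                resp = requests.get(url, allow_redirects=True, timeout=timeout, stream=True)
            return resp.status_code < 400
        except requests.RequestException:
            return False

    def keep_live_citations(urls: list[str]) -> list[str]:
        return [u for u in urls if link_exists(u)]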
replies(4): >>45670762 #>>45670808 #>>45670935 #>>45673056 #
13. darkwater ◴[] No.45670661{4}[source]
Err, no? Being _possible_ does not necessarily imply that's what happened.
replies(1): >>45670934 #
14. rideontime ◴[] No.45670762{4}[source]
But that would mean OpenAI would lose even more money on every query.
replies(2): >>45672453 #>>45674673 #
15. bunderbunder ◴[] No.45670779[source]
The biggest problem with that citation isn't that the article has since been deleted. The biggest problem is that that particular Wikipedia article was never a good source in the first place.

That seems to be the real challenge with AI for this use case. It has no real critical thinking skills, so it's not really competent to choose reliable sources. So instead we're lowering the bar to just asking that the sources actually exist. I really hate that. We shouldn't be lowering intellectual standards to meet AI where it's at. These intellectual standards are important and hard-won, and we need to be demanding that AI be the one to rise to meet them.

replies(2): >>45670872 #>>45671358 #
16. blitzar ◴[] No.45670808{4}[source]
I have found myself doing the same "citation needed" loop - but with AI this is a dangerous game, as it will now double down on whatever it made up and go looking for citations to justify its answer.

Pre-prompting it to cite sources is obviously a better way of going about things.

replies(1): >>45671537 #
17. gamerDude ◴[] No.45670872{3}[source]
I think this is a real challenge for everyone. In many ways we potentially need to restart a Wikipedia-like site to document all the valid and good sources. This would also hopefully include things like source bias and whether something is a primary/secondary/tertiary source.
replies(5): >>45671575 #>>45671882 #>>45672162 #>>45673022 #>>45673869 #
18. shinycode ◴[] No.45670903[source]
I used Perplexity for searches and clicked on all the sources that were given. Depending on the model used, from 20% to 100% of the URLs I tested did not exist. I kept querying the LLM about it, and it finally told me that it generated "the most probable" URLs for the topic in question based on the ones it knows exist. Useless.
replies(1): >>45670989 #
19. bigbuppo ◴[] No.45670908[source]
The problem is that people are using it as a substitute for a web search, and the web search company has decided to kill off search as a product and pivot to video, err, I mean pivot to AI chatbots so hard they replaced one of the common ways to access emergency services on their mobile phones with an AI chatbot that can't help you in an emergency.

Not to mention, the AI companies have been extremely abusive to the rest of the internet so they are often blocked from accessing various web sites, so it's not like they're going to be able to access legitimate information anyways.

20. menaerus ◴[] No.45670934{5}[source]
A bucket of 30 questions is not a statistically significant sample size with which we can support the hypothesis that the AI assistants they tested are wrong 45% of the time. That's not how science works.

Neither is my bucket of 30 questions statistically significant, but it goes to show that I could disprove their hypothesis just by handing them my sample.

I think the report is being disingenuous, and I don't understand for what reasons. It's funny that they say "misrepresent" when that's exactly what they are doing.
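For a rough sense of the uncertainty in a sample like that, here is a back-of-the-envelope normal-approximation 95% confidence interval around a 45% issue rate observed on 30 graded answers (a simplification; the report aggregates more responses than that):

    import math

    n, p_hat = 30, 0.45                      # 30 graded answers, 45% with issues
    se = math.sqrt(p_hat * (1 - p_hat) / n)  # standard error of the proportion
    low, high = p_hat - 1.96 * se, p_hat + 1.96 * se
    print(f"95% CI: {low:.0%} to {high:.0%}")  # roughly 27% to 63%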

replies(2): >>45672359 #>>45678070 #
21. janwl ◴[] No.45670935{4}[source]
I thought people here hated it when LLMs made http requests?
replies(2): >>45671214 #>>45671608 #
22. smrq ◴[] No.45670989[source]
I share your opinion on the results, but why would you trust the LLM explanation for why it does what it does?
replies(1): >>45671491 #
23. senderista ◴[] No.45671016{3}[source]
Just the other day, I clicked through to a Wikipedia reference (a news article) and discovered that the citing sentence grossly misrepresented the source. Probably not accidental since it was about a politically charged subject.
24. terminalshort ◴[] No.45671199[source]
It's a huge issue. No wonder AI hallucinates when it trains on this kind of crap.
25. macintux ◴[] No.45671214{5}[source]
I don't know for certain what you're referring to, but the "bulk downloads" of the Internet that AI companies are executing for training are the problem I've seen cited, and doesn't relate to LLMs checking their sources at query time.
26. CaptainOfCoit ◴[] No.45671331[source]
> Actually there was a Wikipedia article of this name, but it was deleted in June -- because it was AI generated. Unfortunately AI falls for this much like humans do.

A recent Kurzgesagt video goes into the dangers of this, and they found the same thing happening with a concrete example: they were researching a topic, tried using LLMs, found they weren't accurate enough and hallucinated, so they continued doing things the manual way. Then some weeks/months later, they noticed a bunch of YouTube videos containing the very hallucinations they had been avoiding, and now their own AI assistants started to use those as sources. Paraphrased/remembered by me, so it could have some inconsistencies/hallucinations.

https://www.youtube.com/watch?v=_zfN9wnPvU0

27. kenjackson ◴[] No.45671358{3}[source]
I get what you're saying. But you are now asking for a level of intelligence and critical thinking that I honestly believe is higher than the average person's. I think it's absolutely doable, but I also feel like we shouldn't make it sound like the current behavior is abhorrent or somehow indicative of a failure in the technology.
replies(2): >>45671503 #>>45676755 #
28. shinycode ◴[] No.45671491{3}[source]
I don't trust it at all. I wanted to know whether it would be able to explain its own results. Just the fact that it was displaying sources and links made me trust it, until I checked and was horrified. I wanted to know if they were old links that had broken or changed, but apparently not.
replies(1): >>45672119 #
29. exe34 ◴[] No.45671503{4}[source]
It's actually great from my point of view - it means we're edging our way into limited superintelligence.
30. ◴[] No.45671537{5}[source]
31. fullofideas ◴[] No.45671575{4}[source]
This is pushing the burden of proof onto society. Basically, it's asking everyone else to pitch in and improve sources so that AI companies can reference these trustworthy sources.
32. zahlman ◴[] No.45671608{5}[source]
It's bad when they indiscriminately crawl for training, and not ideal (but understandable) to use the Internet to communicate with them (and having online accounts associated with that etc.) rather than running them locally.

It's not bad when they use the Internet at generation time to verify the output.

replies(1): >>45677156 #
33. zahlman ◴[] No.45671654{3}[source]
> many that were written to promote various biased points of view, inadvertently uncritically repeat claims from slanted sources, or mischaracterize claims made in good sources.

Yep.

Including, if not especially, the ones actively worked on by the most active contributors.

The process for vetting sources (both in terms of suitability for a particular article, and general "reliable sources" status) is also seriously problematic. Especially when it comes to any topic which fundamentally relates to the reliability of journalism and the media in general.

34. bunderbunder ◴[] No.45671882{4}[source]
Outsourcing due diligence to a tool (or a single unified source) is the problem, not the solution.

For example, having a single central arbiter of source bias is inescapably the most biased thing you could possibly do. Bias has to be defined within an intellectual paradigm. So you'd have to choose a paradigm to use for that bias evaluation, and de facto declare it to be the one true paradigm for this purpose. But intellectual paradigms are inherently subjective, so doing that is pretty much the most intellectually biased thing you can possibly do.

35. LeifCarrotson ◴[] No.45671893{3}[source]
A future problem will be that the BBC and the rest of the Internet will soon be chock-full of nonsense, with a large proportion of articles that were never fact checked by a human, much less an AI.
36. ekidd ◴[] No.45672029[source]
"Truth" is often a very expensive commodity to obtain. There are plenty of awful sources and mistaken claims on the shelf of any town library. Lots of peer reviewed papers are crap, including a few in Nature. Newspapers are constantly wrong and misleading. Digging through even "reliable" sources can require significant expertise. (This is, in fact, a significant part of PhD training, according to the PhDs and professors I know: Learning to use the literature well.)

One way to successfully use LLMs is to do the initial research legwork. Run the 40 Google searches and follow links. Evaluate sources according to some criteria. Summarize. And then give the human a list of links to follow.

You quickly learn to see patterns. Sonnet will happily give a genuinely useful rule of thumb, phrasing it like it's widely accepted. But the source will turn out to be "one guy on a forum."

There are other tricks that work well. Have the LLM write an initial overview with sources. Tell it to strictly limit itself to information in the sources, etc. Then hand the report off to a fresh LLM and tell it to carefully check each citation in the report, removing unsourced information. Then have the human review the output, following links.
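A minimal sketch of that two-pass setup, again with a hypothetical complete() standing in for the actual LLM client; the second call starts fresh, so it has no stake in defending the first draft:

    # Hypothetical LLM wrapper; substitute a real client.
    def complete(prompt: str) -> str:
        raise NotImplementedError

    def draft_overview(question: str, sources: list[str]) -> str:
        return complete(
            "Write a short overview answering the question, strictly limited to "
            "information found in these sources, citing each claim by URL:\n"
            + "\n".join(sources) + f"\n\nQuestion: {question}"
        )

    def audit_overview(report: str, sources: list[str]) -> str:
        return complete(
            "You are reviewing someone else's report. For each claim, check that the "
            "cited URL is in the allowed list and supports the claim; remove anything "
            "uncited or unsupported. Allowed sources:\n"
            + "\n".join(sources) + f"\n\nReport:\n{report}"
        )

    # A human still reviews the surviving claims and follows the links.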

None of this will get you guaranteed truth. But if you know what you're doing, it can often give you a better starting point than Wikipedia or anything on the first two pages of Google Search results. Accurate information is genuinely hard to get, and it always has been.

37. macintux ◴[] No.45672119{4}[source]
You said:

>...it finally told me that it generated « the most probable » urls for the topic in question based on the ones he knows exists.

smrq is asking why you would believe that explanation. The LLM doesn't necessarily know why it's doing what it's doing, so that could be another hallucination.

Your answer:

> ...I wanted to know if it was old link that broke or changed but no apparently

Leads me to believe that you misunderstood smrq's question.

replies(1): >>45674436 #
38. ishtanbul ◴[] No.45672162{4}[source]
Maybe we can get AI to do this hard labor
39. edavison1 ◴[] No.45672257{3}[source]
Ran into this the other day researching a brewery. The Google AI summary referenced a glowing NYT profile of its beers. The linked article was not, in fact, about that brewery but an entirely different one. The brewery I was researching has never been mentioned in the NYT. It was a complete invention at that point: it 'stole' the good press from a different place and just fed the user what they wanted to see, namely a recommendation for the thing I was googling.
40. extrabajs ◴[] No.45672359{6}[source]
Statistically significant... sample size? Support the hypothesis?
41. mdhb ◴[] No.45672453{5}[source]
Almost as though it's not a sustainable business model and relies on tricking people in order to keep the lights on.
42. AlienRobot ◴[] No.45672567[source]
AI-powered citogenesis!
43. aflag ◴[] No.45672812[source]
Existing is just a point in time
44. dingnuts ◴[] No.45673022{4}[source]
I noticed that my local library has a new set of World Book. Maybe it's time to bring back traditional encyclopedias.
45. dingnuts ◴[] No.45673056{4}[source]
Gemini will lie to me when I ask it to cite things, either pull up relevant sources or just hallucinate them.

IDK how you people go through that experience more than a handful of times before you get pissed off and stop using these tools. I've wasted so much time because of believable lies from these bots.

Sorry, not even lies, just bullshit. The model has no conception of truth so it can't even lie. Just outputs bullshit that happens to be true sometimes.

46. hunterpayne ◴[] No.45673271{3}[source]
Wikipedia is pretty good for most topics. For anything even remotely political, however, it isn't just bad; it is one of the worst sources out there. And therein lies the problem: its wildly different levels of quality depending on the topic.
replies(1): >>45674728 #
47. cogman10 ◴[] No.45673869{4}[source]
An example of this.

I've seen a certain sensationalist news source write a story that went like this.

Site A: Bad thing is happening, cite: article on Site B

* follow the source *

Site B: Bad thing is happening, cite different article on Site A

* follow the source *

Site A: Bad thing is happening, no citation.

I fear that's the current state of a large news bubble that many people subscribe to. And when these sensationalist stories start circulating there's a natural human tendency to exaggerate.

I don't think AI has any sort of real good defense to this sort of thing. 1 level of citation is already hard enough. Recognizing that it is citing the same source is hard enough.

There was another example from the Kagi News stuff which exemplified this: a whole article was written making 3 citations that ultimately sprang from the same news briefing, published by different outlets.

I've even seen an example of a national political leader who fell for the same sort of sensationalization. One who should have known better. They repeated what was later found to be a lie by a well-known liar but added that "I've seen the photos in a classified debriefing". IDK that it was necessarily even malicious, I think people are just really bad at separating credible from uncredible information and that it ultimately blends together as one thing (certainly doesn't help with ancient politicians).

48. shinycode ◴[] No.45674436{5}[source]
No, I got the question; I said that I wanted to see what kind of explanation it would give me. Of course it can hallucinate that explanation as well. The bottom line is that I don't trust it, and the source links are fake (not broken or obsolete).
49. ModernMech ◴[] No.45674673{5}[source]
Better make each query count then.
50. ModernMech ◴[] No.45674716[source]
> and only asked to summarize relevant facts from the existing information it can find

Still not enough as I find the LLM will not summarize all the relevant facts, sometimes leaving out the most salient ones. Maybe you'll get a summary of some facts, maybe the ones you explicitly ask for, but you'll be left wondering if the LLM is leaving out important information.

51. mikkupikku ◴[] No.45674728{4}[source]
Wikipedia is bad even for topics that aren't particularly political, not even because the editor was trying to be misleading, but rather because they were being lazy, wrote up their own misconception, and either made up a source or pulled a source without bothering to actually read it. These kinds of errors can stay in place for years.

I have one example that I check periodically just to see if anybody else has noticed. I've been checking it for several years and it's still there; the SDI page claims that Brilliant Pebbles was designed to use "watermelon sized" tungsten projectiles. This is completely made up; whoever wrote it up was probably confusing "rods from god" proposals that commonly use tungsten and synthesizing that confusion with "pebbles". The sentence is cited but the sources don't back it up. It's been up like this for years. This error has been repeated on many websites now, all post-dating the change on wikipedia.

If you're reading this and are the sort to edit wikipedia.. Don't fix it. That would be cheating.

replies(1): >>45676455 #
52. wahern ◴[] No.45676455{5}[source]
> If you're reading this and are the sort to edit wikipedia.. Don't fix it. That would be cheating.

Imagine if this were the ethos regarding open source software projects. Imagine Microsoft saying 20 years ago, "Linux has this and that bug, but you're not allowed to go fix it because that detracts from our criticism of open source." (Actually, I wouldn't be surprised if Microsoft or similar detractors literally said this.)

Of course Wikipedia has wrong information. Most open source software projects, even the best, have buggy, shite code. But these things are better understood not as products but as processes, and in many (but not all) contexts the product at any point in time has generally proven, in a broad sense, to outperform its cathedral alternatives. But the process breaks down when pervasive cynicism and nihilism reduce the number of well-intentioned people who positively engage and contribute rather than complain from the sidelines. Then we land right back at square 0. And maybe you're too young to remember what the world was like at square 0, but it sucked in terms of knowledge accessibility, notwithstanding the small number of outstanding resources--which were often inaccessible because of cost or other barriers.

53. Paracompact ◴[] No.45676755{4}[source]
The bar for an industry should be the good-faith effort of the average industry professional, not the unconscionably minimal efforts of the average grifter trying to farm content.

These grifters simply were not attracted to these gigs in these quantities prior to AI, but now the market incentives have changed. Should we "blame" the technology for its abuse? I think AI is incredible, but market endorsement is different from intellectual admiration.

54. Dylan16807 ◴[] No.45677156{6}[source]
Also for the most part this verification can use a HEAD request.
55. frm88 ◴[] No.45678070{6}[source]
I don't follow your reasoning re. statistical sample size. The article under discussion claims that 45% of the answers were wrong. If - with a vastly greater sample size - the answers were "only" (let's say) 20% wrong, that's still a complete failure, and so is 5%. The article is not about a hypothesis; it's about news reporting.