AI assistants misrepresent news content 45% of the time

(www.bbc.co.uk)

423 points sohkamyung | 1 comments | 22 Oct 25 13:39 UTC


scarmig ◴[22 Oct 25 14:46 UTC] No.45669929[source]▶

If you dig into the actual report (I know, I know, how passé), you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a source for a claim, or it (shocking) cites Wikipedia instead of the BBC.

Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], aside from saying it's the consumer tier. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report and the one on research conducted a year ago, leaving out critical context that... things have changed.

This article contains significant issues.

replies(7): >>45669943 #>>45670942 #>>45671401 #>>45672311 #>>45672577 #>>45675250 #>>45679322 #

scellus ◴[22 Oct 25 16:16 UTC] No.45671401[source]▶

>>45669929 #

Are citation issues related to the fact that https://www.bbc.co.uk/robots.txt denies a lot of AI, both user agents and crawlers?

replies(1): >>45671888 #

scarmig ◴[22 Oct 25 16:51 UTC] No.45671888[source]▶

>>45671401 #

The report says that different media organizations dropped their robots.txt for the duration of the research to give LLMs access.

I would expect this isn't the on-off switch they conceptualized, but I don't know enough about how different LLM providers handle news search and retrieval to say for sure.

replies(1): >>45672563 #

dylan604 ◴[22 Oct 25 17:42 UTC] No.45672563[source]▶

>>45671888 #

Does it work like that though? How long does it take for AI bots to crawl sites and have the data added to the model currently being used? Am I wrong in thinking that it takes a lot longer for AI bot crawls to be available to the public than a typical search engine crawler?

replies(1): >>45673352 #

rimeice ◴[22 Oct 25 18:38 UTC] No.45673352[source]▶

>>45672563 #

Bots could be crawlers gathering data to be used periodically as raw training data, or the requests could just be from a web search agent of some form, like ChatGPT finding the latest news stories on topic X, for example. I don’t know if robots.txt can distinguish between the two types of bot request, or whether LLM providers even adhere to it for either.
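On the first question, here is a minimal sketch of the mechanism: robots.txt rules are grouped per User-agent token, so a site can treat a training crawler and a search agent differently, provided the two announce different tokens. The robots.txt content, the bot names ExampleTrainingBot and ExampleSearchAgent, and the example.com URLs below are all hypothetical, not the BBC's actual file or any provider's real user agents. It also says nothing about adherence, which is voluntary on the bot's side.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt for illustration only (not any real site's file).
# The two bot names are made-up stand-ins for a training crawler and a
# live web search agent.
ROBOTS_TXT = """\
User-agent: ExampleTrainingBot
Disallow: /

User-agent: ExampleSearchAgent
Allow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt content from a string instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def allowed(parser: RobotFileParser, agent: str, url: str) -> bool:
    """Return whether the rules permit `agent` to fetch `url`."""
    return parser.can_fetch(agent, url)


parser = build_parser(ROBOTS_TXT)
story = "https://example.com/news/story"

for agent in ("ExampleTrainingBot", "ExampleSearchAgent", "SomeOtherBot"):
    print(agent, allowed(parser, agent, story))
```

So the file can tell the two apart only as far as the bots identify themselves with distinct tokens. Whether a given provider uses separate tokens for training and search, and whether it honors the rules, is something to check in that provider's own documentation.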

replies(1): >>45673556 #

jay_kyburz ◴[22 Oct 25 18:54 UTC] No.45673556[source]▶

>>45673352 #

Wow, just reading the headline, I had assumed they were giving it the news article as a document, then asking it to summarize the document given.
