
421 points | sohkamyung
scarmig ◴[] No.45669929[source]
If you dig into the actual report (I know, I know, how passé), you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a claim, or it (shocking) cites Wikipedia instead of the BBC.

Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], aside from saying it's the consumer tier. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report with claims from the one based on research conducted a year ago, leaving out the critical context that... things have changed.

This article contains significant issues.

replies(7): >>45669943 #>>45670942 #>>45671401 #>>45672311 #>>45672577 #>>45675250 #>>45679322 #
afavour ◴[] No.45669943[source]
> or it (shocking) cites Wikipedia instead of the BBC.

No... the problem is that it cites Wikipedia articles that don't exist.

> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.

replies(6): >>45670006 #>>45670093 #>>45670094 #>>45670184 #>>45670903 #>>45672812 #
hnuser123456 ◴[] No.45670184[source]
Do we have any good research on how much less often larger, newer models just make stuff up like this? As it is, it's pretty clear LLMs are categorically not a good idea for directly querying for information in anything other than fiction writing. If you're using an LLM to research something that needs to be accurate, the LLM should be making a tool call to a web search and only be asked to summarize relevant facts from the information it finds, with citations provided by hard-coding the UI to link the pages the LLM actually reviewed. The LLM itself cannot be trusted to generate its own citations: it will just generate something that looks like a relevant citation, along with whatever imaginary content it wants to attribute to that non-existent source.
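Roughly this shape, as a minimal sketch of that pattern (search_web() and complete() are hypothetical stand-ins for whatever search API and model client you actually use); the key point is that the citation list comes from the pages the application fetched, never from anything the model wrote:

    from dataclasses import dataclass

    @dataclass
    class Page:
        url: str
        text: str

    def search_web(query: str) -> list[Page]:
        ...  # call whatever search API you actually use; return fetched pages

    def complete(prompt: str) -> str:
        ...  # call whatever LLM client you actually use; return its text

    def answer_with_citations(question: str) -> dict:
        pages = search_web(question)
        context = "\n\n".join(f"[{i}] {p.text}" for i, p in enumerate(pages))
        summary = complete(
            "Answer using ONLY the numbered sources below. "
            "Do not add URLs or sources of your own.\n\n"
            f"Sources:\n{context}\n\nQuestion: {question}"
        )
        # Citations are built from the pages the app actually fetched,
        # never parsed out of the model's own text.
        return {"answer": summary, "citations": [p.url for p in pages]}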
replies(4): >>45670469 #>>45670908 #>>45672029 #>>45674716 #
jacobolus ◴[] No.45670469[source]
A further problem is that Wikipedia is chock full of nonsense, with a large proportion of articles that were never fact-checked by an expert, and many that were written to promote various biased points of view, that inadvertently and uncritically repeat claims from slanted sources, or that mischaracterize claims made in good sources. Many, if not most, articles have a poor choice of emphasis among subtopics, omit important basic topics, and make routine factual errors. (This problem is not unique to Wikipedia by any means, and despite its flaws Wikipedia is an amazing achievement.)

A critical human reader can go as deep as they like in examining claims there: they can look at the source listed for a claim, often click through to read the claim in the source, examine the talk page and article history, search the research literature to figure out where a claim came from or how it mutated in passing from source to source, and so on. But an AI "reader" is a predictive statistical model, not a critical consumer of information.

replies(4): >>45671016 #>>45671654 #>>45671893 #>>45673271 #
zahlman ◴[] No.45671654[source]
> many that were written to promote various biased points of view, that inadvertently and uncritically repeat claims from slanted sources, or that mischaracterize claims made in good sources.

Yep.

Including, if not especially, the ones actively worked on by the most active contributors.

The process for vetting sources (both in terms of suitability for a particular article, and general "reliable sources" status) is also seriously problematic. Especially when it comes to any topic which fundamentally relates to the reliability of journalism and the media in general.