scarmig ◴[] No.45669929[source]
If you dig into the actual report (I know, I know, how passé), you see how they get the numbers. Most of the errors are "sourcing issues": the AI assistant doesn't cite a claim, or it (shocking) cites Wikipedia instead of the BBC.

Other issues: the report doesn't even say which particular models it's querying [ETA: discovered they do list this in an appendix], aside from saying it's the consumer tier. And it leaves off Anthropic (in my experience, by far the best at this type of task), favoring Perplexity and (perplexingly) Copilot. The article also intermingles claims from the recent report with ones from research conducted a year ago, leaving out the critical context that... things have changed.

This article contains significant issues.

replies(7): >>45669943 #>>45670942 #>>45671401 #>>45672311 #>>45672577 #>>45675250 #>>45679322 #
afavour ◴[] No.45669943[source]
> or it (shocking) cites Wikipedia instead of the BBC.

No... the problem is that it cites Wikipedia articles that don't exist.

> ChatGPT linked to a non-existent Wikipedia article on the “European Union Enlargement Goals for 2040”. In fact, there is no official EU policy under that name. The response hallucinates a URL but also, indirectly, an EU goal and policy.

replies(6): >>45670006 #>>45670093 #>>45670094 #>>45670184 #>>45670903 #>>45672812 #
hnuser123456 ◴[] No.45670184[source]
Do we have any good research on how much less often larger, newer models just make things up like this? As it stands, it's pretty clear that LLMs are categorically a bad fit for querying information directly in any context outside fiction writing. If you're using an LLM to research something that needs to be accurate, it should be making a tool call to a web search and only be asked to summarize relevant facts from the material it finds, with citations produced by hard-coding the UI to link the pages the LLM actually reviewed. The LLM itself cannot be trusted to generate its own citations: it will just produce something that looks like a relevant citation, along with whatever imaginary content it wants to attribute to that non-existent source.
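A minimal sketch of what that looks like (Python; search_and_fetch and complete are placeholders for whatever search API and model call you actually use, not real library functions):

    from dataclasses import dataclass

    @dataclass
    class Page:
        url: str
        text: str

    def search_and_fetch(query: str) -> list[Page]:
        """Placeholder: wire up a real search API and page fetcher here."""
        raise NotImplementedError

    def complete(prompt: str) -> str:
        """Placeholder: any chat-completion call."""
        raise NotImplementedError

    def answer_with_citations(question: str) -> str:
        pages = search_and_fetch(question)
        numbered = "\n\n".join(
            f"[{i + 1}] {p.url}\n{p.text[:2000]}" for i, p in enumerate(pages)
        )
        summary = complete(
            "Answer the question using ONLY the numbered sources below. "
            "Cite them as [n]. If the sources do not cover it, say so.\n\n"
            f"Sources:\n{numbered}\n\nQuestion: {question}"
        )
        # The citation list is rendered from the pages that were actually
        # fetched, never from URLs the model produced.
        links = "\n".join(f"[{i + 1}] {p.url}" for i, p in enumerate(pages))
        return f"{summary}\n\nSources reviewed:\n{links}"

The model can still misattribute a claim to a real page, but at least every link in the output points at something that exists and was actually retrieved.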
replies(4): >>45670469 #>>45670908 #>>45672029 #>>45674716 #
ekidd ◴[] No.45672029[source]
"Truth" is often a very expensive commodity to obtain. There are plenty of awful sources and mistaken claims on the shelf of any town library. Lots of peer reviewed papers are crap, including a few in Nature. Newspapers are constantly wrong and misleading. Digging through even "reliable" sources can require significant expertise. (This is, in fact, a significant part of PhD training, according to the PhDs and professors I know: Learning to use the literature well.)

One way to use LLMs successfully is to have them do the initial research legwork: run the 40 Google searches and follow links, evaluate sources according to some criteria, summarize, and then hand the human a list of links to follow.
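To make "evaluate sources according to some criteria" concrete, a toy filter along these lines (the criteria here are purely illustrative placeholders, not a recommendation):

    from urllib.parse import urlparse

    # Illustrative criteria only; swap in whatever the task actually calls for.
    TRUSTED_SUFFIXES = (".gov", ".edu", "reuters.com", "apnews.com")

    def keep_source(url: str, text: str) -> bool:
        host = urlparse(url).netloc.lower()
        trusted = any(host.endswith(suffix) for suffix in TRUSTED_SUFFIXES)
        substantial = len(text.split()) > 300  # skip stub and boilerplate pages
        return trusted and substantial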

You quickly learn to see patterns. Sonnet will happily give a genuinely useful rule of thumb, phrasing it like it's widely accepted. But the source will turn out to be "one guy on a forum."

There are other tricks that work well. Have the LLM write an initial overview with sources, telling it to limit itself strictly to information in those sources. Then hand the report off to a fresh LLM and tell it to carefully check each citation, removing any unsourced information. Then have a human review the output, following the links.
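A rough sketch of that two-pass flow (Python again; complete is a placeholder for whatever model call you use, invoked with a fresh context each time, and the prompts are illustrative rather than tested):

    def complete(prompt: str) -> str:
        """Placeholder: a chat-completion call with a fresh context."""
        raise NotImplementedError

    def draft_overview(topic: str, sources: list[tuple[str, str]]) -> str:
        numbered = "\n\n".join(f"[{i + 1}] {url}\n{text[:2000]}"
                               for i, (url, text) in enumerate(sources))
        return complete(
            "Write a short overview of the topic. Use ONLY the sources below "
            "and cite every claim as [n].\n\n"
            f"Sources:\n{numbered}\n\nTopic: {topic}"
        )

    def audit_overview(report: str, sources: list[tuple[str, str]]) -> str:
        numbered = "\n\n".join(f"[{i + 1}] {url}\n{text[:2000]}"
                               for i, (url, text) in enumerate(sources))
        # A separate call, so the checker isn't anchored on the drafter's reasoning.
        return complete(
            "Check every citation in the report against the sources. Remove "
            "any sentence whose claim the cited source does not support. "
            "Return the cleaned report.\n\n"
            f"Sources:\n{numbered}\n\nReport:\n{report}"
        )

    # A human pass over the audited report, link by link, is still the last step.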

None of this will get you guaranteed truth. But if you know what you're doing, it can often give you a better starting point than Wikipedia or anything on the first two pages of Google Search results. Accurate information is genuinely hard to get, and it always has been.