Did they run the checker across a body of papers before LLMs were available and verify that there were no citations in peer reviewed papers that got authors or titles wrong?
Exactly as you said: run precisely this check on pre-LLM works, and with utter certainty you will find an enormous number of errors.
People keep imperfect notes. People are lazy. People sometimes even fabricate. None of this needed LLMs to happen.
Humans can do all of the above but it costs them more, and they do it more slowly. LLMs generate spam at a much faster rate.
But no one is claiming these papers were hallucinated wholesale, so I don't see how that's relevant. This study -- notably, one run to sell an "AI detector", which is largely a laughable snake-oil field -- looked purely at the accuracy of citations[1] among a very large set of citations. Errors in papers are not remotely uncommon, and finding some is exactly what one would expect. As the GP said, do the same study on pre-LLM papers and you'll find an enormous number of incorrect, if not outright fabricated, citations. Peer review has always been an illusion of auditing.
1 - Which is a weird basis for selling an "AI detection" tool. The checking was clearly mostly manual, given that they only managed to verify a tiny subset of the papers; in all likelihood it was some guy going through citations one by one and looking them up on Google Search.
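For what it's worth, this kind of check is straightforward to automate against a bibliographic database. Below is a minimal sketch using Crossref's public REST API, whose `query.bibliographic` parameter accepts free-text reference strings. This is purely an illustration of what such a checker could look like, not a claim about how the study actually did it; the example citation string is arbitrary.

    import requests

    def lookup_citation(raw_citation: str) -> dict | None:
        """Query Crossref with an unstructured reference string and
        return the top-scoring candidate work, or None if no match."""
        resp = requests.get(
            "https://api.crossref.org/works",
            params={"query.bibliographic": raw_citation, "rows": 1},
            timeout=10,
        )
        resp.raise_for_status()
        items = resp.json()["message"]["items"]
        return items[0] if items else None

    citation = "Vaswani et al., Attention Is All You Need, NeurIPS 2017"
    match = lookup_citation(citation)
    if match:
        print(match.get("title"), match.get("author"))
    else:
        print("No plausible match found -- flag for manual review")

Note the caveats: Crossref returns fuzzy best-matches, so a "hit" still needs its title and authors compared against the citation before you call anything fabricated, and a miss proves nothing on its own, since books, preprints, and non-DOI venues are covered unevenly.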
The references were made up, and that is easier and faster to do with LLMs than by hand. Easier to do inadvertently, too.
As I said, LLMs are a force multiplier for fraud and inadvertent errors. So it's a big deal.