Over fifty new hallucinations in ICLR 2026 submissions

(gptzero.me)

504 points puttycat | 4 comments | 07 Dec 25 13:16 UTC


theoldgreybeard ◴[07 Dec 25 15:09 UTC] No.46182214[source]▶

If a carpenter builds a crappy shelf “because” his power tools are not calibrated correctly - that’s a crappy carpenter, not a crappy tool.

If a scientist uses an LLM to write a paper with fabricated citations - that’s a crappy scientist.

AI is not the problem; laziness and negligence are. There need to be serious social consequences for this kind of thing, otherwise we are tacitly endorsing it.


CapitalistCartr ◴[07 Dec 25 15:29 UTC] No.46182385[source]▶

>>46182214 #

I'm an industrial electrician. A lot of poor electrical work is visible only to a fellow electrician, and sometimes only another industrial electrician. Bad technical work requires technical inspectors to criticize. Sometimes highly skilled ones.


andy99 ◴[07 Dec 25 15:35 UTC] No.46182431[source]▶

>>46182385 #

I’ve reviewed a lot of papers, and I don’t consider it the reviewer’s responsibility to manually verify that all citations are real. If an unusual citation were relied on heavily as the basis of the work, one would expect it to be checked. For things like broad prior work, you’d just assume it’s part of the background.

The reviewer is not a proofreader, they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document. They are also assuming good faith.


grayhatter ◴[07 Dec 25 15:53 UTC] No.46182594[source]▶

>>46182431 #

> The reviewer is not a proofreader, they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document.

I've always assumed peer review is similar to diff review, where I'm willing to sign my name onto the work of others. If I approve a diff/PR and it takes down prod, it's just as much my fault, no?

> They are also assuming good faith.

I can only relate this to code review, but assuming good faith means you assume they didn't try to introduce a bug by adding this dependency. But I should still check to make sure this new dep isn't some typosquatted package. That's the rigor I'm responsible for.
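As a rough illustration of the kind of dependency check described above, here is a minimal sketch that flags a package name closely resembling a known one. The allowlist `KNOWN_PACKAGES`, the function name, and the 0.8 cutoff are assumptions chosen for illustration, not part of any real tool.

```python
import difflib

# Hypothetical allowlist; a real project would derive this from its lockfile or registry data.
KNOWN_PACKAGES = ["requests", "numpy", "pandas", "urllib3"]


def flag_possible_typosquat(name, known=KNOWN_PACKAGES, cutoff=0.8):
    """Return the known package that `name` closely resembles, or None."""
    if name in known:
        # Exact match with a trusted name: nothing to flag.
        return None
    # get_close_matches ranks candidates by difflib's similarity ratio.
    matches = difflib.get_close_matches(name, known, n=1, cutoff=cutoff)
    return matches[0] if matches else None
```

A reviewer could run this over newly added dependency names and look more closely at anything that comes back non-None. Similarity alone cannot prove a package is malicious; it only narrows down what to inspect by hand.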


tpoacher ◴[07 Dec 25 16:02 UTC] No.46182670[source]▶

>>46182594 #

This is true, but here the equivalent situation is someone using a Greek question mark (";", U+037E) instead of a semicolon (";", U+003B), and you as a code reviewer are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.

Yes, in theory you can go through every semicolon to check that it's not actually a Greek question mark, but one assumes good faith and baseline competence, such that you as the reviewer would generally not be expected to perform such pedantic checks.

So if you think you might have reasonably missed Greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.
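For what it's worth, the lookalike-character case is easy to catch mechanically even though it is nearly invisible to the eye. The sketch below is one assumed approach: it reports every non-ASCII character in a source string by position and Unicode name. The function name is hypothetical, and a real linter would need to allow legitimate non-ASCII text in strings and comments.

```python
import unicodedata


def find_non_ascii(source):
    """Return (line_no, col_no, unicode_name) for each non-ASCII character in `source`."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col_no, ch in enumerate(line, start=1):
            if ord(ch) > 127:
                # unicodedata.name gives the official character name, e.g. for U+037E.
                hits.append((line_no, col_no, unicodedata.name(ch, "UNKNOWN")))
    return hits
```

Given a line ending in U+037E, this reports that character as GREEK QUESTION MARK along with its line and column, which is exactly the detail a visual review cannot surface.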


xvilka ◴[07 Dec 25 16:45 UTC] No.46183029[source]▶

>>46182670 #

Code correctness should be checked automatically by CI and the test suite. New tests should be added. This is exactly what makes sure these stupid errors don't bother the reviewer. The same goes for code formatting and documentation.


1. merely-unlikely ◴[07 Dec 25 17:02 UTC] No.46183140[source]▶

>>46183029 #

This discussion makes me think peer reviews need more automated tooling somewhat analogous to what software engineers have long relied on. For example, a tool could use an LLM to check that the citation actually substantiates the claim the paper says it does, or else flag the claim for review.
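A much simpler first pass than the LLM check suggested above is to test whether each cited title exists at all in some bibliographic index, and flag the rest for a human. The sketch below swaps in plain fuzzy string matching for that purpose. The function names, the 0.9 threshold, and the idea of a local list of index titles are all assumptions, and it says nothing about whether a real citation actually supports the claim.

```python
import re
from difflib import SequenceMatcher


def normalize(title):
    """Lowercase a title and drop punctuation so formatting differences don't matter."""
    return re.sub(r"[^a-z0-9 ]", "", title.lower()).strip()


def check_citations(cited_titles, index_titles, threshold=0.9):
    """Split cited titles into (found, needs_review) by best fuzzy match against the index."""
    index = [normalize(t) for t in index_titles]
    found, needs_review = [], []
    for title in cited_titles:
        key = normalize(title)
        # Best similarity ratio against any indexed title; 0.0 if the index is empty.
        best = max((SequenceMatcher(None, key, t).ratio() for t in index), default=0.0)
        (found if best >= threshold else needs_review).append(title)
    return found, needs_review
```

Anything landing in `needs_review` is only a candidate for manual checking, since a title can be missing from the index without being fabricated.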


2. noitpmeder ◴[07 Dec 25 17:15 UTC] No.46183266[source]▶

>>46183140 (TP) #

I'd go one further and say all published papers should come with a clear list of "claimed truths", and one is only able to cite said paper if they are linking to an explicit truth.

Then you can build a true hierarchy of citation dependencies, checked 'statically', and have better indications of impact if a fundamental truth is disproven, ...
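To make the "impact if a fundamental truth is disproven" idea concrete, here is a minimal sketch that treats claims as nodes in a dependency graph and walks it breadth-first. The claim identifiers and the `depends_on` mapping format are hypothetical; nothing like this exists for real papers as far as this thread shows.

```python
from collections import defaultdict, deque


def affected_claims(depends_on, disproven):
    """Return every claim that directly or transitively depends on `disproven`.

    `depends_on` maps a claim id to the list of claim ids it cites.
    """
    # Invert the mapping: for each claim, which claims build on it?
    dependents = defaultdict(set)
    for claim, deps in depends_on.items():
        for dep in deps:
            dependents[dep].add(claim)

    seen, queue = set(), deque([disproven])
    while queue:
        current = queue.popleft()
        for child in dependents[current]:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen
```

For example, if claim "B.1" cites "A.1" and "C.1" cites "B.1", then disproving "A.1" returns both "B.1" and "C.1", while claims built only on other results are left untouched.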


3. vkou ◴[07 Dec 25 20:02 UTC] No.46184618[source]▶

>>46183266 #

Have you authored a lot of non-CS papers?

Could you provide a proof of concept paper for that sort of thing? Not a toy example, an actual example, derived from messy real-world data, in a non-trivial[1] field?

---

[1] Any field is non-trivial when you get deep enough into it.

4. alexcdot ◴[07 Dec 25 23:12 UTC] No.46186350[source]▶

>>46183140 (TP) #

hey, i'm part of the GPTZero team that built the automated tooling to get the results in that article!

totally agree with your thinking here, we can't just give this to an LLM, because of the need to have industry-specific standards for what is a hallucination / match, and how to do the search
