
504 points puttycat | 4 comments
theoldgreybeard ◴[] No.46182214[source]
If a carpenter builds a crappy shelf “because” his power tools are not calibrated correctly - that’s a crappy carpenter, not a crappy tool.

If a scientist uses an LLM to write a paper with fabricated citations - that’s a crappy scientist.

AI is not the problem; laziness and negligence are. There need to be serious social consequences for this kind of thing, otherwise we are tacitly endorsing it.

replies(37): >>46182289 #>>46182330 #>>46182334 #>>46182385 #>>46182388 #>>46182401 #>>46182463 #>>46182527 #>>46182613 #>>46182714 #>>46182766 #>>46182839 #>>46182944 #>>46183118 #>>46183119 #>>46183265 #>>46183341 #>>46183343 #>>46183387 #>>46183435 #>>46183436 #>>46183490 #>>46183571 #>>46183613 #>>46183846 #>>46183911 #>>46183917 #>>46183923 #>>46183940 #>>46184450 #>>46184551 #>>46184653 #>>46184796 #>>46185025 #>>46185817 #>>46185849 #>>46189343 #
CapitalistCartr ◴[] No.46182385[source]
I'm an industrial electrician. A lot of poor electrical work is visible only to a fellow electrician, and sometimes only another industrial electrician. Bad technical work requires technical inspectors to criticize. Sometimes highly skilled ones.
replies(5): >>46182431 #>>46182828 #>>46183216 #>>46184370 #>>46184518 #
andy99 ◴[] No.46182431[source]
I’ve reviewed a lot of papers, and I don’t consider it the reviewer’s responsibility to manually verify that all citations are real. If there were an unusual citation that the work relied on heavily, one would expect it to be checked. Things like broad prior work, you’d just assume are part of the background.

The reviewer is not a proofreader; they are checking the rigour and relevance of the work, which does not rest heavily on every reference in the document. They are also assuming good faith.

replies(14): >>46182472 #>>46182485 #>>46182508 #>>46182513 #>>46182594 #>>46182744 #>>46182769 #>>46183010 #>>46183317 #>>46183396 #>>46183881 #>>46183895 #>>46184147 #>>46186438 #
grayhatter ◴[] No.46182594[source]
> The reviewer is not a proofreader, they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document.

I've always assumed peer review is similar to diff review, where I'm willing to sign my name onto the work of others. If I approve a diff/PR and it takes down prod, it's just as much my fault, no?

> They are also assuming good faith.

I can only relate this to code review, but assuming good faith means you assume they didn't try to introduce a bug by adding this dependency. But I should still check that this new dep isn't some typosquatted package. That's the rigor I'm responsible for.

replies(6): >>46182658 #>>46182670 #>>46182685 #>>46182824 #>>46183276 #>>46183298 #
tpoacher ◴[] No.46182670[source]
This is true, but here the equivalent situation is someone using a greek question mark (";") instead of a semicolon (";"), and you as a code reviewer are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.

Yes in theory you can go through every semicolon to check if it's not actually a greek question mark; but one assumes good faith and baseline competence such that you as the reviewer would generally not be expected to perform such pedantic checks.

So if you think you might have reasonably missed greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.
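[An aside not in the thread: the check this analogy gestures at is mechanically trivial once you know to look for it. A minimal sketch, assuming we only care about the Greek question mark (U+037E), which most fonts render identically to the ASCII semicolon:]

```python
# Lint sketch: find Greek question marks (U+037E), which look exactly
# like ASCII semicolons (U+003B) but break compilation/parsing.
GREEK_QUESTION_MARK = "\u037e"

def find_greek_question_marks(source: str) -> list[tuple[int, int]]:
    """Return (line, column) positions of every Greek question mark."""
    hits = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            if ch == GREEK_QUESTION_MARK:
                hits.append((line_no, col))
    return hits

# The first "semicolon" below is actually U+037E.
code = "x = 1\u037e\ny = 2;\n"
print(find_greek_question_marks(code))  # → [(1, 6)]
```

[The point of the analogy survives either way: a human eyeballing the diff won't see this; only a tool will.]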

replies(3): >>46182739 #>>46182753 #>>46183029 #
1. grayhatter ◴[] No.46182753{3}[source]
> This is true, but here the equivalent situation is someone using a greek question mark (";") instead of a semicolon (";"),

No, it's not. I think you're trying to make a different point, because your example is a specific, deliberately malicious way to hide a token error that prevents compilation but is visually similar.

> and you as a code reviewer are only expected to review the code visually and are not provided the resources required to compile the code on your local machine to see the compiler fail.

What weird world are you living in where you don't have CI? Also, it's pretty common that I'll test code locally when reviewing something more complex or more important, if I don't have CI.

> Yes in theory you can go through every semicolon to check if it's not actually a greek question mark; but one assumes good faith and baseline competence such that you as the reviewer would generally not be expected to perform such pedantic checks.

I don't, because it won't compile, not because I assume good faith. References and citations are similar to introducing dependencies, and we're talking about completely fabricated deps, e.g. an engineer who went on npm and grabbed the first package that said left-pad, but it's actually a crypto miner. We're not talking about a citation missing a page number or publication year. We're talking about something that's completely incorrect being represented as relevant.
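[An aside not in the thread: the typosquat check being described here can itself be partly mechanized. A minimal sketch using only the standard library; the "popular package" list is made up for illustration:]

```python
# Sketch: flag dependency names that nearly match a well-known package
# but aren't it -- the classic typosquatting pattern.
import difflib

# Hypothetical allowlist of well-known package names.
POPULAR = {"left-pad", "lodash", "react", "express", "requests"}

def typosquat_suspects(dep: str, known=POPULAR, cutoff=0.8) -> list[str]:
    """Return popular names that `dep` closely resembles but doesn't equal."""
    if dep in known:
        return []  # exact match: a known package, nothing to flag
    return difflib.get_close_matches(dep, known, n=3, cutoff=cutoff)

print(typosquat_suspects("left-pod"))  # → ['left-pad']
print(typosquat_suspects("left-pad"))  # → []
```

[A fuzzy string match is only a first filter, of course; a real check would also look at download counts, publish dates, and maintainers.]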

> So if you think you might have reasonably missed greek question marks in a visual code review, then hopefully you can also appreciate how a paper reviewer might miss a false citation.

I would never miss this, because the important thing is that code needs to compile. If it doesn't compile, it doesn't reach the master branch. Peer review of a paper doesn't have CI, I'm aware, but a paper also isn't vulnerable to syntax errors like that. A paper with a fake semicolon isn't meaningfully different from one without, so this analogy doesn't map to the fraud I'm commenting on.

replies(1): >>46182830 #
2. tpoacher ◴[] No.46182830[source]
You have completely missed the point of the analogy.

Breaking the analogy beyond the point where it is useful, by introducing non-generalising specifics, is not a useful argument. Otherwise I could counter your more specific, non-generalising analogy by introducing little green aliens sabotaging your imaginary CI, with the same ease and effect.

replies(1): >>46182940 #
3. grayhatter ◴[] No.46182940[source]
I disagree that you could do that and still claim to be reasonable.

But I agree on the rest, because I'd rather discuss the pragmatics than bicker over the semantics of an analogy.

Introducing a token error is different from plagiarism, no? Someone writing code that can't compile is different from someone "stealing" proprietary code from some company and contributing it to some FOSS repo?

In order to assume good faith, you also need to assume the author is the origin. But that's clearly not the case here: the origin is somewhere else, and the author who put their name on the paper didn't verify it and didn't credit it.

replies(1): >>46184378 #
4. tpoacher ◴[] No.46184378{3}[source]
Sure, but the focus here is on the reviewer, not the author.

The point is what is expected as reasonable review before one can "sign their name on it".

"Lazy" (or possibly malicious) authors will always have an incentive to cut corners as long as no mechanism exists to automatically reject (or even penalise) a paper on submission, which would be the equivalent of a "compiler error" in the code analogy.

Effectively, the point is that in the absence of such tools, the reviewer can only reasonably be expected to "look over the paper" for high-level issues; catching low-level issues via manual checks by reviewers has massively diminishing returns for the extra effort involved.

So I don't think it is appropriate for the conference to shame the reviewers here while not providing such tooling.