
504 points by puttycat | 3 comments
theoldgreybeard (No.46182214)
If a carpenter builds a crappy shelf “because” his power tools are not calibrated correctly - that’s a crappy carpenter, not a crappy tool.

If a scientist uses an LLM to write a paper with fabricated citations - that’s a crappy scientist.

AI is not the problem; laziness and negligence are. There need to be serious social consequences for this kind of thing, otherwise we are tacitly endorsing it.

CapitalistCartr (No.46182385)
I'm an industrial electrician. A lot of poor electrical work is visible only to a fellow electrician, and sometimes only another industrial electrician. Bad technical work requires technical inspectors to criticize. Sometimes highly skilled ones.
andy99 (No.46182431)
I've reviewed a lot of papers, and I don't consider it the reviewer's responsibility to manually verify that every citation is real. If an unusual citation were relied on heavily as the basis of the work, one would expect it to be checked. Citations to broad prior work you'd just assume are part of the background.

The reviewer is not a proofreader; they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document. They are also assuming good faith.
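
(That said, spot-checking that cited DOIs actually resolve is cheap to automate if one wanted to. A minimal sketch, assuming Python with the requests library against Crossref's public /works/{doi} endpoint; the citation list here is purely illustrative:)

    # Minimal sketch: spot-check that cited DOIs resolve via Crossref's
    # public REST API (GET /works/{doi} returns 200 if known, 404 if not).
    import requests

    def doi_exists(doi: str) -> bool:
        """Return True if Crossref knows this DOI."""
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        return resp.status_code == 200

    # Illustrative bibliography: one real DOI, one obviously fabricated.
    citations = ["10.1038/nature14539", "10.1234/not-a-real-paper"]
    for doi in citations:
        status = "resolves" if doi_exists(doi) else "NOT FOUND, check manually"
        print(f"{doi}: {status}")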

grayhatter (No.46182594)
> The reviewer is not a proofreader; they are checking the rigour and relevance of the work, which does not rest heavily on all of the references in a document.

I've always assumed peer review is similar to diff review, where I'm willing to sign my name onto the work of others. If I approve a diff/PR and it takes down prod, it's just as much my fault, no?

> They are also assuming good faith.

I can only relate this to code review, but assuming good faith means you assume they didn't try to introduce a bug by adding this dependency. I should still check to make sure this new dep isn't some typosquatted package. That's the rigor I'm responsible for.
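
To make that concrete, here's a minimal sketch of the kind of check I mean, in Python using only the standard library; the POPULAR set is an illustrative stand-in for a real registry index:

    # Minimal sketch: flag a new dependency whose name is suspiciously
    # close to, but not exactly, a well-known package name (a possible
    # typosquat). Standard library only.
    import difflib

    # Illustrative stand-in for a real "most downloaded packages" index.
    POPULAR = {"requests", "numpy", "pandas", "django", "flask"}

    def typosquat_suspects(dep: str, cutoff: float = 0.85) -> list[str]:
        """Names in POPULAR that `dep` nearly (but not exactly) matches."""
        if dep in POPULAR:
            return []  # exact match: a known package, not a squat
        return difflib.get_close_matches(dep, POPULAR, n=3, cutoff=cutoff)

    print(typosquat_suspects("reqeusts"))  # ['requests'] -> worth a closer look
    print(typosquat_suspects("requests"))  # [] -> exact name, fine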

pron (No.46182824)
That is not, cannot be, and shouldn't be the bar for peer review. There are two major differences between it and code review:

1. A patch is self-contained and applies to a codebase you have just as much access to as the author. A paper, on the other hand, is just the tip of the iceberg of research work, especially if there is some experiment or data collection involved. The reviewer does not have access to, say, videos of how the data was collected (and even if they did, they don't have the time to review all of that material).

2. The software is also self-contained; it is "production". But a scientific paper does not necessarily aim to represent scientific consensus, only a finding by a particular team of researchers. If a paper's conclusions are wrong, it's expected that they will be refuted by another paper.

grayhatter (No.46183053)
> That is not, cannot be, and shouldn't be the bar for peer review.

Given the replication crisis I keep reading about, maybe something should change?

> 2. The software is also self-contained; it is "production". But a scientific paper does not necessarily aim to represent scientific consensus, only a finding by a particular team of researchers. If a paper's conclusions are wrong, it's expected that they will be refuted by another paper.

This is a much, MUCH stronger point. I would have led with this, because the contrast between this assertion and my comparison to prod is night and day. The rules for prod are different from the rules of scientific consensus. I regret losing sight of that.

hnfong (No.46183433)
IMHO what should change is that we stop putting "peer reviewed" articles on a pedestal.

Even if peer review were as rigorous as code review (the former is usually unpaid), we all know that reviewed code still has bugs, and a programmer would be nuts to go around saying "this code was reviewed by experts, so we can assume it's bug-free, right?"

But there are too many people who just assume that a peer-reviewed article is somehow automatically correct.

vkou (No.46184649)
> IMHO what should change is that we stop putting "peer reviewed" articles on a pedestal.

Correct. Peer review is a minimal and necessary but not sufficient step.

hnfong (No.46206358)
I agree in principle, and I think this is mostly what's happening. But IMHO the public perception that a peer-reviewed paper is somehow "more trustworthy" is also kind of... bad.

I mean, being peer reviewed is a signal of a paper's quality, but in the hands of an expert in the domain it's not a very valuable signal, because they can just read the paper themselves and figure out whether it's legit. So instead of having "experts" explain a paper and comment on whether it's peer reviewed, the better practice is to have said expert say "I read the paper and it's legit" or "I read the paper and it's nonsense".

IMHO the reason they note whether it's peer reviewed is that they don't know enough to make the judgement themselves. The fallback is to trust that a couple of anonymous reviewers have attested to the paper's quality. If you think of it that way, using this signal to vet the quality of a publication for the lay public isn't really a good idea.