Did they run the checker across a body of papers before LLMs were available and verify that there were no citations in peer reviewed papers that got authors or titles wrong?
Exactly as you said: run precisely this check on pre-LLM works. There will be an enormous number of errors, with utter certainty.
People keep imperfect notes. People are lazy. People sometimes even fabricate. None of this needed LLMs to happen.
That said, I am also very curious what result their tool would give for papers from the 2010s and before.
> You also don't need gunpowder to kill someone with projectiles, but gunpowder changed things in important ways. All I ever see are the most specious knee-jerk defenses of AI that immediately fall apart.
Humans can do all of the above but it costs them more, and they do it more slowly. LLMs generate spam at a much faster rate.
But no one is claiming these papers were hallucinated whole, so I don't see how that's relevant. This study -- done, notably, to sell an "AI detector", which is largely a laughable snake-oil field -- looked purely at citation accuracy[1] across a very large set of citations. Errors in papers are not remotely uncommon, and finding some is...exactly what one would expect. As the GP said, do the same study on pre-LLM papers and you'll find an enormous number of incorrect if not fabricated citations. Peer review has always been an illusion of auditing.
1 - Which is such a weird thing to sell an "AI detection" tool on. Clearly the checking was mostly manual, given that they somehow only managed to check a tiny subset of the papers, so in all likelihood it was someone going through citations and checking them on Google Search.
When I was in grad school, I kept a fairly large .bib file that almost certainly had a mistake or two in it. I don’t think any of them ever made it to print, but it’s hard to be 100% sure.
For most journals, they actually partially check your citations as part of the final editing. The citation record is important for journals, and linking with DOIs is fairly common.
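For a rough idea of what that partial check can look like, here is a minimal Python sketch (my own illustration, not any journal's or GPTZero's actual pipeline) that pulls the Crossref record for a cited DOI and compares it to the title and first-author surname a manuscript claims. The DOI and claimed values below are placeholders.

    # Minimal sketch of a DOI-based citation check against the public Crossref API.
    # Not a real pipeline; the example DOI and claimed fields are invented.
    import requests  # third-party: pip install requests

    def check_citation(doi, claimed_title, claimed_first_author):
        resp = requests.get(f"https://api.crossref.org/works/{doi}", timeout=10)
        if resp.status_code != 200:
            return f"{doi}: no Crossref record (typo'd, unregistered, or fabricated DOI)"
        work = resp.json()["message"]
        actual_title = (work.get("title") or [""])[0]
        authors = work.get("author") or []
        actual_first = authors[0].get("family", "") if authors else ""
        problems = []
        if claimed_title.strip().lower() not in actual_title.lower():
            problems.append(f"title mismatch: record says {actual_title!r}")
        if claimed_first_author.strip().lower() != actual_first.lower():
            problems.append(f"first-author mismatch: record says {actual_first!r}")
        return f"{doi}: " + ("; ".join(problems) if problems else "looks consistent")

    print(check_citation("10.1000/example-doi", "A Claimed Title", "Doe"))

A check like this only tells you the reference resolves and roughly matches; whether the citation actually supports the claim it's attached to is the part no DOI lookup can audit.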
A pre-LLM paper with fabricated citations would demonstrate a willingness to cheat by the author.
A post-LLM paper with fabricated citations: same thing, and if the authors attempt to defend themselves with something like "we trusted the AI", they are sloppy, probably cheaters, and not very good at it.
Interesting that you hallucinated the word "fabricated" here where I broadly talked about errors. Humans, right? Can't trust them.
Firstly, just about every paper ever written in the history of papers has errors in it. Some small, some big. Most accidental, but some intentional. Sometimes people are sloppy keeping notes, transcribe a row wrong, get a name wrong, make an off-by-one error. Sometimes they just entirely make up data or findings. This is not remotely new. It has happened as long as we've had papers. Find an old, pre-LLM paper and go through the citations -- especially for a tosser target like this, where there are tens of thousands of low-effort papers submitted -- and you're going to find a lot of sloppy citations that are hard to rationalize.
Secondly, the "hallucination" is that this particular snake-oil firm couldn't find the cited papers in many cases (they aren't foolish enough to claim that means the papers were fabricated, but again, they're looking to sell a tool to rubes, so the implication is good enough), and in other cases that some of the author names are wrong. Eh.
The references were made up, and that is easier and faster to do with LLMs than by hand. Easier to do inadvertently, too.
As I said, LLMs are a force multiplier for fraud and inadvertent errors. So it's a big deal.
That also makes some of those errors easier. A bad auto-import of paper metadata can silently screw up some of the publication details, and replacing an early preprint with the peer-reviewed article of record takes annoying manual intervention.
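To make that concrete, here is a small hypothetical Python sketch (file name and heuristics invented for illustration, using the bibtexparser library) that scans a .bib file for entries that still look like arXiv preprints or lack a DOI -- roughly the kind of thing an auto-import leaves behind and nobody circles back to fix.

    # Sketch: flag .bib entries that look like stale preprints or lack a DOI.
    # Illustrative only; the file name and heuristics are made up.
    import bibtexparser  # third-party: pip install bibtexparser

    with open("refs.bib") as f:
        db = bibtexparser.load(f)

    for entry in db.entries:
        venue = (entry.get("journal", "") + " " + entry.get("booktitle", "")).lower()
        looks_like_preprint = (
            "arxiv" in venue
            or entry.get("archiveprefix", "").lower() == "arxiv"
            or "arxiv.org" in entry.get("url", "").lower()
        )
        missing_doi = "doi" not in entry
        if looks_like_preprint or missing_doi:
            flags = []
            if looks_like_preprint:
                flags.append("still a preprint?")
            if missing_doi:
                flags.append("no DOI")
            print(f"{entry['ID']}: {', '.join(flags)}")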
You'd think so, but apparently it isn't for these folks. On the other hand, saying "we've found 50 hallucinations in scientific papers" generates a lot more clicks than "we've found 50 common citation mistakes that people make all the time"
Not just some hallucinated citations, and not just the writing: in many cases the actual purported research "ideas" seem to be plausible nonsense.
To get a feel for it, you can take some of the topics they write about and ask your favorite LLM to generate a paper. Maybe even throw "Deep Research" mode at it. Perhaps tell it to put it in ICLR LaTeX format. It will look a lot like these.
I did some checking and found the report does exist, but the citation is still not quite correct. Then I discovered someone is already running an LLM-based citation checker, which fact-checked this claim and did a correct write-up that seems a lot better than what this GPTZero tool does.
https://checkplease.neocities.org/maha/html/17-loneliness-73...
The mistakes in the citation are the sort that could have been made by either a human or an AI, really. The visualization in the report is confusing and does contain the 73% number (rounded up), but it's unclear how to interpret the numbers because it's some sort of "vitality index" and not what you'd expect based on how it's introduced. At first glance I actually misinterpreted it the same way the report does, so it's hard to view this as clear evidence of AI misuse. Yet the GPTZero folks make very strong claims based on nothing more than a URL scraper script.