In either case, we need to change our standards around mastery of subject matter.
People keep using these "gotcha" examples and never actually look at the stats. I get it: there are some terrible detectors out there, and of course they're the free ones :)
https://edintegrity.biomedcentral.com/articles/10.1007/s4097...
GPTZero was correct in most scenarios where they used basic prompts, and only had one false positive.
We hand-reviewed 3,000 9th–12th-grade assignments and found that GPTZero holds up really well.
In the same way that plagiarism detectors need a review process, your educational institution needs one for AI detection. Students shouldn't be immediately punished; instead, the case should be reviewed and an appropriate decision made by a person.
> GPTZero was correct in most scenarios where they used basic prompts, and only had one false positive.
One false positive out of only "five human-written samples", unless I'm misreading.
Say 50 papers are checked, with 5 being generated by AI. At the rates GPTZero showed in the paper, 3 AI-generated papers would be correctly flagged and 9 human-written papers would be incorrectly flagged. Meaning a flagged paper is only 25% likely to actually be AI-generated.
Realistically the sample size in the paper is just far too small to draw any firm conclusion either way, but I think people fail to appreciate the difference between false positive rate and false discovery rate.
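To make the arithmetic concrete, here's a quick sketch. The 60% true-positive and 20% false-positive rates are assumptions taken from the tiny sample discussed above (3 of 5 AI papers detected, 1 of 5 human papers flagged), not the detector's published specs:

```python
def false_discovery_rate(n_papers, n_ai, tpr, fpr):
    """Fraction of flagged papers that are actually human-written."""
    true_pos = n_ai * tpr                 # AI papers correctly flagged
    false_pos = (n_papers - n_ai) * fpr   # human papers wrongly flagged
    return false_pos / (true_pos + false_pos)

# 50 papers, 5 of them AI-generated, with the assumed rates above
fdr = false_discovery_rate(n_papers=50, n_ai=5, tpr=0.6, fpr=0.2)
print(f"Chance a flagged paper is human-written: {fdr:.0%}")  # 75%
```

The point is that even a modest false positive rate swamps the true positives when the base rate of AI papers is low, which is exactly the FPR-vs-FDR distinction.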