In either case, we need to change our standards around mastery of subject matter.
People keep using these "gotcha" examples and never actually look at the stats. I get it: there are some terrible detectors out there, and of course they're the free ones :)
https://edintegrity.biomedcentral.com/articles/10.1007/s4097...
GPTZero was correct in most scenarios where they used basic prompts, and only had one false positive.
We hand-reviewed 3,000 9th–12th-grade assignments and found that GPTZero holds up really well.
In the same way that plagiarism detectors need a review process, your educational institution needs one for AI detection. Students shouldn't be immediately punished; instead, the case should be reviewed and an appropriate decision made by a person.
> GPTZero was correct in most scenarios where they used basic prompts, and only had one false positive.
One false positive out of only "five human-written samples", unless I'm misreading.
Say 50 papers are checked, with 5 being generated by AI. At the rates GPTZero showed in the paper, 3 AI-generated papers would be correctly flagged and 9 human-written papers would be incorrectly flagged. Meaning a flagged paper is only 25% likely to actually be AI-generated.
Realistically the sample size in the paper is just far too small to draw any firm conclusion either way, but I think people fail to appreciate the difference between false positive rate and false discovery rate.
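To make the arithmetic concrete, here's a quick sketch. The 60% true-positive and 20% false-positive rates are assumptions taken from the tiny sample discussed above (3 of 5 AI papers detected, 1 of 5 human papers flagged), not the detector's published specs:

```python
def false_discovery_rate(n_papers, n_ai, tpr, fpr):
    """Fraction of flagged papers that are actually human-written."""
    true_pos = n_ai * tpr                 # AI papers correctly flagged
    false_pos = (n_papers - n_ai) * fpr   # human papers wrongly flagged
    return false_pos / (true_pos + false_pos)

# 50 papers, 5 of them AI-generated, with the assumed rates above
fdr = false_discovery_rate(n_papers=50, n_ai=5, tpr=0.6, fpr=0.2)
print(f"Chance a flagged paper is human-written: {fdr:.0%}")  # 75%
```

The point is that even a modest false positive rate swamps the true positives when the base rate of AI papers is low, which is exactly the FPR-vs-FDR distinction.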