red_admiral No.41902264
My perspective after talking to a few colleagues in the CS education sector, and based on my own pre-GPT experience:

Classifiers sometimes produce false positives and false negatives. This is not news to anyone who has taken an ML module. Even back then, we required students to be able to interpret the results they were getting, at least to some extent, as part of the class assignment.
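
To make the false-positive point concrete, here is a back-of-the-envelope sketch. Every number in it is an illustrative assumption, not a measured rate from any real detector; the point is just that even a seemingly low false-positive rate flags a meaningful number of honest students once you run the tool over a whole cohort:

  # All numbers below are illustrative assumptions, not measured rates.
  cohort_size = 300           # students submitting one assignment
  cheating_rate = 0.10        # assumed fraction who actually used AI
  true_positive_rate = 0.90   # detector catches 90% of real cases
  false_positive_rate = 0.02  # detector flags 2% of honest work

  cheaters = cohort_size * cheating_rate              # 30
  honest = cohort_size - cheaters                     # 270

  flagged_cheaters = cheaters * true_positive_rate    # 27
  flagged_honest = honest * false_positive_rate       # 5.4

  # Precision: probability that a flagged student actually cheated.
  precision = flagged_cheaters / (flagged_cheaters + flagged_honest)
  print(f"~{flagged_honest:.0f} honest students flagged")   # ~5
  print(f"P(cheated | flagged) = {precision:.2f}")          # 0.83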

Even before AI detectors, when Turnitin "classic" was the main tool along with JPlag and the like, if you were doing your job properly, you would double-check any claim the tool produced before writing someone up for misconduct. AI detectors are no different.

That said, you already catch more students than you would think just by going for the fruit hanging so low it's practically touching the ground already:

  - Writing or code where a large section (half a page at least) is identical to material that already exists on the internet. This includes the classic copy-paste from Wikipedia, sometimes with the square brackets for references still included.
  - Hundreds of lines of code that are structurally identical (up to tabs/spaces, variable naming, sometimes comments) to code that can already be found on the internet. "I have seen this code before" from the grader flags this up at least as often as the tools do; a toy sketch of this kind of token-level matching is below.
  - You do still have to check that the student hasn't just made their _own_ git repo public by accident, but that's a rare edge case. It does show that you always need a human brain in the loop before pushing results from automated tools to the misconduct panel.
  - Writing that includes "I am an AI and cannot make this judgement" or similar.
  - Lots of hallucinated references (also cheap to triage mechanically; see the second sketch below).
That's more than enough to make the administration groan under the number of misconduct panels we convene every year.
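
On the structurally-identical-code point: tools like JPlag compare token streams rather than raw text, so renaming variables or reshuffling whitespace doesn't help the copier. Here is a toy sketch of the underlying idea, Jaccard similarity over hashed k-grams of a normalized token stream. This is an illustration only, not JPlag's actual algorithm (JPlag uses a language-aware tokenizer with greedy string tiling), and the regex-based "tokenizer" here is deliberately crude:

  import re

  def fingerprints(source: str, k: int = 5) -> set[int]:
      # Crude normalization: strip #-comments, then map identifiers
      # and numbers to generic tokens so renaming changes nothing.
      source = re.sub(r"#.*", "", source)
      tokens = re.findall(r"[A-Za-z_]\w*|\d+|\S", source)
      norm = ["ID" if re.match(r"[A-Za-z_]", t)
              else "NUM" if t.isdigit()
              else t
              for t in tokens]
      # Hash every k-gram of the normalized stream.
      return {hash(tuple(norm[i:i + k])) for i in range(len(norm) - k + 1)}

  def similarity(a: str, b: str) -> float:
      fa, fb = fingerprints(a), fingerprints(b)
      return len(fa & fb) / max(1, len(fa | fb))  # Jaccard index

  # Renamed variables, identical structure -> maximal similarity.
  s1 = "total = 0\nfor x in items:\n    total = total + x"
  s2 = "acc = 0\nfor v in elems:\n    acc = acc + v"
  print(similarity(s1, s2))  # 1.0 for this toy pair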
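
And on hallucinated references: most fabricated citations come with DOIs that were never registered, which is cheap to check before a human reads the bibliography. A rough sketch against the public Crossref REST API (the endpoint is real, but the DOI regex is a simplification, the error handling is minimal, and a miss only means "have a human look" -- some legitimate DOIs live with registrars Crossref doesn't cover):

  import re
  import urllib.error
  import urllib.request

  def doi_registered(doi: str) -> bool:
      # Crossref returns HTTP 404 for DOIs it has never seen.
      url = f"https://api.crossref.org/works/{doi}"
      try:
          with urllib.request.urlopen(url, timeout=10) as resp:
              return resp.status == 200
      except urllib.error.HTTPError:
          return False

  def suspicious_dois(bibliography: str) -> list[str]:
      # Simplified DOI pattern; real DOIs allow more characters.
      dois = re.findall(r"10\.\d{4,9}/[^\s,;]+", bibliography)
      return [d for d in dois if not doi_registered(d)]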

The future in this corner of the world seems to be a mix of:

  - invigilated exams with no electronic devices present
  - complementing full-term coding assignments with the occasional invigilated test in the school's coding lab
  - requiring students to do their work in a repo owned by the school's GitHub org and assessing the commit history (is everything in one big commit the night before the deadline? A rough sketch of this check follows the list). This lets you grade for good working practices/time management and sensible use of branching etc. in team projects, as well as catch the more obvious cases of contract cheating.
  - viva voce exams on the larger assignments, which, apart from catching people who have no idea about their own code or the language it was written in, allow you to grade understanding ("Why did you use a linked list here?" type questions), especially for the top students.
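
For the commit-history check, the "one big commit the night before the deadline" pattern is easy to surface mechanically before anyone reads a diff. A rough sketch, assuming git is on PATH; the thresholds are arbitrary placeholders, and a flag is a prompt for a human look, not an accusation:

  import subprocess
  from collections import Counter

  def commit_profile(repo_path: str) -> None:
      # One author timestamp (unix seconds) per commit.
      out = subprocess.run(
          ["git", "-C", repo_path, "log", "--pretty=%at"],
          capture_output=True, text=True, check=True,
      ).stdout.split()
      timestamps = sorted(int(t) for t in out)
      if not timestamps:
          print("no commits at all")
          return
      span_days = (timestamps[-1] - timestamps[0]) / 86400
      per_day = Counter(t // 86400 for t in timestamps)
      busiest_share = per_day.most_common(1)[0][1] / len(timestamps)
      print(f"{len(timestamps)} commits over {span_days:.1f} days; "
            f"{busiest_share:.0%} of them on the busiest day")
      # Arbitrary heuristic: nearly everything landed in one sitting.
      if len(timestamps) < 3 or busiest_share > 0.9:
          print("flag for a closer human look")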