427 points JumpCrisscross | 35 comments
1. mrweasel ◴[] No.41901883[source]
The part that annoys me is that students apparently have no right to be told why the AI flagged their work. For any process where a computer is allowed to judge people, there should be a rule in place that demands the algorithm be able to explain EXACTLY why it flagged this person.

Now this would effectively kill off the current AI-powered solutions, because they have no way of explaining, or even understanding, why a paper may or may not be plagiarized, but I'm okay with that.

replies(8): >>41902108 #>>41902131 #>>41902463 #>>41902522 #>>41902919 #>>41905044 #>>41905842 #>>41907688 #
2. ben_w ◴[] No.41902108[source]
> For any process where a computer is allowed to judge people, there should be a rule in place that demands the algorithm be able to explain EXACTLY why it flagged this person.

This is a big part of GDPR.

replies(3): >>41902128 #>>41902309 #>>41903067 #
3. ckastner ◴[] No.41902128[source]
Indeed. Quoting article 22 [1]:

> The data subject shall have the right not to be subject to a decision based solely on automated processing [...]

[1]: https://gdpr.eu/article-22-automated-individual-decision-mak...

replies(1): >>41906475 #
4. sersi ◴[] No.41902131[source]
It's a similar problem to people being banned from Google (insert big company name) because of an automated fraud detection system that doesn't give any reason behind the ban.

I also think that there should be laws requiring a clear explanation whenever that happens.

replies(2): >>41902219 #>>41902596 #
5. razakel ◴[] No.41902219[source]
What about tipping off? Banks can't tell you that they've closed your account because of fraud or money laundering.
replies(2): >>41902939 #>>41904613 #
6. mrweasel ◴[] No.41902309[source]
I did not know that. Thank you.

Reading the rules quickly, it does seem like you're not entitled to know why the computer flagged you, only that you have the right to "obtain human intervention". That seems a little too soft; I'd like to know exactly which rules I'm being judged under.

7. viraptor ◴[] No.41902463[source]
> kill off the current AI powered solution, because they have no way of explaining

That's not correct. Some solutions look at perplexity under specific models, some look at n-gram frequencies, and similar approaches. Almost all of those can produce a heatmap of "what looks suspicious". I wouldn't expect any of the detection systems to be black boxes relying on an LLM over the whole text.
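
For illustration, a minimal sketch of the per-token surprisal idea, assuming the Hugging Face transformers library and GPT-2 as the scoring model (an arbitrary choice for the example, not what any particular detector actually ships):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    # Example scoring model; real detectors use their own models and scoring.
    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def token_surprisals(text):
        """Return (token, surprisal) pairs; unusually low surprisal is the
        kind of thing a detector would shade in a heatmap."""
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits
        # Token i is predicted from tokens < i, so shift logits vs. labels.
        log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
        targets = ids[:, 1:]
        token_logp = log_probs.gather(2, targets.unsqueeze(-1)).squeeze(-1)[0]
        tokens = tokenizer.convert_ids_to_tokens(targets[0].tolist())
        return list(zip(tokens, (-token_logp).tolist()))

    for tok, s in token_surprisals("The quick brown fox jumps over the lazy dog."):
        print(f"{tok!r}\t{s:.2f}")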

replies(2): >>41902900 #>>41904060 #
8. smartmic ◴[] No.41902522[source]
I agree with you, but I would go further and turn the tables. An AI should simply not be allowed to evaluate people, in any context whatsoever. For the simple reason that it has been proven not to work (and never will).

For anyone interested in learning more, I recommend the recent book "AI Snake Oil" by Arvind Narayanan and Sayash Kapoor [1]. It is a critical but nuanced book and helps to see the whole AI hype a little more clearly.

[1] https://press.princeton.edu/books/hardcover/9780691249131/ai....

replies(2): >>41902634 #>>41903001 #
9. tuetuopay ◴[] No.41902596[source]
while it is infuriating, it's common everywhere fraud is an issue. if the company gave feedback, it would open the door to probing to find out what is and isn't being watched. same reason a bank will not tell you why you got kicked off.
10. fullstackchris ◴[] No.41902634[source]
I'm definitely no AI hypester, but saying anything will "never" work over an infinite timeline is a big statement... do you have grounds for why some sort of AI system could "never" work at evaluating some metric about someone? It seems we already have reliable systems doing that in some areas (facial recognition at airport boarding, for example).
replies(2): >>41902909 #>>41908929 #
11. mrweasel ◴[] No.41902900[source]
Sorry if this is "moving the goal post", but I wouldn't call looking at n-gram frequencies AI. Producing a heatmap doesn't tell you why something is suspicious, but it's obviously better than telling you nothing.

In any case, if you were to use LLMs, or other black-box solutions, you'd have to yank those out if you were met with a requirement to explain why something is suspicious.

replies(1): >>41907817 #
12. smartmic ◴[] No.41902909{3}[source]
Okay, let me try to be more precise. By "evaluate", I mean using an AI to make predictions about human behavior, either retrospectively (as is the case here, in trying to make an accusation of cheating) or prospectively (i.e. automating criminal justice). Even if you could collect all the parameters (features?) that make up a human being, there is randomness in humans and in nature in general, which simply destroys any ultimate prediction machine. Not to mention the edge cases we wander into. You can try to measure and average a human being, and you will get a certain accuracy well above 50%, but you will never cross the threshold of accuracy that a human being should be measured against, especially in life-deciding questions like career decisions or other social matters.

Reliable systems in some areas? - Absolutely, and yes, even facial recognition. I agree, it works very well, but that is a different issue as it does not reveal or try to guess anything about the inner person. There are other problems that arise from the fact that it works so well (surveillance, etc.), but I did not mean that part of the equation.

replies(1): >>41903126 #
13. iLoveOncall ◴[] No.41902919[source]
Surely you understand how any algorithm (regardless of its nature) that gives the cheater the list of reasons why it spotted cheating will only work for a single iteration before the cheaters adapt, right?
replies(2): >>41904104 #>>41904191 #
14. tonypace ◴[] No.41902939{3}[source]
They should have to tell you that. I can see why it's convenient for them not to, but I believe the larger point is far more important.
15. raincole ◴[] No.41903001[source]
Statistical models (which "AI" is) have been used to evaluate people's outputs since forever.

Examples: Spam detection, copyrighted material detection, etc.

replies(1): >>41903876 #
16. 2rsf ◴[] No.41903067[source]
And, no less importantly, the still-young EU AI Act.
17. _heimdall ◴[] No.41903126{4}[source]
This feels like an argument bigger than AI evaluations. All points you raised could very well be issues with humans evaluating other humans to attempt to predict future outcomes.
replies(1): >>41904190 #
18. freilanzer ◴[] No.41903876{3}[source]
But not in cheating or grades, etc. Spam filters are completely different from this.
replies(2): >>41904125 #>>41905424 #
19. ◴[] No.41904060[source]
20. lcnPylGDnU4H9OF ◴[] No.41904104[source]
I don’t think there’s anything to indicate they don’t understand this idea. But this misses the point; in their eyes, the lesser evil is to allow those with false positives to call the reasoning into question.
21. baby_souffle ◴[] No.41904125{4}[source]
> But not in cheating or grades, etc. Spam filters are completely different from this.

Really? A spammer is trying to ace a test where my attention is the prize. I don't really see a huge difference between a student/diploma and a spammer/my attention.

Education tech companies have been playing with ML and similar "AI-adjacent" tech for decades. If you went to school in the US any time after computers entered the classroom, you probably had some exposure to a machine-generated/scored test. That data was used to tailor lessons to pupil interests/goals/state curricula. Good software also gave instructors feedback about where each student/cohort was struggling.

LLMs are just an evolution of tech that's been pretty well integrated into academic life for a while now. Was anything in academia prepared for this evolution? No. But banning it outright isn't going to work.

22. smartmic ◴[] No.41904190{5}[source]
They are not wrong. And the art of predicting future outcomes proves to be difficult and fraught with failure. But human evaluation of other humans is more of an open, level field to me. A human is accountable for what he or she says or predicts about others, subject to interrogation or social or legal consequences. Not so easy with AI, because it steps outside all these areas - at least many actors using AI do not seem to stay responsible and take ownership of its mistakes.
replies(1): >>41904505 #
23. baby_souffle ◴[] No.41904191[source]
> Surely you understand how any algorithm (regardless of its nature) that gives the cheater the list of reasons why it spotted cheating will only work for a single iteration before the cheaters adapt, right?

This happens anyways, though? Any service that's useful for alternative / shady / illicit purposes is part of a cat/mouse game. Even if you don't tell the $badActors what you're looking for, they'll learn soon enough what you're not looking for just by virtue of their exploitative behavior still working.

I'm a little skeptical of any "we fight bad guys!" effort that can be completely tanked by telling the bad guys how they got caught.

24. _heimdall ◴[] No.41904505{6}[source]
In my experience, we're really bad at holding humans accountable for their predictions too. That may even be a good thing, but I'm not convinced we would hold LLMs any less accountable for their predictions than we hold humans.
25. acdha ◴[] No.41904613{3}[source]
That doesn't seem like a good comparison: it's a far more serious crime, and while the bank won't tell you that they're reporting your activity to the authorities, the legal process absolutely will, and in sensible countries you're required to be given the opportunity to challenge the evidence.

The problem being discussed here feels like it should be similar in that last regard: any time an automated system is making a serious decision they should be required to have an explanation and review process. If they don’t have sufficient evidence to back up the claim, they need to collect that evidence before making further accusations.

26. 4star3star ◴[] No.41905044[source]
Totally agree. "Your paper is flagged for plagiarism. You get a zero." "But I swear I wrote that 100% on my own. What does it say I plagiarized?" "It doesn't say, but you still get a zero."

In what world is this fair? Our court systems certainly don't operate under these assumptions.

27. gs17 ◴[] No.41905424{4}[source]
> But not in cheating or grades

I had both, over a decade ago in high school. Plagiarism detection is the original AI detection, although they usually told you specifically what you were accused of stealing from. A computer-based English course I took over the summer used automated grading to decide if what you wrote was good enough (IIRC they did have a human look over it at some point).

28. kjkjadksj ◴[] No.41905842[source]
That's how these tools mostly already work, at least on the instructor side. They flag the problem text and say where it came from. It's up to the teacher to do the due diligence and see if it's merely a quote that got flagged or actual plagiarism.
29. auggierose ◴[] No.41906475{3}[source]
So if an automated decision happens, and the reviewer looks at it for a second and says "good enough", that will be OK according to the GDPR. I don't see what the GDPR solves here.
replies(2): >>41907059 #>>41908555 #
30. lucianbr ◴[] No.41907059{4}[source]
Well, I guess the theory is that you could go to court, and the court would be reasonable and say "this 1-second look does not fulfill the requirement, you need to actually use human judgement and see what was going on there". Lots of discussions regarding FAANG malicious compliance have shown this is how high courts work in the EU. When there is political will.

But if you're a nobody and can't afford to go to court against Deutsche Bank, for example, of course you're SOL. The EU has some good parts, but it's still a human government.

It's especially problematic since a good chunk of those "flagged" are actually doing something nefarious, and both courts and government will consider that "mostly works" is a good outcome. One or ten unlucky citizens are just the way the world works, as long as it's not someone with money or power or fame.

replies(1): >>41907458 #
31. auggierose ◴[] No.41907458{5}[source]
I don't see that even people with money and power can do anything here. It is like VAR. When has it ever happened that the referee goes to the screen, and does not follow the VAR recommendation? Never. That is how automated decision making will work as well, across the board.
32. floatrock ◴[] No.41907688[source]
Must be so demoralizing to be a kid these days. You use AI --> you're told you're cheating, which is immoral. You don't use AI --> you eventually get accused of using it or you get left behind by those who do use it.

Figuring out who the hell you are in your high school years was hard enough when Kafka was only a reading assignment.

33. viraptor ◴[] No.41907817{3}[source]
It's literally the explanation. The only identification we have now is "this local part is often used by an AI model" and "this global structure is often used by an AI model". There's nothing fancier about it. The heatmap would literally just point out "this part is suspiciously unlikely" - that's the explanation, because that's what the classification systems use.
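
To make that concrete, a toy sketch of what such a heatmap amounts to, assuming per-token surprisal scores from some scoring model (the window size and threshold here are invented for illustration):

    def flag_windows(surprisals, window=20, threshold=2.5):
        """Return (start, end, avg) for token windows that look suspiciously
        predictable, i.e. candidates for highlighting in a heatmap."""
        flagged = []
        for start in range(0, len(surprisals), window):
            chunk = surprisals[start:start + window]
            avg = sum(chunk) / len(chunk)
            if avg < threshold:
                flagged.append((start, start + len(chunk), avg))
        return flagged

    # Usage: given scores for each token of an essay,
    # show the reviewer *which* spans look suspicious.
    # for start, end, avg in flag_windows(scores):
    #     print(f"tokens {start}:{end} look machine-like (avg surprisal {avg:.2f})")
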
34. ckastner ◴[] No.41908555{4}[source]
> So if an automated decision happens, and the reviewer looks for a second at it, and says, good enough, that will be OK according to GDPR. Don't see what GDPR solves here.

The assumption is that a human reviews the conditions that led the automated system to make that decision.

I think it would be trivial to argue in court that rubberstamping some scalar value that a deep neural net or whatever spit out does not pass that bar. It's still the automated system's decision, the human is just parroting it.

Note that it's easier for the FAANGs to argue such a review has happened, because they have massive amounts of heterogeneous data where there's bound to be something sufficient to argue with (like having posted something that offended someone).

But a single score? I'd say almost impossible to argue. One would have to demonstrate that the system is near-perfect, and virtually never makes mistakes.

35. PeterisP ◴[] No.41908929{3}[source]
There's the dichotomy of an irresistible force meeting an immovable object - only one of these is possible.

Either there can be an undefeatable AI detector or an undetectable AI writer; both can't exist in the same universe. And my assumption is that with sufficient advances there could be a fully human-equivalent AI that is not distinguishable from a human in any way, so in that sense detection will actually never work.