dtnewman No.43633873
> A common question is: “how much are students using AI to cheat?” That’s hard to answer, especially as we don’t know the specific educational context where each of Claude’s responses is being used.

I built a popular product that helps teachers with this problem.

Yes, it's "hard to answer", but let's be honest... it's a very very widespread problem. I've talked to hundreds of teachers about this and it's a ubiquitous issue. For many students, it's literally "let me paste the assignment into ChatGPT and see what it spits out, change a few words and submit that".

I think the issue is that it's so tempting to lean on AI. I remember long nights struggling to implement complex data structures in CS classes. I'd work on something for an hour before I'd have an epiphany and figure out what was wrong. But that struggle was ultimately necessary to really learn the concepts. With AI, I can simply copy/paste my code and say "hey, what's wrong with this code?" and it'll often spot the problem (never mind the fact that I can just ask ChatGPT "create a b-tree in C" and it'll do it). That's amazing in a sense, but it also hurts the learning process.

enjo No.43640528
> it's literally "let me paste the assignment into ChatGPT and see what it spits out, change a few words and submit that".

My wife is an accounting professor. For many years her battle was with students using Chegg and the like. They would submit roughly correct answers, but because she rotated the underlying numbers, their answers were wrong in a way that provably indicated cheating. This made up 5-8% of her students.

Now she receives a parade of absolutely insane answers from a much larger proportion of her students (she is working on some research around this, but it's definitely more than 30%). When she asks students to recreate how they arrived at these wild answers, they are never able to articulate what happened. They are simply throwing her questions at LLMs and submitting the output. It's not great.

Zanfa No.43642506
ChatGPT is laughably terrible at double-entry accounting. A few weeks ago I was trying to use it to figure out a reasonable way to structure accounts for a project, given the different business requirements I had. It kept disappearing money in its examples. Pointing this out didn't help either; it just apologized and went on to make the same mistake in a different way.
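
For what it's worth, the disappearing-money failure is easy to catch mechanically, since double-entry bookkeeping requires total debits to equal total credits. A minimal sketch of that check in Python (the accounts and amounts here are made up, not from any real example):

    from decimal import Decimal

    # A journal entry as (account, debit, credit) lines; values are hypothetical.
    entry = [
        ("Cash",             Decimal("1000.00"), Decimal("0.00")),
        ("Accounts Payable", Decimal("0.00"),    Decimal("400.00")),
        ("Owner's Equity",   Decimal("0.00"),    Decimal("600.00")),
    ]

    def is_balanced(lines):
        """Double-entry invariant: total debits must equal total credits."""
        total_debits = sum(debit for _, debit, _ in lines)
        total_credits = sum(credit for _, _, credit in lines)
        return total_debits == total_credits

    print(is_balanced(entry))  # False here would mean money has "disappeared"

Running every LLM-generated entry through a check like this is a deterministic backstop the model itself can't provide.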
andai No.43642728
Using a system based on randomness for a process that must occur deterministically is probably the wrong solution.

I'm running into similar issues trying to use LLMs for logic and reasoning.

They can do it (surprisingly well, once you disable the friendliness that prevents it), but you get a different random subset of correct answers every time.

I don't know if setting temperature to 0 would help. You'd get the same output every time, but it would be the same incomplete / wrong output.

Probably a better solution is a multi-phase approach, where you generate a bunch of outputs and then collect and filter them.
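
A rough sketch of what I mean, assuming an OpenAI-style chat API (the model name and the majority-vote filter are just placeholders): sample several answers at a nonzero temperature, then keep the answer the samples agree on most often.

    from collections import Counter
    from openai import OpenAI  # assumes the openai Python package and an API key

    client = OpenAI()

    def sample_answers(question, n=5, temperature=0.8):
        """Phase 1: generate several independent answers."""
        answers = []
        for _ in range(n):
            resp = client.chat.completions.create(
                model="gpt-4o-mini",  # placeholder model name
                messages=[{"role": "user", "content": question}],
                temperature=temperature,
            )
            answers.append(resp.choices[0].message.content.strip())
        return answers

    def pick_consensus(answers):
        """Phase 2: collect and filter -- here, a simple majority vote."""
        best, _count = Counter(answers).most_common(1)[0]
        return best

    answers = sample_answers("What is 17 * 24? Reply with just the number.")
    print(pick_consensus(answers))

Majority voting only works when answers can be compared exactly; for free-form output you'd need a stricter filter or a verifier in phase 2.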

jjmaestro No.43642847
> They can do it (surprisingly well, once you disable the friendliness that prevents it) ...

Interesting! :D Do you mind sharing the prompt(s) that you use to do that?

Thanks!!

idonotknowwhy No.43649100
Have you tried DeepSeek-R1?

I run it locally and read the raw thought process; I find it very useful (it can be ruthless at times) to see this before it tacks on the friendliness.

Then you can see its planning process for tacking on the warmth/friendliness: "but the user seems proud of... so I need to acknowledge..."

I don't think Gemini's "thoughts" are the raw CoT process; they're summarized / cleaned up by a small model before being returned to you (same as OpenAI models).
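
If it helps, here's roughly how I pull the raw thought process out locally. This assumes the model is served by Ollama and emits its reasoning inside <think>...</think> tags before the final answer; the endpoint, model tag, and prompt are whatever your local setup uses.

    import requests

    # Assumes a local Ollama server; adjust the model tag to your install.
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={
            "model": "deepseek-r1:14b",
            "prompt": "Is 1027 prime? Answer yes or no.",
            "stream": False,
        },
    )
    text = resp.json()["response"]

    # DeepSeek-R1 puts its chain of thought in a <think>...</think> block
    # before the final, friendlier answer.
    if "</think>" in text:
        thoughts, answer = text.split("</think>", 1)
        print("RAW THOUGHTS:\n", thoughts.replace("<think>", "").strip())
        print("\nFINAL ANSWER:\n", answer.strip())
    else:
        print(text)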

andai No.43652741
That's fascinating. I've been trying to get other models to mimic Gemini 2.5 Pro's thought process, but even with examples, they don't do it very well. That surprised me, because I think even the original (no-RLHF) GPT-3 was pretty good at following formats like that! But maybe there's not enough training data in that format for it to "click".

It does seem similar in structure to Gemini 2.0's output format with the nested bullets though, so I have to assume they trained on synthetic examples.