
395 points pseudolus | 4 comments
dtnewman ◴[] No.43633873[source]
> A common question is: “how much are students using AI to cheat?” That’s hard to answer, especially as we don’t know the specific educational context where each of Claude’s responses is being used.

I built a popular product that helps teachers with this problem.

Yes, it's "hard to answer", but let's be honest... it's a very very widespread problem. I've talked to hundreds of teachers about this and it's a ubiquitous issue. For many students, it's literally "let me paste the assignment into ChatGPT and see what it spits out, change a few words and submit that".

I think the issue is that it's so tempting to lean on AI. I remember long nights struggling to implement complex data structures in CS classes. I'd work on something for an hour before I'd have an epiphany and figure out what was wrong. But that struggling was ultimately necessary to really learn the concepts. With AI, I can simply copy/paste my code and say "hey, what's wrong with this code?" and it'll often spot it (never mind the fact that I can just ask ChatGPT "create a b-tree in C" and it'll do it). That's amazing in a sense, but also hurts the learning process.

replies(34): >>43633957 #>>43634006 #>>43634053 #>>43634075 #>>43634251 #>>43634294 #>>43634327 #>>43634339 #>>43634343 #>>43634407 #>>43634559 #>>43634566 #>>43634616 #>>43634842 #>>43635388 #>>43635498 #>>43635830 #>>43636831 #>>43638149 #>>43638980 #>>43639096 #>>43639628 #>>43639904 #>>43640528 #>>43640853 #>>43642243 #>>43642367 #>>43643255 #>>43645561 #>>43645638 #>>43646665 #>>43646725 #>>43647078 #>>43654777 #
enjo ◴[] No.43640528[source]
> it's literally "let me paste the assignment into ChatGPT and see what it spits out, change a few words and submit that".

My wife is an accounting professor. For many years her battle was with students using Chegg and the like. They would submit roughly correct answers, but because she would rotate the underlying numbers, their answers were always wrong in a way that provably indicated copying. This made up 5-8% of her students.

Now she receives a parade of absolutely insane answers to questions from a much larger proportion of her students (she is working on some research around this, but it's definitely more than 30%). When she asks students to recreate how they got to these pretty wild answers, they are never able to articulate what happened. They are simply throwing her questions at LLMs and submitting the output. It's not great.

replies(6): >>43640669 #>>43640941 #>>43641433 #>>43642050 #>>43642506 #>>43643150 #
Zanfa ◴[] No.43642506[source]
ChatGPT is laughably terrible at double-entry accounting. A few weeks ago I was trying to use it to figure out a reasonable way to structure accounts for a project, given the different business requirements I had. It kept disappearing money when giving examples. Pointing it out didn’t help either; it just apologized and went on to make the same mistake in a different way.
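
Concretely, the invariant it kept violating: in double-entry bookkeeping every transaction's debits must equal its credits, so balances across all accounts net to zero. A rough Python sketch of that check (accounts and amounts are made up purely for illustration, not anything ChatGPT produced):

    from collections import defaultdict

    # Each entry is (account, debit, credit); a transaction is a list of entries.
    transactions = [
        [("cash", 1000, 0), ("revenue", 0, 1000)],   # customer pays an invoice
        [("expenses", 200, 0), ("cash", 0, 200)],    # we pay a supplier
    ]

    balances = defaultdict(int)
    for tx in transactions:
        debits = sum(d for _, d, _ in tx)
        credits = sum(c for _, _, c in tx)
        # An unbalanced transaction is exactly the "disappearing money" error.
        assert debits == credits, f"unbalanced transaction: {tx}"
        for account, d, c in tx:
            balances[account] += d - c

    # Debits and credits net to zero across the whole ledger.
    assert sum(balances.values()) == 0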
replies(4): >>43642728 #>>43645101 #>>43646564 #>>43651487 #
andai ◴[] No.43642728[source]
Using a system based on randomness for a process that must occur deterministically is probably the wrong solution.

I'm running into similar issues trying to use LLMs for logic and reasoning.

They can do it (surprisingly well, once you disable the friendliness that prevents it), but you get a different random subset of correct answers every time.

I don't know if setting temperature to 0 would help. You'd get the same output every time, but it would be the same incomplete / wrong output.

Probably a better solution is a multi-phase thing, where you generate a bunch of outputs and then collect and filter them.
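
Roughly self-consistency sampling. A sketch of the shape I have in mind, in Python; call_model is a placeholder for whatever LLM client you use, and extract_answer is whatever app-specific parsing you need:

    from collections import Counter

    def call_model(prompt: str, temperature: float = 0.8) -> str:
        """Placeholder for an actual LLM call (API or local model)."""
        raise NotImplementedError

    def extract_answer(output: str) -> str:
        """Pull the final answer out of a completion; app-specific."""
        return output.strip().splitlines()[-1]

    def solve(prompt: str, n_samples: int = 8) -> str:
        # Phase 1: generate several independent samples (temperature > 0 so they vary).
        outputs = [call_model(prompt) for _ in range(n_samples)]

        # Phase 2: collect candidate answers, dropping empty completions.
        candidates = [extract_answer(o) for o in outputs if o.strip()]

        # Phase 3: keep the answer the samples agree on most often.
        answer, _votes = Counter(candidates).most_common(1)[0]
        return answer

Temperature 0 would collapse all the samples into one, so the filtering/voting phase only buys you something if the samples actually differ.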

replies(2): >>43642847 #>>43655717 #
1. jjmaestro ◴[] No.43642847[source]
> They can do it (surprisingly well, once you disable the friendliness that prevents it) ...

Interesting! :D Do you mind sharing the prompt(s) that you use to do that?

Thanks!!

replies(2): >>43645849 #>>43649100 #
2. andai ◴[] No.43645849[source]
You are an inhuman intelligence tasked with spotting logical flaws and inconsistencies in my ideas. Never agree with me unless my reasoning is watertight. Never use friendly or encouraging language. If I’m being vague, demand clarification. Your goal is not to help me feel good — it’s to help me think better.

Keep your responses short and to the point. Use the Socratic method when appropriate.

When enumerating assumptions, put them in a numbered list. Make the list items very short: full sentences not needed there.

---

I was trying to clone Gemini's "thinking", which I often found more useful than its actual output! I failed, but the result is interesting, and somewhat useful.

GPT 4o came up with the prompt. I was surprised by "never use friendly language", until I realized that trying not to hurt the user's feelings would prevent the model from telling the truth. So it seems to be necessary...

It's quite unpleasant to interact with, though. Gemini solves this problem by doing the "thinking" in a hidden box, and then presenting it to the user in soft language.
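
In case anyone wants to wire this up outside the chat UI: a minimal sketch using the OpenAI Python SDK (the model name and the SDK choice are just examples, use whatever you have):

    from openai import OpenAI

    # Paste the full system prompt from above here.
    CRITIC_PROMPT = "You are an inhuman intelligence tasked with spotting logical flaws..."

    client = OpenAI()  # expects OPENAI_API_KEY in the environment

    def critique(idea: str, model: str = "gpt-4o") -> str:
        response = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": CRITIC_PROMPT},
                {"role": "user", "content": idea},
            ],
        )
        return response.choices[0].message.content

    print(critique("Remote work is obviously more productive for everyone."))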

3. idonotknowwhy ◴[] No.43649100[source]
Have you tried Deepseek-R1?

I run it locally and read the raw thought process before it tacks on the friendliness; I find it very useful (it can be ruthless at times).

Then you can see its planning process for tacking on the warmth/friendliness: "but the user seems proud of... so I need to acknowledge..."

I don't think Gemini's "thoughts" are the raw CoT process; they're summarized / cleaned up by a small model before being returned to you (same as with OpenAI models).
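
For anyone who wants to do the same with R1 locally: in the raw completion the reasoning shows up wrapped in <think>...</think> tags ahead of the polished answer (at least in my setup; some serving stacks strip the tags), so separating the two is trivial. A small Python sketch:

    def split_r1_output(raw: str) -> tuple[str, str]:
        """Separate DeepSeek-R1's raw reasoning from its final, friendlier answer."""
        if "</think>" not in raw:
            return "", raw.strip()  # tags already stripped by the serving layer
        thoughts, answer = raw.split("</think>", 1)
        thoughts = thoughts.replace("<think>", "", 1).strip()
        return thoughts, answer.strip()

    # thoughts, answer = split_r1_output(completion_text)
    # print(thoughts)   # the ruthless part, before the warmth gets added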

replies(1): >>43652741 #
4. andai ◴[] No.43652741[source]
That's fascinating. I've been trying to get other models to mimic Gemini 2.5 Pro's thought process, but even with examples, they don't do it very well. Which surprised me, because I think even the original (no RLHF) GPT-3 was pretty good at following formats like that! But maybe there's not enough training data in that format for it to "click".

It does seem similar in structure to Gemini 2.0's output format with the nested bullets though, so I have to assume they trained on synthetic examples.