
395 points pseudolus | 2 comments
1. j2kun No.43633815
They use an LLM to summarize the chats, which IMO makes the results as fundamentally unreliable as LLMs themselves. Maybe for an aggregate statistical analysis (for the purpose of... vibe-based product direction?) this is good enough, but if you were to use this to inform impactful policies, caveat emptor.
replies(1): >>43633876
2. j2kun No.43633876
For example, it's fashionable in math education these days to ask students to generate problems as a different mode of probing their understanding of a topic. And from the article: "We found that students primarily use Claude to create and improve educational content across disciplines (39.3% of conversations). This often entailed designing practice questions, ..." That last part smells fishy, and even if you saw a prompt like "design a practice question..." you wouldn't be able to tell whether the student was cheating, given the unreliability of the LLM summarization mentioned above.