
626 points by __rito__ | 8 comments

Related from yesterday: Show HN: Gemini Pro 3 imagines the HN front page 10 years from now - https://news.ycombinator.com/item?id=46205632
1. popinman322 No.46227755
It doesn't look like the code anonymizes usernames when sending the thread for grading. That likely biases the grades toward prevailing past and current opinions of certain users. It would be interesting to see the whole thing run again, once with usernames randomly reassigned among commenters to measure the bias, and once with procedurally generated pseudonyms to see whether it can be removed that way.

I'd expect de-biasing to deflate the grades of well-known users.
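A minimal sketch of what that pseudonymization pass could look like, assuming the thread is available as (username, text) pairs (the structure and function name here are hypothetical, not from the actual code):

    import hashlib
    import random

    def pseudonymize(comments, mode="random"):
        """Swap out usernames before sending a thread for grading.

        comments: list of (username, text) pairs (hypothetical structure).
        mode="random": reassign real usernames among the commenters, to
        measure how much the grades track the name alone.
        mode="procedural": deterministic generated pseudonyms, to test
        whether the bias can be removed entirely.
        """
        users = sorted({u for u, _ in comments})
        if mode == "random":
            shuffled = users[:]
            random.shuffle(shuffled)
            mapping = dict(zip(users, shuffled))
        else:
            mapping = {u: "user_" + hashlib.sha256(u.encode()).hexdigest()[:8]
                       for u in users}
        return [(mapping[u], text) for u, text in comments]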

It might also be interesting to use a search-grounded model that provides citations for its grading claims; Gemini models, for example, expose this via their API.
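A rough sketch with the google-genai Python SDK (the model name and prompt are placeholders, and the exact metadata fields may differ):

    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment

    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder; any grounding-capable model
        contents=(
            "Grade this 2015 claim as true/false/indeterminate and cite "
            "sources: 'Self-driving taxis will be common by 2025.'"
        ),
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )

    print(response.text)
    # Cited sources ride along in the candidate's grounding metadata:
    print(response.candidates[0].grounding_metadata)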

replies(2): >>46228238 >>46231628
2. khafra No.46228238
You can't anonymize comments from well-known users to an LLM: https://gwern.net/doc/statistics/stylometry/truesight/index
replies(1): >>46228785
3. WithinReason No.46228785
That's an overly strong claim; an LLM could also be used to normalise style.
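For instance, a pre-pass that rewrites each comment into a flat, neutral register before grading; a quick sketch (the prompt and the llm interface are hypothetical):

    NORMALIZE_PROMPT = (
        "Rewrite the following comment in plain, neutral English. Preserve "
        "every factual claim and prediction exactly; strip idiosyncratic "
        "phrasing, jokes, and stylistic tics:\n\n{comment}"
    )

    def normalize_style(llm, comment):
        # llm: any prompt -> completion callable (hypothetical interface).
        # The graded content survives; the stylistic fingerprint mostly
        # shouldn't.
        return llm(NORMALIZE_PROMPT.format(comment=comment))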
replies(1): >>46230638
4. wetpaws No.46230638{3}
How would you possibly grade comments if you change them?
replies(2): >>46230932 >>46230934
5. strken No.46230932{4}
Extract the concrete predictions, evaluate them as true/false/indeterminate, and grade the user on the number of true vs false?
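A minimal sketch of that pipeline, with the extraction and evaluation steps stubbed out as callables (both would presumably be LLM calls; the interfaces are hypothetical):

    from collections import Counter

    def grade_user(comments, extract, evaluate):
        """Grade prediction accuracy instead of the raw comment text.

        extract(comment): returns a list of concrete prediction strings.
        evaluate(claim): returns "true", "false", or "indeterminate"
        (e.g. a search-grounded model, as suggested upthread).
        """
        verdicts = Counter(evaluate(claim)
                           for comment in comments
                           for claim in extract(comment))
        resolved = verdicts["true"] + verdicts["false"]
        score = verdicts["true"] / resolved if resolved else None
        return score, verdicts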
replies(1): >>46234055
6. koakuma-chan No.46230934{4}
You don’t need the comments themselves, just the facts in them, to check whether they’re accurate.
7. ProllyInfamous No.46231628
What a human-like criticism of human-like behavior.

I [as a human] do the same thing when observing others in real-life and forum interactions. Reputation matters™

----

A further question is whether a bespoke username could bias the reading of a particular comment (e.g. a username like HatesPython might color the interpretation of that commenter's take on the Python language, even when the comment is actually positive and the username's irony is lost on the AI).

8. Natsu No.46234055{5}
If you dig into what it actually did, it doesn't really seem to grade "predictions" at all. Looking at my own example (#210 on https://karpathy.ai/hncapsule/hall-of-fame.html, with 4 comments), very little of what I said could be construed as a "prediction".

I got an A for a comment on DF where I said I had not personally seen save corruption and listed some weird bugs. It's true that weird bugs have long been a defining feature of DF, but I didn't predict it would remain that way, or say that save corruption would never be a big thing; just that I hadn't personally seen it.

Another A for a comment on Google Wallet that just pointed out that users are already bad at knowing which links to trust. Sure, that's still true (and will probably remain true until something fundamental changes), but it was at best half a prediction, since it wasn't forward-looking.

Then something on hospital airships from the 1930s. I pointed out that one could escape pollution; I never said I thought it would be a big thing. Airships haven't really ever been much of a thing, except in fiction. Maybe that could change someday, but I kinda doubt it.

Lastly, there was the design patent famously referred to as the "rounded corner" patent. It dings me for simplifying it to that label, even though what I actually said was that yes, there's more to the patent, but minor details like that can be enough for infringement. The LLM says I'm right about the ties to the Samsung case yet still oversimplifying. Either way, none of this was really a prediction to begin with.