
626 points by __rito__ | 8 comments

Related from yesterday: Show HN: Gemini Pro 3 imagines the HN front page 10 years from now - https://news.ycombinator.com/item?id=46205632
1. popinman322 No.46227755
It doesn't look like the code anonymizes usernames when sending the thread for grading. That likely biases the grades toward prevailing past and current opinions of certain users. It would be interesting to see the whole thing run again, once with usernames randomly reassigned among commenters to measure the bias, and once with procedurally generated pseudonyms to see whether it can be removed that way.

I'd expect de-biasing to deflate the grades of well-known users.
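A minimal sketch of what that pseudonymization pass could look like, assuming the thread is available as (username, text) pairs (the structure and function name here are hypothetical, not from the actual code):

    import hashlib
    import random

    def pseudonymize(comments, mode="random"):
        """Swap out usernames before sending a thread for grading.

        comments: list of (username, text) pairs (hypothetical structure).
        mode="random": reassign real usernames among the commenters, to
        measure how much the grades track the name alone.
        mode="procedural": deterministic generated pseudonyms, to test
        whether the bias can be removed entirely.
        """
        users = sorted({u for u, _ in comments})
        if mode == "random":
            shuffled = users[:]
            random.shuffle(shuffled)
            mapping = dict(zip(users, shuffled))
        else:
            mapping = {u: "user_" + hashlib.sha256(u.encode()).hexdigest()[:8]
                       for u in users}
        return [(mapping[u], text) for u, text in comments]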

It might also be interesting to use a search-grounded model that provides citations for its grading claims; Gemini models, for example, expose this via their API.
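A rough sketch with the google-genai Python SDK (the model name and prompt are placeholders, and the exact metadata fields may differ):

    from google import genai
    from google.genai import types

    client = genai.Client()  # reads the API key from the environment

    response = client.models.generate_content(
        model="gemini-2.0-flash",  # placeholder; any grounding-capable model
        contents=(
            "Grade this 2015 claim as true/false/indeterminate and cite "
            "sources: 'Self-driving taxis will be common by 2025.'"
        ),
        config=types.GenerateContentConfig(
            tools=[types.Tool(google_search=types.GoogleSearch())],
        ),
    )

    print(response.text)
    # Cited sources ride along in the candidate's grounding metadata:
    print(response.candidates[0].grounding_metadata)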

replies(2): >>46228238 >>46231628
2. khafra No.46228238
You can't anonymize comments from well-known users to an LLM: https://gwern.net/doc/statistics/stylometry/truesight/index
replies(1): >>46228785
3. WithinReason No.46228785
That's an overly strong claim; an LLM could also be used to normalise style.
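For instance, a pre-pass that rewrites each comment into a flat, neutral register before grading; a quick sketch (the prompt and the llm interface are hypothetical):

    NORMALIZE_PROMPT = (
        "Rewrite the following comment in plain, neutral English. Preserve "
        "every factual claim and prediction exactly; strip idiosyncratic "
        "phrasing, jokes, and stylistic tics:\n\n{comment}"
    )

    def normalize_style(llm, comment):
        # llm: any prompt -> completion callable (hypothetical interface).
        # The graded content survives; the stylistic fingerprint mostly
        # shouldn't.
        return llm(NORMALIZE_PROMPT.format(comment=comment))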
replies(1): >>46230638
4. wetpaws No.46230638{3}
How would you possibly grade comments if you change them?
replies(2): >>46230932 >>46230934
5. strken No.46230932{4}
Extract the concrete predictions, evaluate them as true/false/indeterminate, and grade the user on the number of true vs false?
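A minimal sketch of that pipeline, with the extraction and evaluation steps stubbed out as callables (both would presumably be LLM calls; the interfaces are hypothetical):

    from collections import Counter

    def grade_user(comments, extract, evaluate):
        """Grade prediction accuracy instead of the raw comment text.

        extract(comment): returns a list of concrete prediction strings.
        evaluate(claim): returns "true", "false", or "indeterminate"
        (e.g. a search-grounded model, as suggested upthread).
        """
        verdicts = Counter(evaluate(claim)
                           for comment in comments
                           for claim in extract(comment))
        resolved = verdicts["true"] + verdicts["false"]
        score = verdicts["true"] / resolved if resolved else None
        return score, verdicts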
replies(1): >>46234055
6. koakuma-chan No.46230934{4}
You don’t need the comments themselves, just the facts in them, to check whether they’re accurate.
7. ProllyInfamous No.46231628
What a human-like criticism of human-like behavior.

I [as a human] do the same thing when observing others in real-life and forum interactions. Reputation matters™

----

A further question is whether a bespoke username could bias the reading of a particular comment (e.g. a username like HatesPython might color the interpretation of that commenter's take on the Python language, even when the comment is actually positive and the username's irony is lost on the AI).

8. Natsu No.46234055{5}
If you dig into what it actually did, it doesn't really seem to grade "predictions" at all. Looking at my own example (#210 on https://karpathy.ai/hncapsule/hall-of-fame.html, with 4 comments), very little of what I said could be construed as a "prediction".

I got an A for a comment on DF where I said I had not personally seen save corruption and listed some weird bugs. It's true that weird bugs have long been a defining feature of DF, but I didn't predict it would remain that way, or say that save corruption would never be a big thing; just that I hadn't personally seen it.

Another A for a comment on Google Wallet that just pointed out that users are already bad at knowing which links to trust. Sure, that's still true (and will probably remain true until something fundamental changes), but it was at best half a prediction, since it wasn't forward-looking.

Then something on hospital airships from the 1930s. I pointed out that one could escape pollution; I never said I thought it would be a big thing. Airships haven't really ever been much of a thing, except in fiction. Maybe that could change someday, but I kinda doubt it.

Lastly, there was the design patent famously referred to as the "rounded corner" patent. It dings me for simplifying it to that label, even though what I actually said was that yes, there's more to the patent, but minor details like that can be enough for infringement. The LLM says I'm right about the ties to the Samsung case yet still oversimplifying. Either way, none of this was really a prediction to begin with.