
Using LLMs at Oxide

(rfd.shared.oxide.computer)
694 points | steveklabnik | 1 comment
cobertos:
> LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation!)

That's a bold claim. Do they have data to back it up? I'd only have confidence saying this after testing it against multiple LLM outputs. And does it really work for, e.g., the em dash leaderboard of HN, or for people who tell an LLM not to use the usual 10 LLM-y writing cliches? I would need to see their reasoning on why they think this before I'd believe it.

yard2010:
I thought about it: a quick way to check whether something was created with an LLM is to feed an LLM half of the text and then let it complete the rest token by token. At each step, check not just the single most likely next token but the n most probable tokens. If one of them is the token you actually have in the text, count it as a hit, pick it, and continue. This way, I think, you can measure how often the model is "correct" at predicting the text it hasn't yet seen.

I didn't test it and I'm far from an expert, maybe someone can challenge it?
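
Here is a minimal sketch of that top-n prefix check, assuming Hugging Face transformers with GPT-2 as the scoring model; the model choice, the n=5 cutoff, and the helper name top_n_hit_rate are illustrative assumptions, not anything the commenter specified:

    # Minimal sketch of the idea above: score the second half of a text by
    # how often each true token appears in the model's top-n predictions.
    # Assumptions: Hugging Face transformers, GPT-2 as the scoring model, n=5.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def top_n_hit_rate(text: str, n: int = 5, model_name: str = "gpt2") -> float:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        model.eval()

        ids = tokenizer(text, return_tensors="pt").input_ids[0]
        half = len(ids) // 2
        with torch.no_grad():
            # One forward pass; causal masking means logits[pos] only "sees"
            # tokens 0..pos, so this matches completing step by step while
            # always continuing with the true token.
            logits = model(ids.unsqueeze(0)).logits[0]

        hits = 0
        for pos in range(half, len(ids) - 1):
            top = torch.topk(logits[pos], n).indices.tolist()
            if int(ids[pos + 1]) in top:
                hits += 1
        total = len(ids) - 1 - half
        return hits / total if total else 0.0

    # Higher hit rates mean the text is more predictable to the model.
    print(top_n_hit_rate("The quick brown fox jumps over the lazy dog. " * 4))

Whether the hit rates for human-written and LLM-written text are actually separable (i.e., whether a useful threshold exists) is exactly what the reply below questions.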

akoboldfrying:
I expect that, for values of n at which this test consistently reports "LLM-generated" on LLM-generated inputs, it will also consistently report "LLM-generated" on human-generated inputs. But I haven't done the test either, so I could be wrong.