
Using LLMs at Oxide

(rfd.shared.oxide.computer)
694 points | steveklabnik | 1 comment
cobertos:
> LLMs are especially good at evaluating documents to assess the degree that an LLM assisted their creation!)

That's a bold claim. Do they have data to back it up? I'd only have confidence saying this after testing it against multiple LLM outputs. And does it really work for, e.g., the em dash leaderboard of HN, or for people who tell an LLM not to use the usual 10 LLM-y writing cliches? I would need to see their reasoning on why they think this before I'd believe it.

yard2010:
I thought about it: a quick way to check whether something was created with an LLM is to feed an LLM half of the text and then let it complete the rest token by token. At each step, check not just the single most likely next token but the n most probable tokens. If one of them is the token you actually have in the text, count it as a hit, pick it, and continue. This way, I think, you can measure how often the model is "correct" at predicting the text it hasn't yet seen.

I didn't test it and I'm far from an expert, maybe someone can challenge it?
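
Here is a minimal sketch of that top-n prefix check, assuming Hugging Face transformers with GPT-2 as the scoring model; the model choice, the n=5 cutoff, and the helper name top_n_hit_rate are illustrative assumptions, not anything the commenter specified:

    # Minimal sketch of the idea above: score the second half of a text by
    # how often each true token appears in the model's top-n predictions.
    # Assumptions: Hugging Face transformers, GPT-2 as the scoring model, n=5.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    def top_n_hit_rate(text: str, n: int = 5, model_name: str = "gpt2") -> float:
        tokenizer = AutoTokenizer.from_pretrained(model_name)
        model = AutoModelForCausalLM.from_pretrained(model_name)
        model.eval()

        ids = tokenizer(text, return_tensors="pt").input_ids[0]
        half = len(ids) // 2
        with torch.no_grad():
            # One forward pass; causal masking means logits[pos] only "sees"
            # tokens 0..pos, so this matches completing step by step while
            # always continuing with the true token.
            logits = model(ids.unsqueeze(0)).logits[0]

        hits = 0
        for pos in range(half, len(ids) - 1):
            top = torch.topk(logits[pos], n).indices.tolist()
            if int(ids[pos + 1]) in top:
                hits += 1
        total = len(ids) - 1 - half
        return hits / total if total else 0.0

    # Higher hit rates mean the text is more predictable to the model.
    print(top_n_hit_rate("The quick brown fox jumps over the lazy dog. " * 4))

Whether the hit rates for human-written and LLM-written text are actually separable (i.e., whether a useful threshold exists) is exactly what the reply below questions.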

akoboldfrying:
I expect that, for values of n at which this test consistently reports "LLM-generated" on LLM-generated inputs, it will also consistently report "LLM-generated" on human-generated inputs. But I haven't done the test either, so I could be wrong.