
196 points yuedongze | 3 comments
gradus_ad No.46195373
The proliferation of nondeterministically generated code is here to stay. Part of our response must be more dynamic, more comprehensive, and more realistic workload-simulation and testing frameworks.
replies(5): >>46195431 #>>46195733 #>>46197437 #>>46197956 #>>46199307 #
1. energy123 No.46199307
Nondeterministic isn't the right word, because the LLM's outputs are deterministic, and the tokens decoded from those outputs can also be made deterministic.
replies(1): >>46199468 #
2. Yoric No.46199468
I agree that non-deterministic isn't the right word, because that's not the property we care about. But unless I'm strongly missing something, LLM outputs are very much non-deterministic in practice, both during inference itself and when projecting the embeddings back into tokens.
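
A quick way to see the inference-side effect (a toy sketch, numpy assumed, not actual inference code): float32 addition is not associative, so a different reduction order, which is exactly what varies across GPU kernels and batch shapes, can produce slightly different sums, hence slightly different logits, even at temperature 0.

    # Minimal illustration: the same numbers summed in two orders.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal(100_000).astype(np.float32)

    s_forward = np.float32(0.0)
    for v in x:
        s_forward += v                     # strict left-to-right accumulation

    s_pairwise = x.sum()                   # numpy's pairwise reduction
    print(s_forward == s_pairwise)         # typically False
    print(float(s_forward - s_pairwise))   # small but nonzero gap
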
replies(1): >>46199738 #
3. energy123 No.46199738
I agree it isn't the main property we care about; what we care about is reliability.

But at least in its theoretical construction, the LLM itself is deterministic: for a given input it outputs a fixed probability distribution over tokens, with no RNG involved.

We then either sample from that fixed distribution non-deterministically for better output quality, or use greedy decoding and accept slightly worse quality in exchange for full determinism.
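
A minimal sketch of that split (plain numpy with toy logits, not any particular library's decoding API): the randomness lives entirely in the sampler, never in the model's distribution.

    import numpy as np

    logits = np.array([2.0, 1.0, 0.5, -1.0])  # fixed distribution from the model

    def greedy(logits):
        return int(np.argmax(logits))          # deterministic: same token every time

    def sample(logits, temperature=1.0, rng=None):
        rng = rng if rng is not None else np.random.default_rng()
        p = np.exp((logits - logits.max()) / temperature)  # stable softmax
        p /= p.sum()
        return int(rng.choice(len(p), p=p))    # the only place randomness enters

    print(greedy(logits))                      # always 0
    print([sample(logits) for _ in range(5)])  # varies run to run
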

Happy to be corrected if I am wrong about something.