
724 points by simonw | 2 comments
xnx No.44527256
> It’s worth noting that LLMs are non-deterministic,

This is probably better phrased as "LLMs may not provide consistent answers due to changing data and built-in randomness."

Barring rare(?) GPU race conditions, LLMs produce the same output given the same inputs.

replies(7): >>44527264 >>44527395 >>44527458 >>44528870 >>44530104 >>44533038 >>44536027
simonw No.44527395
I don't think those race conditions are rare. None of the big hosted LLMs provide a temperature=0 plus fixed seed feature which they guarantee won't return different results, despite clear demand for that from developers.
replies(3): >>44527634 >>44529574 >>44529823
toolslive No.44529574
Naively (an uninformed guess), I assumed the non-determinism (multiple possible results, even with temperature=0 and a fixed seed) stems from floating point rounding errors propagating through the calculations. How wrong am I?
replies(4): >>44529754 >>44529801 >>44529836 >>44531008
williamdclt No.44529801
Also uninformed, but I can't see how that would be true: floating point rounding errors are entirely deterministic.
replies(1): >>44531897
saagarjha No.44531897
Not if your scheduler causes accumulation in a different order.
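The point about ordering can be made concrete: each individual floating point operation rounds deterministically, but addition is not associative, so the final rounding error depends on the order in which a scheduler accumulates partial results. A minimal hypothetical Python sketch (the values are chosen only to exaggerate the rounding; this is not GPU code from the thread):

```python
# Floating point addition is not associative: each operation rounds
# deterministically, but the accumulated error depends on evaluation order.
vals = [1e16, 1.0, -1e16]

left_to_right = (vals[0] + vals[1]) + vals[2]  # 1.0 is absorbed into 1e16
reordered = (vals[0] + vals[2]) + vals[1]      # the big terms cancel first

print(left_to_right)  # 0.0
print(reordered)      # 1.0
```

Same inputs, same operations, different grouping, different answer.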
replies(1): >>44533285
williamdclt No.44533285
Are you talking about a DAG of FP calculations, where parallel steps might finish in a different order across executions? That's outside my area of knowledge, but I'd believe it's possible.
replies(1): >>44546301
saagarjha No.44546301
Well, a very simple example: if you run a parallel reduce using atomics, the result will depend on which workers acquire the accumulator first.
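This race can be sketched on the CPU with threads, where a lock stands in for the atomic add (a hypothetical illustration, not actual GPU kernel code): whichever thread acquires the shared accumulator first determines the rounding order, so repeated runs can legitimately produce different answers.

```python
import threading

# Several workers fold their values into one shared accumulator.
# The lock plays the role of an atomic add: accumulation order equals
# lock-acquisition order, which varies between runs.
acc = 0.0
lock = threading.Lock()

def worker(v: float) -> None:
    global acc
    with lock:
        acc += v

threads = [threading.Thread(target=worker, args=(v,))
           for v in (1e16, -1e16, 1.0)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Depending on arrival order, 1.0 is either absorbed into 1e16 before the
# cancellation (result 0.0) or added after it (result 1.0).
print(acc)
```

With these three inputs the only reachable results are 0.0 and 1.0; which one you get depends on the scheduler.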