724 points simonw | 17 comments
xnx No.44527256
> It’s worth noting that LLMs are non-deterministic,

This is probably better phrased as "LLMs may not provide consistent answers due to changing data and built-in randomness."

Barring rare(?) GPU race conditions, LLMs produce the same output given the same inputs.

1. troupo No.44528870
> Barring rare(?) GPU race conditions, LLMs produce the same output given the same inputs.

Are these LLMs in the room with us?

Not a single LLM available as a SaaS is deterministic.

As for other models: I've only run ollama locally, and it, too, provided different answers for the same question five minutes apart.

Edit/update: not a single LLM available as a SaaS produces deterministic output, especially when used from a UI. Pointing out that you could probably run a tightly controlled model in a tightly controlled environment to achieve deterministic output is completely irrelevant when describing the output of Grok in situations where the user has no control over it.

2. fooker No.44528884
> Not a single LLM available as a SaaS is deterministic.

Lower the temperature parameter.
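
For illustration, a minimal sketch of what that looks like against a chat-completions-style API, using the OpenAI Python client as a stand-in (model name and prompt are placeholders); note that even with temperature 0 and a fixed seed, providers only promise best-effort reproducibility:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": "Name three prime numbers."}],
        temperature=0,  # suppress sampling randomness as far as the API allows
        seed=1234,      # best-effort reproducibility hint, not a hard guarantee
    )
    print(resp.choices[0].message.content)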

3. eightysixfour No.44528892
The models themselves are mathematically deterministic. We add randomness during the sampling phase, which you can turn off when running the models locally.

The SaaS APIs are sometimes nondeterministic due to caching strategies and load balancing between experts on MoE models. However, if you took that model and executed it in a single-user environment, it could also be run deterministically.
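
As a concrete local illustration of "turn off the sampling phase", here is a sketch assuming the Hugging Face transformers library and gpt2 purely as a small stand-in model: with do_sample=False the decoder always picks the argmax token, so repeated runs on the same hardware produce identical token sequences.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    model.eval()

    inputs = tok("The capital of France is", return_tensors="pt")

    with torch.no_grad():
        # Greedy decoding: no sampling step, so the output is a pure
        # function of the weights and the prompt.
        out1 = model.generate(**inputs, do_sample=False, max_new_tokens=20)
        out2 = model.generate(**inputs, do_sample=False, max_new_tokens=20)

    assert torch.equal(out1, out2)  # identical token ids on repeated runs
    print(tok.decode(out1[0], skip_special_tokens=True))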

4. moralestapia No.44528898
True.

I'm now wondering, would it be desirable to have deterministic outputs on an LLM?

5. troupo No.44528930
So, how does one do it outside of APIs in the context we're discussing? In the UI or when invoking @grok in X?

How do we also turn off all the intermediate layers in between that we don't know about like "always rant about white genocide in South Africa" or "crash when user mentions David Meyer"?

6. troupo No.44528944
> However, if you took that model and executed it in single user environment,

Again, are those environments in the room with us?

In the context of the article, is the model executed in such an environment? Do we even know anything about the environment, randomness, sampling, and everything in between, or have any control over it (see e.g. https://news.ycombinator.com/item?id=44528930)?

7. DemocracyFTW2 No.44528952
Akchally... Strictly speaking, and to the best of my understanding, LLMs are deterministic in the sense that a dice roll is deterministic; the randomness comes from insufficient knowledge about its internal state. But if you use a constant seed and run the model with the same sequence of questions, you will get the same answers. It's possible that interactions with other users who use the model in parallel could influence the outcome, but given that the state-of-the-art technique to provide memory and context is to re-submit the entirety of the current chat, I'd doubt that. One hint that what I surmise is in fact true can be gleaned from those text-to-image generators that allow seeds to be set; you still don't get a 'linear', predictable (but hopefully somewhat sensible) relation between prompt and output, but each (seed, prompt) pair will always give the same sequence of images.
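
That seed behaviour is easy to check locally; a small sketch (again assuming the transformers library and gpt2 purely as a stand-in) showing that fixing the RNG seed makes even sampled output repeat exactly:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")
    inputs = tok("Once upon a time", return_tensors="pt")

    def sample(seed: int) -> str:
        torch.manual_seed(seed)  # fix the RNG state before sampling
        out = model.generate(**inputs, do_sample=True, temperature=0.9,
                             max_new_tokens=20)
        return tok.decode(out[0], skip_special_tokens=True)

    print(sample(42) == sample(42))  # True: same seed, same sampled text
    print(sample(42) == sample(43))  # almost always False: different seed
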
8. orbital-decay No.44528971
> Not a single LLM available as a SaaS is deterministic.

Gemini Flash has deterministic outputs, assuming you're referring to temperature 0 (obviously). Gemini Pro seems to be deterministic within the same kernel (?) but is likely switching between a few different kernels back and forth, depending on the batch or some other internal grouping.

9. troupo No.44529041
And is the author of the original article running Gemini Flash/Gemini Pro through an API where he can control the temperature? Can kernels be controlled by the user? Can any of those be controlled through the UIs/APIs from which most of these LLMs are invoked?

> but is likely switching between a few different kernels back and forth, depending on the batch or some other internal grouping.

So you're literally saying it's non-deterministic

10. orbital-decay No.44529068
The only thing I'm saying is that there is a SaaS model that would give you the same output for the same input, over and over. You just seem to be arguing for the sake of arguing, especially considering that non-determinism is a red herring to begin with, and not a thing to care about for practical use (that's why providers usually don't bother with guaranteeing it). The only reason it was mentioned in the article is because the author is basically reverse engineering a particular model.
11. pydry No.44529115
It's not enough. I've done this and still often gotten different results for the same question.
12. marcinzm No.44530946
"Grok is not deterministic" would then be the correct statement.
13. mathiaspoint No.44531825
It's very poor communication. They absolutely do not have to be non-deterministic.
14. troupo No.44532052
The output of all these systems, as used by people not through the API, is non-deterministic.
15. troupo No.44532061
> especially considering that non-determinism is a red herring to begin with, and not a thing to care about for practical use

On the contrary, it really is important in practical use, because it's impossible to talk about the things in the original article without being able to consistently reproduce results.

Also, in almost all situations you really do want deterministic output (remember how "do what I want and what is expected" was an important property of computer systems? Good times)

> The only reason it was mentioned in the article is because the author is basically reverse engineering a particular model.

The author is attempting to reverse engineer the model, the randomness and the temperature, the system prompts and the training set, and all the possible layers added by xAI in between, and is still getting non-deterministic output.

HN: no-no-no, you don't understand, it's 100% deterministic and it doesn't matter

16. troupo No.44532080
When used through the UI, like the author does, Grok isn't. OpenAI isn't. Gemini isn't.
17. troupo No.44537068
I would also assume that in the vast majority of cases people don't set the temperature to zero even with API calls.

And even if you do set it to zero, you never know what changes to the layers and layers of wrappers and system prompts you will run into on any given day resulting in "on this day we crash for certain input, and on other days we don't": https://www.techdirt.com/2024/12/03/the-curious-case-of-chat...