
54 points by tudorizer | 3 comments
felineflock ◴[] No.44371711[source]
It is a new nature of abstraction, not a new level.

UP: It lets us state intent in plain language, specs, or examples. We can ask the model to invent code, tests, docs, diagrams—tasks that previously needed human translation from intention to syntax.

BUT SIDEWAYS: Generation is sampling from a probability distribution over tokens. Outputs vary with sampling temperature, seed, and context length, and can differ even across runs with identical prompts.
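
A toy sketch of that sampling step, just to make the point concrete (made-up logits and a hand-rolled sampler, not any real model's code):

    import numpy as np

    def sample_next_token(logits, temperature=1.0, seed=None):
        # temperature-scaled softmax over the vocabulary, then one random draw
        rng = np.random.default_rng(seed)
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-8)
        probs = np.exp(scaled - scaled.max())
        probs /= probs.sum()
        return rng.choice(len(probs), p=probs)

    logits = [2.0, 1.9, 0.5]  # made-up scores for a 3-token vocabulary
    print(sample_next_token(logits, temperature=1.0, seed=1))
    print(sample_next_token(logits, temperature=1.0, seed=2))    # new seed, possibly a different token
    print(sample_next_token(logits, temperature=0.01, seed=2))   # near-greedy: almost always token 0

Raise the temperature and the low-probability tokens get picked more often; change the seed and the draws change even though everything else is identical.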

replies(2): >>44403418 #>>44403438 #
dcminter ◴[] No.44403418[source]
Surely given an identical prompt with a clean context and the same seed the outputs will not vary?
replies(2): >>44403454 #>>44404213 #
diggan ◴[] No.44403454[source]
+ temperature=0.0 would be needed for reproducible outputs. And even with that, whether it's actually reproducible depends on the model/weights themselves; not all of them are deterministic even when all of those things are held constant. And then finally it depends on the implementation of the model architecture as well.

I think the tricky part is that we tend to assume prompts with similar semantic meaning will give the same outputs (as they would for a human), while LLMs can give vastly different outputs if you have one spelling mistake, for example, or use "!" instead of "?". The effect varies greatly per model.
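
For example (a hypothetical snippet; the gpt2 tokenizer is used only because it's small, and the prompts are made up), the "same" request with different punctuation or a spelling mistake is literally a different token sequence, so the model is conditioned on different input:

    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")  # any tokenizer makes the point
    for prompt in ["Can you summarise this file?",
                   "Can you summarise this file!",    # "!" instead of "?"
                   "Can you sumarise this file?"]:    # spelling mistake
        print(tok(prompt)["input_ids"], "<-", prompt)

Each variant gets its own token ids, and how far the output distribution moves in response is model-dependent.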

replies(2): >>44403803 #>>44403989 #
1. dcminter ◴[] No.44403803[source]
Hmm, I'm barely even a dabbler, but I'd assumed that the seed in question drove the (pseudo)randomness inherent in "temperature" - if not, what seed(s) do they use and why could one not set that/those too?

To your second part, I wouldn't make that assumption - I can see how a non-technical person might, but surely programmers wouldn't? I've certainly produced very different output from that which I intended in boring old C with a misplaced semicolon, after all!

replies(1): >>44404515 #
2. diggan ◴[] No.44404515[source]
> Hmm, I'm barely even a dabbler, but I'd assumed that the seed in question drove the (pseudo)randomness inherent in "temperature" - if not, what seed(s) do they use and why could one not set that/those too?

Implementations and architectures are different enough that it's hard to say "it's like X" in all cases. Last time I tried to achieve 100% reproducible outputs, which obviously includes hard-coding the various seeds, I remember not getting reproducible outputs unless I also set the temperature to 0. I think this was with Qwen2 or QwQ used via Hugging Face's Transformers library, but I can't find the exact details now.
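
From memory, the setup was roughly like this (a sketch, not the exact script I ran; the model name is just an example, and even pinning everything like this doesn't guarantee bit-identical output across hardware or library versions):

    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

    set_seed(0)  # seeds Python, NumPy and torch in one call
    name = "Qwen/Qwen2-0.5B-Instruct"
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    inputs = tok("Write a haiku about determinism.", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)  # greedy decoding, i.e. "temperature 0"
    print(tok.decode(out[0], skip_special_tokens=True))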

Then in other cases, like the hosted OpenAI models, they straight up say that temperature 0 makes them "mostly deterministic", but I'm not exactly sure why they're unable to offer fully deterministic endpoints.
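
My rough guess (could well be wrong) is floating point: batched/parallel kernels can accumulate in different orders between runs, float addition isn't associative, and when two candidate tokens' logits are nearly tied the argmax can flip. A toy illustration of just the non-associativity part:

    import random

    xs = [random.uniform(-1, 1) for _ in range(100_000)]
    a = sum(xs)            # one accumulation order
    b = sum(sorted(xs))    # same numbers, different order
    print(a == b, abs(a - b))  # typically False, with a tiny difference

A difference that small only matters when two logits are essentially tied, but over thousands of generated tokens it happens often enough to break strict determinism.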

> I can see how a non-technical person might, but surely programmers wouldn't?

Even when talking with developers about prompting and LLMs, there are still quite a few people who are surprised that "You are a helpful assistant." can lead to different outputs than "You are a helpful assistant!". I think whether or not you're a programmer matters less; it's more about understanding how LLMs actually work.

replies(1): >>44407315 #
3. dcminter ◴[] No.44407315[source]
Oh, well that's super interesting, thanks; I guess some side effect of the high degree of parallelism? Anyway, I guess I need to do a bit more than dabble.

> I think whether or not you're a programmer matters less; it's more about understanding how LLMs actually work.

Sounds like I need to understand them better then, as I merely had different misapprehensions than those. More reading for me...