54 points tudorizer | 9 comments
1. felineflock ◴[] No.44371711[source]
It is a new nature of abstraction, not a new level.

UP: It lets us state intent in plain language, specs, or examples. We can ask the model to invent code, tests, docs, diagrams—tasks that previously needed human translation from intention to syntax.

BUT SIDEWAYS: Generation samples from a probability distribution over tokens. Outputs vary with sampling temperature, seed, and context length, and can differ even across runs with identical prompts.
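
A toy sketch of what that means (the token strings and scores below are invented purely for illustration, not from any real model):

    # Temperature rescales the logits before sampling; the seed pins down
    # which token the RNG actually picks from the resulting distribution.
    import numpy as np

    tokens = ["return", "yield", "print", "raise"]
    logits = np.array([2.0, 1.5, 0.3, -1.0])   # hypothetical next-token scores

    def sample_token(logits, temperature, seed):
        rng = np.random.default_rng(seed)
        scaled = logits / temperature
        probs = np.exp(scaled - scaled.max())   # numerically stable softmax
        probs /= probs.sum()
        return tokens[rng.choice(len(tokens), p=probs)]

    print(sample_token(logits, temperature=1.0, seed=7))   # may pick a less likely token
    print(sample_token(logits, temperature=0.1, seed=7))   # almost certainly the top token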

replies(2): >>44403418 #>>44403438 #
2. dcminter ◴[] No.44403418[source]
Surely given an identical prompt with a clean context and the same seed the outputs will not vary?
replies(2): >>44403454 #>>44404213 #
3. genidoi ◴[] No.44403438[source]
This is too abstract; a concrete example of what it looks like in actual output is needed.
4. diggan ◴[] No.44403454[source]
+ temperature=0.0 would also be needed for reproducible outputs. And even with that, whether the output is actually reproducible depends on the model/weights themselves; not all of them are deterministic even when all of those settings are held fixed. And finally it depends on the implementation of the model architecture as well.

I think the tricky part is that we tend to assume prompts with similar semantic meaning will give the same outputs (as they would for a human), while LLMs can give vastly different outputs if you have a single spelling mistake, for example, or use "!" instead of "?". The effect varies greatly per model.
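
For what it's worth, this is roughly what pinning everything down looks like with Hugging Face Transformers; the model name is just an example, and even this isn't guaranteed to be bit-identical across hardware or library versions:

    # Rough sketch, not a guarantee: greedy decoding (the temperature=0
    # analogue) plus a fixed seed.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, set_seed

    model_id = "Qwen/Qwen2-0.5B-Instruct"   # example model; any causal LM works
    tok = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

    set_seed(0)                              # pins the Python/NumPy/Torch RNGs
    inputs = tok("Write a haiku about build systems.", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40, do_sample=False)  # greedy decoding
    print(tok.decode(out[0], skip_special_tokens=True))

Even then, GPU kernels, batching, and dtype choices can introduce small numeric differences, which is the "depends on the implementation" part.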

replies(2): >>44403803 #>>44403989 #
5. dcminter ◴[] No.44403803{3}[source]
Hmm, I'm barely even a dabbler, but I'd assumed that the seed in question drove the (pseudo)randomness inherent in "temperature" - if not, what seed(s) do they use and why could one not set that/those too?

To your second point, I wouldn't make that assumption - I can see how a non-technical person might, but surely programmers wouldn't? I've certainly produced very different output from what I intended in boring old C with a misplaced semicolon, after all!

replies(1): >>44404515 #
6. smokel ◴[] No.44403989{3}[source]
> I think the tricky part is that we tend to think that prompts with similar semantic meaning will give the same outputs (like a human)

Trust me, this response would have been totally different if I were in a different mood.

7. furyofantares ◴[] No.44404213[source]
You can make these things deterministic for sure, and so you could also store prompts plus model details instead of code if you really wanted to. There are lots of reasons this would be a very, very poor choice, but you could do it.
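
A sketch of what such a stored artifact could contain; every field name here is hypothetical, just to make the idea concrete:

    # Hypothetical "pinned generation" record: everything you'd need to
    # replay the generation, assuming the stack honors seed and temperature.
    pinned_generation = {
        "model": "example-model-2024-06",            # made-up identifier
        "weights_sha256": "<hash of the exact weights>",
        "prompt": "Write a function that parses ISO-8601 timestamps.",
        "temperature": 0.0,
        "seed": 42,
        "runtime": "example-inference-engine 1.2.3",  # kernels/hardware matter too
    }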

I don't think that's how you should think about these things being non-deterministic though.

Let's call that technical determinism, and then introduce a separate concept, practical determinism.

What I'm calling practical determinism is your ability as the author to predict (determine) the results. Two different prompts that mean the same thing to me will give different results, and my ability to reason about the results of changes to my prompt is fuzzy. I can have a rough idea, and I can gain skill in this area, but I can't gain anything like the same precision I have when reasoning about the results of code I author.

8. diggan ◴[] No.44404515{4}[source]
> Hmm, I'm barely even a dabbler, but I'd assumed that the seed in question drove the (pseudo)randomness inherent in "temperature" - if not, what seed(s) do they use and why could one not set that/those too?

Implementations and architectures are different enough that it's hard to say "it's like X" in all cases. The last time I tried to achieve 100% reproducible outputs, which obviously includes hard-coding the various seeds, I remember not getting reproducible outputs unless I also set the temperature to 0. I think this was with Qwen2 or QwQ used via Hugging Face's Transformers library, but I cannot find the exact details now.

Then in other cases, like the hosted OpenAI models, they straight up say that temperature 0 makes them "mostly deterministic", but I'm not exactly sure why they're unable to offer fully deterministic endpoints.
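
They do expose a best-effort knob for it; a sketch with the Python client (the model name is just an example):

    # Temperature 0 plus a fixed seed. OpenAI documents the seed parameter
    # as best-effort, not a guarantee; system_fingerprint tells you when the
    # backend changed underneath you, which breaks reproducibility.
    from openai import OpenAI

    client = OpenAI()  # expects OPENAI_API_KEY in the environment
    resp = client.chat.completions.create(
        model="gpt-4o-mini",                  # example model name
        messages=[{"role": "user", "content": "Summarize what a seed does."}],
        temperature=0,
        seed=42,
    )
    print(resp.choices[0].message.content)
    print(resp.system_fingerprint)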

> I can see how a non-technical person might, but surely programmers wouldn't?

Even when talking with developers about prompting and LLMs, there are still quite a few people who are surprised that "You are a helpful assistant." would lead to different outputs than "You are a helpful assistant!". I think whether you're a programmer or not matters less; it's more about understanding how LLMs actually work.
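
An easy way to see why the punctuation matters; the tokenizer choice here is just an example:

    # "." and "!" tokenize to different ids, so the model conditions on a
    # different sequence from the very first step.
    from transformers import AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")   # example tokenizer
    print(tok("You are a helpful assistant.")["input_ids"])
    print(tok("You are a helpful assistant!")["input_ids"])
    # Only the final id differs, but every later sampling step is
    # conditioned on it, so the completions are free to drift apart.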

replies(1): >>44407315 #
9. dcminter ◴[] No.44407315{5}[source]
Oh, well that's super interesting, thanks; I guess some side effect of the high degree of parallelism? Anyway, I guess I need to do a bit more than dabble.

> I think whether you're a programmer or not matters less; it's more about understanding how LLMs actually work.

Sounds like I need to understand them better then, as I merely had different misapprehensions than those. More reading for me...