
361 points by mseri
Y_Y No.46002975
I asked it if giraffes were kosher to eat and it told me:

> Giraffes are not kosher because they do not chew their cud, even though they have split hooves. Both requirements must be satisfied for an animal to be permissible.

HN will have removed the extraneous emojis.

This is at odds with my interpretation of giraffe anatomy and behaviour and of Talmudic law.

Luckily old sycophant GPT5.1 agrees with me:

> Yes. They have split hooves and chew cud, so they meet the anatomical criteria. Ritual slaughter is technically feasible though impractical.

replies(3): >>46004171 #>>46005088 #>>46006063 #
embedding-shape No.46004171
How many times did you retry (so it's not just up to chance), and what were the parameters, specifically temperature and top_p?
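
For reference, a minimal sketch of pinning those parameters across retries, assuming the official openai Python client; the model name, prompt, and values are illustrative, not what GP actually used:

    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    # Pin the sampling parameters explicitly, so repeated runs differ only
    # by sampling chance, not by hidden defaults; n=10 draws ten samples
    # from the same prompt in one call.
    resp = client.chat.completions.create(
        model="gpt-5.1",  # illustrative model name
        messages=[{"role": "user", "content": "Are giraffes kosher to eat?"}],
        temperature=1.0,
        top_p=1.0,
        n=10,
    )
    answers = [choice.message.content for choice in resp.choices]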
replies(2): >>46005252 #>>46005308 #
latexr No.46005252
> How many times did you retry (so it's not just up to chance)

If you don’t know the answer to a question, retrying multiple times only serves to amplify your bias; you have no basis for knowing whether the answer is correct.

replies(3): >>46005264 #>>46005329 #>>46005903 #
observationist No.46005903
https://en.wikipedia.org/wiki/Monte_Carlo_method

If a question is out of distribution, you're more likely to get a chaotic scatter around the answer, whereas if it's just not known well, you'll get something closer to a normal distribution, flatter the less well modeled the concept is.

There are all sorts of techniques you can use to get a probabilistically valid assessment of LLM outputs; they're just expensive and/or tedious.

Repeated sampling gives you the basis for a Bayesian model of the output. You can even work out rigorous numbers specific to the model and your prompt framework by sampling things you know the model has in distribution and comparing those curves against your test case, giving you a measure of relative certainty.
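
A minimal sketch of that comparison, assuming you already have repeated samples in hand; the answer lists below are made-up placeholders, not real model output, and in practice you'd first normalize each free-text answer to a canonical label:

    from collections import Counter
    from math import log2

    def empirical_entropy(samples):
        """Shannon entropy (bits) of the empirical answer distribution."""
        counts = Counter(samples)
        total = len(samples)
        return -sum((c / total) * log2(c / total) for c in counts.values())

    # Placeholder samples, each normalized to a canonical label.
    in_distribution = ["kosher"] * 9 + ["not kosher"]       # known-good baseline prompt
    test_case = ["kosher"] * 4 + ["not kosher"] * 6         # prompt under test

    baseline_h = empirical_entropy(in_distribution)
    test_h = empirical_entropy(test_case)

    # A much flatter (higher-entropy) distribution on the test prompt than on
    # the baseline suggests the concept is poorly modeled or out of distribution.
    print(f"baseline: {baseline_h:.2f} bits, test: {test_h:.2f} bits")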

replies(1): >>46006084 #
latexr No.46006084
Sounds like just not using an LLM would take considerably less effort and waste fewer resources.
replies(1): >>46006204 #
dicknuckle No.46006204
It's a way to validate the LLM output in a test scenario.