Olmo 3: Charting a path through the model flow to lead open-source AI

(allenai.org)

361 points mseri | 2 comments | 21 Nov 25 06:50 UTC

Y_Y ◴[21 Nov 25 09:59 UTC] No.46002975[source]▶

I asked it if giraffes were kosher to eat and it told me:

> Giraffes are not kosher because they do not chew their cud, even though they have split hooves. Both requirements must be satisfied for an animal to be permissible.

HN will have removed the extraneous emojis.

This is at odds with my interpretation of giraffe anatomy and behaviour and of Talmudic law.

Luckily old sycophant GPT5.1 agrees with me:

> Yes. They have split hooves and chew cud, so they meet the anatomical criteria. Ritual slaughter is technically feasible though impractical.

replies(3): >>46004171 #>>46005088 #>>46006063 #

embedding-shape ◴[21 Nov 25 13:10 UTC] No.46004171[source]▶

>>46002975 #

How many times did you retry (so it's not just up to chance), and what were the sampling parameters, specifically temperature and top_p?

replies(2): >>46005252 #>>46005308 #

latexr ◴[21 Nov 25 15:05 UTC] No.46005252[source]▶

>>46004171 #

> How many times did you retry (so it's not just up to chance)

If you don’t know the answer to a question, retrying multiple times only serves to amplify your bias; you have no basis for knowing which answer is correct.

replies(3): >>46005264 #>>46005329 #>>46005903 #

embedding-shape ◴[21 Nov 25 15:07 UTC] No.46005264[source]▶

>>46005252 #

Well, it seems that in this case the parent did know the answer, so I'm not sure what your point is.

I'm asking for the sake of reproducibility, and to clarify whether they ran the text-by-chance generator more than once, so we can rule out that they just hit the one-in-ten bad case because they only tested it once.

replies(1): >>46006117 #
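
To put rough numbers on the "one out of ten" scenario above, here is a minimal sketch. It assumes the runs are independent and that the model gives the bad answer at a fixed rate; the 10% figure is the hypothetical from the comment, not a measured failure rate, and the function names are made up for illustration.

```python
def prob_all_bad(bad_rate: float, n_runs: int) -> float:
    """Probability that every one of n_runs independent runs gives the bad answer."""
    return bad_rate ** n_runs


def prob_at_least_one_bad(bad_rate: float, n_runs: int) -> float:
    """Probability that at least one of n_runs independent runs gives the bad answer."""
    return 1.0 - (1.0 - bad_rate) ** n_runs


if __name__ == "__main__":
    bad_rate = 0.1  # assumption: the hypothetical "one out of ten" rate, not a measurement
    for n_runs in (1, 3, 5):
        print(
            f"runs={n_runs} "
            f"all_bad={prob_all_bad(bad_rate, n_runs):.5f} "
            f"at_least_one_bad={prob_at_least_one_bad(bad_rate, n_runs):.5f}"
        )
```

Under these assumptions, a single bad answer is weak evidence either way, while the same bad answer on several consecutive runs would be hard to explain as bad luck.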

latexr ◴[21 Nov 25 16:46 UTC] No.46006117[source]▶

>>46005264 #

> so I'm not sure what your point is.

That your suggestion would not correspond to real use by real regular people. OP posted the message as noteworthy because they knew it was wrong. Anyone who didn’t and trusts LLMs blindly (which is not a small number) would’ve left it at that and gone about their day with wrong information.

replies(1): >>46006285 #

embedding-shape ◴[21 Nov 25 17:03 UTC] No.46006285[source]▶

>>46006117 #

> That your suggestion would not correspond to real use by real regular people.

That wasn't the point either. The point was just to ask "Did you run the prompt once, or many times?", as that obviously affects how seriously you can take whatever outcome you get.
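
For what "run it many times" could look like in practice, here is a rough sketch of a tally loop that records the sampling parameters alongside the results. The `generate` callable is hypothetical and stands in for whatever model client is being used; the stub below ignores `temperature` and `top_p` and simply simulates a fixed bad-answer rate, so it illustrates the bookkeeping only, not Olmo 3's behaviour.

```python
import random
from collections import Counter
from typing import Callable

# Hypothetical signature: generate(prompt, temperature=..., top_p=...) -> str
GenerateFn = Callable[..., str]


def tally_answers(generate: GenerateFn, prompt: str, n_trials: int,
                  temperature: float, top_p: float) -> Counter:
    """Send the same prompt n_trials times and count each distinct answer."""
    counts: Counter = Counter()
    for _ in range(n_trials):
        answer = generate(prompt, temperature=temperature, top_p=top_p)
        counts[answer.strip().lower()] += 1
    return counts


def make_stub_generator(bad_rate: float, seed: int) -> GenerateFn:
    """Stand-in for a real model call (assumption: answers are independent draws)."""
    rng = random.Random(seed)

    def generate(prompt: str, temperature: float, top_p: float) -> str:
        # The stub ignores the prompt and sampling parameters entirely.
        return "not kosher" if rng.random() < bad_rate else "kosher"

    return generate


if __name__ == "__main__":
    params = {"temperature": 0.7, "top_p": 0.95}  # illustrative values only
    counts = tally_answers(make_stub_generator(bad_rate=0.1, seed=0),
                           "Are giraffes kosher to eat?", n_trials=20, **params)
    print(params, dict(counts))
```

Reporting the parameters and the full tally, rather than a single transcript, is what makes the result something others can try to reproduce.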
