Olmo 3: Charting a path through the model flow to lead open-source AI

(allenai.org)

361 points mseri | 1 comments | 21 Nov 25 06:50 UTC


Y_Y ◴[21 Nov 25 09:59 UTC] No.46002975[source]▶

I asked it if giraffes were kosher to eat and it told me:

> Giraffes are not kosher because they do not chew their cud, even though they have split hooves. Both requirements must be satisfied for an animal to be permissible.

HN will have removed the extraneous emojis.

This is at odds with my interpretation of giraffe anatomy and behaviour and of Talmudic law.

Luckily old sycophant GPT5.1 agrees with me:

> Yes. They have split hooves and chew cud, so they meet the anatomical criteria. Ritual slaughter is technically feasible though impractical.

replies(3): >>46004171 #>>46005088 #>>46006063 #

Flere-Imsaho ◴[21 Nov 25 16:39 UTC] No.46006063[source]▶

>>46002975 #

Models should not have memorised whether animals are kosher to eat or not. This is information that should be retrieved from RAG or whatever.

If a model responded with "I don't know the answer to that", that would be far more useful. Is anyone actually working on models that are trained to admit when they don't know an answer?

replies(4): >>46006191 #>>46009037 #>>46009499 #>>46010963 #

robrenaud ◴[21 Nov 25 22:01 UTC] No.46009499[source]▶

>>46006063 #

Benchmarks need to change.

There is a four-choice question. Your best guess is that the answer is B, with about a 35% chance of being right. If you are graded on the fraction of questions answered correctly, the optimization pressure is simply to answer B.

If you could get half credit for answering "I don't know", we'd have a lot more models saying that when they are not confident.
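
To make the arithmetic concrete, here is a rough sketch of the expected score per question under the two grading schemes. The function names and the scoring rule (1 point for a correct answer, 0 for a wrong one, a fixed credit for "I don't know") are assumptions for illustration, not how any particular benchmark is actually scored.

```python
def expected_scores(p_correct: float, abstain_credit: float) -> dict:
    """Expected score per question for each action.

    Assumes (hypothetically) 1 point for a correct answer, 0 for a wrong
    one, and `abstain_credit` points for answering "I don't know".
    """
    return {"guess": p_correct, "abstain": abstain_credit}


def best_action(p_correct: float, abstain_credit: float) -> str:
    """Return the action with the higher expected score (ties go to guessing)."""
    scores = expected_scores(p_correct, abstain_credit)
    return "guess" if scores["guess"] >= scores["abstain"] else "abstain"


# Accuracy-only grading: "I don't know" earns nothing, so guessing always wins.
print(best_action(0.35, 0.0))  # guess

# Half credit for "I don't know": a 35% guess is worth less than abstaining.
print(best_action(0.35, 0.5))  # abstain
```

Under these assumptions the model only answers when its confidence exceeds the abstention credit, so with half credit the cutoff is 50%.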
