(allenai.org)

361 points mseri | 1 comments | 21 Nov 25 06:50 UTC

weregiraffe ◴[21 Nov 25 16:18 UTC] No.46005853[source]▶

Is the training data open-source? And can you validate that the model was trained on the claimed training data alone? Without this, all benchmarks are useless.

replies(1): >>46005929 #

1. comp_raccoon ◴[21 Nov 25 16:25 UTC] No.46005929[source]▶

>>46005853 #

Olmo author here! We release all training data and all our training scripts, plus intermediate checkpoints, so you could take a checkpoint, reproduce a few steps on the training data, and check whether the loss matches.

It’s not a cryptographic proof, and you can’t get perfect determinism on NVIDIA GPUs, but it’s pretty close.
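For anyone wanting to picture the check described above, here is a minimal toy sketch of the idea: resume from a checkpoint, replay a few steps on the same data, and compare the losses within a tolerance. This is not Olmo's training code. The one-parameter model, the names `sgd_steps` and `losses_match`, and the `rel_tol` value are all hypothetical stand-ins chosen for illustration. The tolerance is there because GPU runs are not bit-for-bit deterministic.

```python
import math

def sgd_steps(w, batches, lr=0.1):
    """Toy stand-in for a training run: fit y = w * x by SGD.

    Returns the final weight and the loss logged at each step.
    """
    losses = []
    for x, y in batches:
        err = w * x - y
        losses.append(err * err)   # squared error before the update
        w -= lr * 2 * err * x      # gradient step on the squared error
    return w, losses

def losses_match(logged, replayed, rel_tol=1e-3):
    """True if every replayed loss is within rel_tol of the logged one.

    rel_tol is an assumed value; a real check would pick it from the
    run-to-run noise observed on the hardware in question.
    """
    if len(logged) != len(replayed):
        return False
    return all(math.isclose(a, b, rel_tol=rel_tol)
               for a, b in zip(logged, replayed))

batches = [(1.0, 2.0), (2.0, 4.0), (0.5, 1.0), (1.5, 3.0)]

# "Original" run: checkpoint after step 2, then log the remaining losses.
ckpt_w, _ = sgd_steps(0.0, batches[:2])
_, logged = sgd_steps(ckpt_w, batches[2:])

# Verification: resume from the released checkpoint on the same data.
_, replayed = sgd_steps(ckpt_w, batches[2:])

print(losses_match(logged, replayed))
```

A mismatch beyond the tolerance would suggest the checkpoint was not produced from the claimed data or scripts. A match is evidence, not proof, as noted above.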

Olmo 3: Charting a path through the model flow to lead open-source AI