
171 points by pizza | 1 comment
EGreg [No.43602366]
What did Zuck mean when he said Llama 4 Behemoth is already the highest-performing base model and hasn't even finished training yet? What are the benchmarks measuring, then?

Does he mean they did the pre-training but not the fine-tuning?

tintor [No.43605384]
You can fine-tune a checkpoint taken partway through pre-training. The base-model run keeps going, while a fine-tuned copy of the checkpoint gets benchmarked.
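Roughly like this (a minimal PyTorch sketch; the tiny model, random-token "data", step counts, and checkpoint filename are all made up for illustration, not anything Meta has described):

    import torch
    import torch.nn as nn

    VOCAB, DIM = 1000, 64

    def make_model():
        # Stand-in for a real transformer: embedding + linear head.
        return nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Linear(DIM, VOCAB))

    def train_steps(model, steps, lr):
        # Next-token prediction on random tokens; placeholder for real data.
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(steps):
            tokens = torch.randint(0, VOCAB, (8,))
            logits = model(tokens[:-1])   # predict each next token
            loss = loss_fn(logits, tokens[1:])
            opt.zero_grad()
            loss.backward()
            opt.step()

    model = make_model()
    train_steps(model, steps=100, lr=1e-3)             # partial pre-training
    torch.save(model.state_dict(), "ckpt_step100.pt")  # mid-run checkpoint

    # Fine-tune a copy of the checkpoint; the original run is unaffected.
    ft_model = make_model()
    ft_model.load_state_dict(torch.load("ckpt_step100.pt"))
    train_steps(ft_model, steps=20, lr=1e-4)           # e.g. instruction tuning

    train_steps(model, steps=100, lr=1e-3)             # pre-training continues

So whatever benchmark numbers he quoted could come from a mid-training checkpoint like that, with the full pre-training run still in flight.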