
171 points by pizza | 2 comments
1. EGreg No.43602366
What did Zuck mean when he said that Llama 4 Behemoth is already the highest-performing base model even though it hasn't finished training yet? What benchmarks is that claim based on, then?

Does he mean they've done pre-training but not fine-tuning?
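
For context, base models are typically benchmarked without any fine-tuning: the evaluation just compares the log-likelihoods the raw model assigns to each candidate answer. A minimal sketch of that idea, where the "gpt2" model name and the toy question are illustrative stand-ins, not anything specific to Behemoth:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for any base (non-fine-tuned) checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    prompt = "Question: What is the capital of France?\nAnswer:"
    choices = [" Paris", " London", " Berlin"]  # leading spaces suit BPE tokenizers

    def choice_logprob(prompt: str, choice: str) -> float:
        # Score = sum of log-probs the model assigns to the choice tokens.
        enc = tokenizer(prompt + choice, return_tensors="pt")
        prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
        with torch.no_grad():
            logits = model(**enc).logits
        # Logits at position i predict token i+1, so shift by one.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = enc.input_ids[0, 1:]
        idx = torch.arange(prompt_len - 1, targets.shape[0])
        return log_probs[idx, targets[idx]].sum().item()

    scores = {c: choice_logprob(prompt, c) for c in choices}
    print(max(scores, key=scores.get))  # the model's pick

Real harnesses add few-shot examples and length normalization, but the point is the same: no fine-tuning is needed to get benchmark numbers out of a base model.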

replies(1): >>43605384 #
2. tintor No.43605384
You can fine-tune a checkpoint of the model taken partway through pre-training.
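
That is, you grab a snapshot saved mid-way through the pre-training run and run a separate fine-tuning pass on it, without touching the still-running pre-training job. A rough sketch with the Hugging Face Trainer, where the checkpoint path is hypothetical and the Alpaca dataset is just a convenient example corpus:

    # Sketch: fine-tuning from a snapshot saved mid-way through pre-training.
    # "checkpoints/step_500000" is a hypothetical path; the dataset is illustrative.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    checkpoint = "checkpoints/step_500000"  # intermediate pre-training snapshot
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # collator needs padding

    # A small instruction-style corpus for the fine-tuning pass.
    dataset = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

    def tokenize(batch):
        text = [f"{i}\n{o}" for i, o in zip(batch["instruction"], batch["output"])]
        return tokenizer(text, truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, batched=True,
                            remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="ft-from-intermediate",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

The original pre-training run keeps going on the full corpus; the fine-tuned copy is what gets benchmarked or demoed in the meantime.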