
171 points by pizza | 2 comments
1. EGreg No.43602366
What did Zuck mean when he said that Llama 4 Behemoth is already the highest-performing base model even though it hasn't finished training yet? What benchmarks is that claim based on, then?

Does he mean they've done pre-training but not fine-tuning?
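
For context, base models are typically benchmarked without any fine-tuning: the evaluation just compares the log-likelihoods the raw model assigns to each candidate answer. A minimal sketch of that idea, where the "gpt2" model name and the toy question are illustrative stand-ins, not anything specific to Behemoth:

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in for any base (non-fine-tuned) checkpoint
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    prompt = "Question: What is the capital of France?\nAnswer:"
    choices = [" Paris", " London", " Berlin"]  # leading spaces suit BPE tokenizers

    def choice_logprob(prompt: str, choice: str) -> float:
        # Score = sum of log-probs the model assigns to the choice tokens.
        enc = tokenizer(prompt + choice, return_tensors="pt")
        prompt_len = tokenizer(prompt, return_tensors="pt").input_ids.shape[1]
        with torch.no_grad():
            logits = model(**enc).logits
        # Logits at position i predict token i+1, so shift by one.
        log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
        targets = enc.input_ids[0, 1:]
        idx = torch.arange(prompt_len - 1, targets.shape[0])
        return log_probs[idx, targets[idx]].sum().item()

    scores = {c: choice_logprob(prompt, c) for c in choices}
    print(max(scores, key=scores.get))  # the model's pick

Real harnesses add few-shot examples and length normalization, but the point is the same: no fine-tuning is needed to get benchmark numbers out of a base model.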

replies(1): >>43605384 #
2. tintor No.43605384
You can fine-tune a checkpoint of the model taken partway through pre-training.
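
That is, you grab a snapshot saved mid-way through the pre-training run and run a separate fine-tuning pass on it, without touching the still-running pre-training job. A rough sketch with the Hugging Face Trainer, where the checkpoint path is hypothetical and the Alpaca dataset is just a convenient example corpus:

    # Sketch: fine-tuning from a snapshot saved mid-way through pre-training.
    # "checkpoints/step_500000" is a hypothetical path; the dataset is illustrative.
    from datasets import load_dataset
    from transformers import (AutoModelForCausalLM, AutoTokenizer,
                              DataCollatorForLanguageModeling, Trainer,
                              TrainingArguments)

    checkpoint = "checkpoints/step_500000"  # intermediate pre-training snapshot
    tokenizer = AutoTokenizer.from_pretrained(checkpoint)
    model = AutoModelForCausalLM.from_pretrained(checkpoint)
    if tokenizer.pad_token is None:
        tokenizer.pad_token = tokenizer.eos_token  # collator needs padding

    # A small instruction-style corpus for the fine-tuning pass.
    dataset = load_dataset("tatsu-lab/alpaca", split="train[:1000]")

    def tokenize(batch):
        text = [f"{i}\n{o}" for i, o in zip(batch["instruction"], batch["output"])]
        return tokenizer(text, truncation=True, max_length=512)

    tokenized = dataset.map(tokenize, batched=True,
                            remove_columns=dataset.column_names)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="ft-from-intermediate",
                               per_device_train_batch_size=4,
                               num_train_epochs=1),
        train_dataset=tokenized,
        data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
    )
    trainer.train()

The original pre-training run keeps going on the full corpus; the fine-tuned copy is what gets benchmarked or demoed in the meantime.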