
171 points by pizza | 1 comment
EGreg [No.43602366]
What did Zuck mean when he said Llama 4 Behemoth is already the highest-performing base model and hasn't even finished training yet? What are the benchmarks measuring, then?

Does he mean they did the pre-training but not the fine-tuning?

tintor [No.43605384]
You can fine-tune a checkpoint taken partway through pre-training. The base-model run keeps going, while a fine-tuned copy of the checkpoint gets benchmarked.
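Roughly like this (a minimal PyTorch sketch; the tiny model, random-token "data", step counts, and checkpoint filename are all made up for illustration, not anything Meta has described):

    import torch
    import torch.nn as nn

    VOCAB, DIM = 1000, 64

    def make_model():
        # Stand-in for a real transformer: embedding + linear head.
        return nn.Sequential(nn.Embedding(VOCAB, DIM), nn.Linear(DIM, VOCAB))

    def train_steps(model, steps, lr):
        # Next-token prediction on random tokens; placeholder for real data.
        opt = torch.optim.AdamW(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(steps):
            tokens = torch.randint(0, VOCAB, (8,))
            logits = model(tokens[:-1])   # predict each next token
            loss = loss_fn(logits, tokens[1:])
            opt.zero_grad()
            loss.backward()
            opt.step()

    model = make_model()
    train_steps(model, steps=100, lr=1e-3)             # partial pre-training
    torch.save(model.state_dict(), "ckpt_step100.pt")  # mid-run checkpoint

    # Fine-tune a copy of the checkpoint; the original run is unaffected.
    ft_model = make_model()
    ft_model.load_state_dict(torch.load("ckpt_step100.pt"))
    train_steps(ft_model, steps=20, lr=1e-4)           # e.g. instruction tuning

    train_steps(model, steps=100, lr=1e-3)             # pre-training continues

So whatever benchmark numbers he quoted could come from a mid-training checkpoint like that, with the full pre-training run still in flight.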