246 points doener | 1 comment
1. YetAnotherNick No.43691621
They compared against Llama 3.1 and found that to be better on average on their tasks, such as European MMLU. And Llama 3.1 is the worst of the batch, with Qwen 2.5 and Gemma 3 being significantly better.