(mistral.ai)

216 points by veggieroll | 1 comment | 16 Oct 24 14:31 UTC

tarruda ◴[16 Oct 24 19:58 UTC] No.41863180

They didn't add a comparison to Qwen 2.5 3B, which seems to surpass Ministral 3B on MMLU, HumanEval, and GSM8K: https://qwen2.org/qwen2-5/#qwen25-05b15b3b-performance

These benchmarks don't really matter that much, but it is funny how this blog post conveniently forgot to compare with a model that already exists and performs better.

replies(2): >>41863218 #>>41863231 #

1. butterfly42069 ◴[16 Oct 24 20:02 UTC] No.41863218

>>41863180 #

At this point the benchmarks barely matter at all. It's entirely possible to train for a high benchmark score and reduce the overall quality of the model in the process.

IMO, use the model that makes the most sense when you ask it stuff; personally, I'd go for the one with the least censorship (which IMO isn't anything from Alibaba's Qwen family).

Un Ministral, Des Ministraux