Hermes 4

(hermes4.nousresearch.com)
202 points sibellavia | 1 comment
lern_too_spel ◴[] No.45069425[source]
The charts are utter nonsense. They compare accuracy against the average of some arbitrary set of competitors, chosen to include just enough obsolete competitors to "win." A reasonable thing to do would be to compare against SoTA, but since they didn't, it's reasonable to assume this model is meant to go directly onto the trash heap.
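
To make the averaging objection concrete, here is a minimal sketch. The model names and scores are made up for illustration and are not taken from the Hermes 4 charts or any published benchmark. It only shows how the same candidate score can look like a win against the mean of a baseline set and a loss against the best entry in that set.

```python
# Hypothetical benchmark accuracies (percent). Illustrative numbers only,
# not taken from the Hermes 4 charts or any published result.
baselines = {
    "old_model_a": 40.0,
    "old_model_b": 45.0,
    "current_sota": 80.0,
}
candidate = 60.0


def margin_vs_average(score, others):
    """Difference between the candidate and the mean of the baseline scores."""
    return score - sum(others.values()) / len(others)


def margin_vs_best(score, others):
    """Difference between the candidate and the strongest baseline."""
    return score - max(others.values())


avg_margin = margin_vs_average(candidate, baselines)
best_margin = margin_vs_best(candidate, baselines)

print(f"vs. average of baselines: {avg_margin:+.1f}")
print(f"vs. best baseline:        {best_margin:+.1f}")
```

With these made-up numbers the candidate sits 5 points above the average and 20 points below the best baseline, so which comparison a chart uses decides whether it reads as a win.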
replies(3): >>45069769 #>>45069848 #>>45069996 #
1. whymauri ◴[] No.45069769[source]
The most direct, non-marketing, non-aesthetic summary is that this model trades off a few points on 'fundamental benchmarks' (GPQA, MATH/AIME, MMLU) in exchange for being a 'more steerable' (fewer refusals) scaffold for downstream tuning.

Within that framing, I think it's easier to see where and how the model fits into the larger ecosystem. But, of course, the best benchmark will always be just using the model.