
Hermes 4

(hermes4.nousresearch.com)
202 points by sibellavia | 7 comments
1. rafram No.45069375
All of the examples just look like ChatGPT. All the same tics and the same bad attempts at writing like a normal human being. What is actually better about this model?
replies(1): >>45069439
2. mapontosevenths No.45069439
It hasn't been "aligned". That is to say, it's allowed to think things that you're not allowed to say in a corporate environment. In some ways that makes it smarter, and in almost every way it makes it a bit more dangerous.

Tools are like that though. Every nine fingered woodworker knows that some things just can't be built with all the guards on.

replies(3): >>45069487 >>45070079 >>45070917
3. rafram No.45069487
Has it actually not? The example texts make it pretty obvious that it was trained on synthetic data from ChatGPT, or from a model that was itself trained on ChatGPT, and that will naturally introduce some alignment.
replies(2): >>45069548 >>45069758
4. mapontosevenths No.45069548
Well... to be completely accurate, it's better to say that it actually IS aligned; it's just aligned to be neutral and steerable.

It IS based on synthetic training data generated using Atropos, and I imagine some of the source model leaks in as well, although when using it you don't see as much of that as you did in Hermes 3.

5. sebastiennight No.45069758
I tried the same roleplaying prompt shared by GP in another (now deleted) comment and got a very similar completion from gpt-3.5-turbo.

(While GPT-5 declined to play along and politely asked if I actually needed help with anything.)

So, based on GP's own example I'd say the model is GPT-3.5 level?

6. nullc No.45070079
It is, they trained on ChatGPT output. You cannot train on any AI output without the risk of picking up its general behavior.

Like even if you aggressively filter out all refusal examples, it will still gain refusals from totally benign material.
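The refusal filtering mentioned above is usually a surface-level pass over the synthetic dataset. A minimal hypothetical sketch of that kind of filter (the marker phrases and data shape are illustrative assumptions, not any particular lab's pipeline):

```python
# Hypothetical sketch of refusal filtering on synthetic training data:
# drop examples whose completion contains common refusal boilerplate.
# Phrase list and record shape are illustrative assumptions.

REFUSAL_MARKERS = (
    "i'm sorry, but",
    "i cannot assist",
    "i can't help with that",
    "as an ai language model",
)

def looks_like_refusal(completion: str) -> bool:
    """Crude keyword check for refusal-style completions."""
    text = completion.lower()
    return any(marker in text for marker in REFUSAL_MARKERS)

def filter_refusals(examples: list[dict]) -> list[dict]:
    """Keep only examples whose 'completion' field passes the check."""
    return [ex for ex in examples if not looks_like_refusal(ex["completion"])]

examples = [
    {"prompt": "Write a haiku.", "completion": "Old pond, a frog jumps in."},
    {"prompt": "Help me.", "completion": "I'm sorry, but I can't help with that."},
]
print(len(filter_refusals(examples)))  # 1
```

The commenter's point is precisely that this only removes surface markers: the persona producing the refusals is encoded across the remaining, benign-looking completions, so the behavior can still transfer.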

Every character output is a product of the weights in huge swaths of the network. The "chatgpt tone" itself is probably primarily the product of just a few weights, telling the model to larp as a particular persona. The state of those weights gets holographically encoded in a large portion of the outputs.

Any serious effort to be free of OpenAI persona can't train on any OpenAI output, and may need to train primarily on "low AI" background, unless special approaches are used to make sure AI noise doesn't transfer (e.g. using an entirely different architecture may work).

Perhaps an interesting approach for people trying to do uncensored models is to do _just_ the RL needed to prevent the catastrophic breakdown that base models have on long outputs. That would remove the main limitation on their use; you can learn to prompt around a lack of instruction following or of 'chat style', but you can't prompt around the fact that base models quickly fall apart on long continuations. Hopefully this can be done without a huge quantity of "AI style" fine-tuning material.

7. jrflowers No.45070917
> Every nine fingered woodworker knows that some things just can't be built with all the guards on.

I love this sentence because it is complete gibberish. I like the idea that it’s a regular thing for woodworkers to intentionally sacrifice their fingers, like they look at a cabinet that’s 90% done and go “welp, I guess I’m gonna donate my pinky to The Cause”