
Alignment is capability

(www.off-policy.com)
106 points by drctnlly_crrct | 1 comment
xnorswap No.46192597
I've only been using it a couple of weeks, but in my opinion, Opus 4.5 is the biggest jump in tech we've seen since ChatGPT 3.5.

The difference between juggling Sonnet 4.5 / Haiku 4.5 and just using Opus 4.5 for everything is night & day.

Unlike Sonnet 4.5, which merely showed promise at being able to go off and complete complex tasks, Opus 4.5 seems genuinely capable of doing so.

Sonnet needed hand-holding and correction at almost every step. Opus just needs correction and steering at an early stage, and sometimes will push back and correct my understanding of what's happening.

It's astonished me with its capability to produce easy-to-read PDFs via Typst, and it has produced large documents outlining how to approach very tricky tech migration tasks.

Sonnet would get there eventually, but not without a few rounds of dealing with compilation errors or hallucinated data. Opus seems to like doing "And let me just check my assumptions" searches, which makes all the difference.

furyofantares No.46196267
> I've only been using it a couple of weeks, but in my opinion, Opus 4.5 is the biggest jump in tech we've seen since ChatGPT 3.5.

Over Sonnet 4.5 maybe, but that's ignoring Opus 4.1 as well as Codex 5.1 Max.

In terms of capabilities, I find Opus 4.5 to be essentially identical to Codex 5.1 Max up until the context starts to fill up (by which I mean 50% used), which happens much more quickly with Opus 4.5 than with Codex AFAICT.

I think Codex is slower (a lot?), so it's not like it's simply better, but I've found some tasks that Opus can't do at all and Codex has no problem with, which I think is due to the context situation.

In any case it doesn't seem like a leap.