
Alignment is capability

(www.off-policy.com)
106 points drctnlly_crrct | 2 comments
xnorswap ◴[] No.46192597[source]
I've only been using it a couple of weeks, but in my opinion, Opus 4.5 is the biggest jump in tech we've seen since ChatGPT 3.5.

The difference between juggling Sonnet 4.5 / Haiku 4.5 and just using Opus 4.5 for everything is night & day.

Unlike Sonnet 4.5, which merely showed promise of being able to go off and complete complex tasks, Opus 4.5 seems genuinely capable of doing so.

Sonnet needed hand-holding and correction at almost every step. Opus just needs correction and steering at an early stage, and sometimes will push back and correct my understanding of what's happening.

It's astonished me with its capability to produce easy-to-read PDFs via Typst, and has produced large documents outlining how to approach very tricky tech migration tasks.

Sonnet would get there eventually, but not without a few rounds of dealing with compilation errors or hallucinated data. Opus seems to like to do "And let me just check my assumptions" searches, which make all the difference.

replies(5): >>46192783 #>>46192922 #>>46193718 #>>46194371 #>>46196267 #
1. boxed ◴[] No.46192922[source]
I had a situation this weekend where Claude said "x does not make sense in [context]" and didn't do the change I asked it to do. After an explanation of the purpose of the code, it fixed the issue and continued. Pretty cool.

(Of course, I'm still cognizant of the fact that it's just a bucket of numbers, but still.)

replies(1): >>46192975 #
2. sd9 ◴[] No.46192975[source]
My kingdom for an LLM that tells me I’m wrong