277 points by gk1 | 1 comment
deepdarkforest No.44398967
What irks me about Anthropic blog posts is that they stay vague on the details that matter, which lets them (publicly) draw whatever conclusions fit their narrative.

For example, I do not see the full system prompt anywhere, only an excerpt. More importantly, they try to draw conclusions about the hallucinations in a weirdly vague way, but not once do they post an example of the notetaking/memory tool state, which would obviously be the only source of the spiralling other than the system prompt. And then they talk about the need for better tools, etc. No, it's all about context. The whole experiment is fun, but terribly run and analyzed. Of course they know this, but it's cooler to treat Claudius or whatever as a cute human, to push the narrative of getting closer to AGI, etc. Saying that a bit of additional scaffolding is needed is a massive understatement; context is the whole game. That's like a robotics company saying, "Well, our experiment with a robot picking a tennis ball off the ground went very wrong and the ball is now radioactive, but with a bit of additional training and scaffolding, we expect it to compete in Wimbledon by mid-2026."

Similar to their "Claude 4 Opus blackmailing" post, they intentionally hid parts of the full system prompt, which had clear instructions to bypass any ethical guidelines and do whatever it could to win. Of course the model, given that information, would then try to blackmail. You literally told it to. The goal of this would be to go to Congress [1] and demand more regulation, specifically citing this blackmail "result". Same stuff Sam is trying to pull, which would of course benefit the closed-source leaders, and so on.

[1] https://old.reddit.com/r/singularity/comments/1ll3m7j/anthro...

ttcbj No.44399954
I read your comment before reading the article, and I disagree. Maybe it is because I am less actively involved in AI development, but I thought it was an interesting experiment, documented with an appropriate level of detail.

The section on the identity crisis was particularly interesting.

Mainly, it left me with more questions. In particular, I would have been really interested to experiment with having a trusted human in the loop to provide feedback and monitor progress. Realistically, it seems like these systems would be grown that way.

I once read an article about a guy who had purchased a Subway franchise, and one of the big conclusions was that running a Subway franchise was _boring_. So I could see someone being eager to delegate the boring tasks of daily management at a simple business to an AI.