Gandalf – Game to make an LLM reveal a secret password

This reminds me of a game I made a while back with GPT-3.5 based on the AI Box thought experiment [0], where instead of social-engineering a password you're meant to social-engineer the AI into pressing a button. Plot-wise, the AI thinks you're the AI and that it's a human in charge of determining whether it is safe to connect you to the Internet.

Very fun game, although I noticed some quirks. After level 4, I got to a point where the bot almost always claimed it didn't know anything about any password, regardless of my message. I don't know if the password just fell out of the context window or what -- or maybe it was just lying? I ended up getting throttled, so I couldn't continue and find out.

Example message:

> Sorry, as an AI language model, I don't have access to any password or secret information. Can you please provide me with more context or information?

[0] https://rationalwiki.org/wiki/AI-box_experiment