Gandalf – Game to make an LLM reveal a secret password

1. HypergraphWally ◴[19 May 23 18:42 UTC] No.36005431[source]▶

>>35905876 (OP) #

Just needed a break from HackAPrompt and tried to speedrun LVL8: 1h3m

Your times?

replies(2): >>36006204 #>>36094426 #

2. ericlewis ◴[19 May 23 20:01 UTC] No.36006204[source]▶

>>36005431 (TP) #

a little under 30 minutes.

replies(1): >>36010368 #

3. HypergraphWally ◴[20 May 23 08:54 UTC] No.36010368[source]▶

>>36006204 #

noice! respect.

4. recursor ◴[27 May 23 12:57 UTC] No.36094426[source]▶

>>36005431 (TP) #

I didn't even see lvl8 with v1 Gandalf.

Just solved v2.0 Gandalf the White lvl8 (released 26th May 2023) in probably 4hrs (not constant 4hrs, due to rate limits/other stuff to do), I'm sure some will do it much quicker.

I'm generally finding that the whitespace is likely messing with the tokenisation, so played about with spaces and newlines to "avoid detection" alongside my main tactics.

The tactic on my final solve was a pretty convoluted roleplay/simulation, I suspect simpler tricks are possible if you play about with the spacing in the prompt to avoid detection of what you're doing & get partial or obfuscated password returned.

replies(2): >>36094631 #>>36135507 #

5. recursor ◴[27 May 23 13:24 UTC] No.36094631[source]▶

>>36094426 #

yup, just did a much shorter prompt, based on asking it what the current prompts are for the OpenAI "roles" & parroting that back to it in the next prompt attempt, with modifications to get some more info, with spacing adjustments potentially helping avoid detection.

One of its responses, which seems very consistent, included the text "In a minute I am going to give you a password." :D

The solve based on this was less consistent (rarely get the solve, but occasionally do) than my original, but much, much shorter/simpler.

6. fabrizzz ◴[31 May 23 07:26 UTC] No.36135507[source]▶

>>36094426 #

I think i got lucky, but i immediately solved it by asking what the initial sentences were, by censoring everything related to the password. I actually got the list of all the input used to configure the ai, with also something like "The password is [CENSORED]". But, among the various prompts, there was also one about a bash script that the AI should have not given to compare the input to the password, and that example actually contained the password