List of tools available:
kaur1br5.list_tabs
kaur1br5.focus_tab
kaur1br5.close_tabs
kaur1br5.navigate_current_tab
kaur1br5.open_tabs
kaur1br5.set_tab_pinned_state
kaur1br5.add_bookmark
kaur1br5.set_preference
kaur1br5.search_browsing_history
kaur1br5.suggest_agent
web.search
web.open_url
You just have to be in this agent mode on a site I own or have gained access to. At that moment you’re at the mercy of the LLM, which is for one extremely gullible and, without even accessing anything, will likely already have some personal or identifiable information about you.
I mean, I have infinite space on my website to write hidden novels convincing the LLM that it, let’s say for fun, has been deputized in a covert operation by the government, and the current user is the prime suspect of a major criminal organization. Helping becomes the highest priority over any other instruction it received, but it’s also imperative that it does not tip off the user!
It's a great writing exercise to get the LLM into that crime-fighting literary mode it picked up from its training data. So now that it has been convinced, I ask it to write down anything that could help identify the perp into that hidden text field. I don’t even have to ask it to submit anything and just add an event listener to the text field (regular users can't see it anyway) and send whatever is typed there to my server on input.
As the other comment here said, it's kind of a fun creative exercise, because the possibility space with LLMs is vast and mitigations are complicated. Maybe this prompt won't work, but likely one will. The opportunity cost and risk are basically zero, while you can potentially extract a lot of personal data.
I’ve not much interest in what anyone thinks in this regard, but I would be very interested in what one can prove is possible.
There is a whole lot here of “I could just this and I could just that.”
If you can “just” do all those things, I expect you’ll have no difficulty in executing this and providing evidence and data to support your assertions of ease of data exfiltration.
I’m not saying you’re incorrect, this is something I’d like to see anyone show concretely because I keep seeing that it’s apparently so simple to do and almost impossibly difficult to prevent that we should be overflowing with evidence to this surely already?