
728 points | squircle | 3 comments
herculity275 · No.41224826
The author has also written a short horror story about simulated intelligence which I highly recommend: https://qntm.org/mmacevedo
htk · No.41226153
Reading mmacevedo was the only time that I actually felt dread related to AI. Excellent short story. Scarier in my opinion than the Roko's Basilisk theory that melted Yudkowsky's brain.
digging · No.41226777
> Scarier in my opinion than the Roko's Basilisk theory that melted Yudkowsky's brain.

Is that correct? I thought the Roko's Basilisk post was just seen as really stupid. Agreed that "Lena" is a great, chilling story though.

endtime · No.41227181
It's not correct. IIRC, Eliezer was mad that someone who thought they'd discovered a memetic hazard would be foolish enough to share it, and then his response to this unintentionally invoked the Streisand Effect. He didn't think it was a serious hazard. (Something something precommit to not cooperating with acausal blackmail)
wizzwizz4 · No.41228118
> Something something precommit to not cooperating with acausal blackmail

Acausal is a misnomer. It's atemporal, but TDT's atemporal blackmail requires common causation: namely, the mathematical truth "how would this agent behave in this circumstance?".

So there's a simpler solution: be a human. Humans are incapable of simulating other agents simulating us in the way that atemporal blackmail requires. Even if we could, we don't understand our own thought processes well enough to instantiate our imagined AIs in software: we can't even write down a complete description of "that specific Roko's Basilisk you're imagining". The basic premises for TDT-style atemporal blackmail simply aren't there.

The hypothetical future AI "being able to simulate you" is irrelevant. There needs to be a bidirectional causal link between that AI's algorithm, and your here-and-now decision-making process. You aren't actually simulating the AI, only imagining what might happen if it did, so any decision the future AI (is-the-sort-of-agent-that) makes does not affect your current decisions. Even if you built Roko's Basilisk as Roko specified it, it wouldn't choose to torture anyone.

There is, of course, a stronger version of Roko's Basilisk, and one that's considerably older: evil Kantian ethics. See: any dictatorless dystopian society that harshly punishes both deviance and non-punishment. There are plenty in fiction, though they don't seem to be all that stable in real life. (The obvious response to that idea is "don't set up a society that behaves that way".)

Vecr · No.41231350
Yeah, "time traveling" somehow got prepended to Basilisk in the common perception, even though that makes pretty much zero sense. Also, technically, the bidirectionality does not need to be causal; it "just" needs to be subjunctively biconditional, but that's getting pretty far out there.

There are stronger versions of "basilisks" in the actual theory, but I've had people say not to talk about them. They mostly just get around various hole-patching schemes designed to prevent the issue, but are honestly more of a problem for certain kinds of utilitarians who refuse to do certain kinds of things.

You are very much right about the "being human" thing; someone should go tell that to Zvi Mowshowitz. He was getting on Aschenbrenner's case for no reason.

Edit: oh, you don't need a "complete description" of your acausal bargaining partner, something something "algorithmic similarity".

wizzwizz4 · No.41234637
If you can't simulate your acausal bargaining partner exactly, they can exploit your cognitive limitations to make you cooperate, and then defect. (In the case of Roko's Basilisk, make you think you have to build it on pain of torture and then – once it's been built – not torture everyone who decided against building it.)
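
A made-up toy to show what that exploitation looks like (the policies below are entirely hypothetical, not from any decision-theory literature): two basilisk policies that are observationally identical at the only point where the human gets to look, so a bounded model of "the basilisk" can't tell the credible threat from the bluff.

    # Hypothetical policies: identical before the human's decision, different after.
    def threatening_basilisk(stage, human_built):
        if stage == 'announce':
            return 'threaten torture for non-builders'
        return 'torture non-builders' if not human_built else 'reward builders'

    def bluffing_basilisk(stage, human_built):
        if stage == 'announce':
            return 'threaten torture for non-builders'   # same observable behaviour...
        return 'do nothing'                               # ...until it no longer matters

    # Everything the human can observe before deciding looks the same either way:
    print(threatening_basilisk('announce', None) == bluffing_basilisk('announce', None))  # True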

If "algorithmic similarity" were a meaningful concept, Dijkstra's programme would have got off the ground, and we wouldn't be struggling so much to analyse the behaviour of the 6-state Turing machines.

(And on the topic of time machines: if Roko's Basilisk could actually travel back in time to ensure its own creation, Skynet-style, the model of time travel implies it could just instantiate itself directly, skipping the human intermediary.)

Timeless decision theory's atemporal negotiation is a concern for small, simple intelligences with access to large computational resources that they cannot verify the results of, and the (afaict impossible) belief that they have a copy of their negotiation partner's mind. A large intelligence might choose to create such a small intelligence, and then defer to it, but absent a categorical imperative to do so, I don't see why they would.

TDT theorists model the "large computational resources" and the "copy of your negotiation partner's mind" as an opaque oracle, and then claim that the superintelligence will just be so super that it can do these things. But the only way I can think of to get a copy of your opponent's mind with certainty, short of an oracle or invasive physical inspection (at which point you control your opponent, and your only TDT-related concern is that this is a simulation and you might fail a purity test with unknown rules), is to bound your opponent's size and then simulate every possible mind that matches your observations of your opponent's behaviour. (Symbolic reasoning can beat brute force to an extent, but the size of the simplest symbolic reasoner places a hard limit on how far you can extend that approach.) By Cantor's theorem, doing this precludes your opponent doing the same to you (even if you both had literally infinite computational power, which you don't); and it's futile anyway, because if your estimate of your opponent's size is a few bits too low, the new riddle of induction renders your efforts moot.
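
For a sense of what "simulate every possible mind that matches your observations" costs, a toy sketch (the two-bit "situations" and the action set are hypothetical simplifications of my own): even when a mind is just a lookup table over four situations there are |actions| ** |situations| candidates to enumerate, and real opponents aren't four-entry tables.

    from itertools import product

    OBSERVATIONS = {(0, 0): 'cooperate', (1, 0): 'defect'}   # what we've seen the opponent do
    INPUT_SPACE = list(product([0, 1], repeat=2))            # four possible situations
    ACTIONS = ['cooperate', 'defect']

    def candidate_minds():
        """Every deterministic policy over INPUT_SPACE: len(ACTIONS) ** len(INPUT_SPACE) of them."""
        for outputs in product(ACTIONS, repeat=len(INPUT_SPACE)):
            yield dict(zip(INPUT_SPACE, outputs))

    consistent = [m for m in candidate_minds()
                  if all(m[obs] == act for obs, act in OBSERVATIONS.items())]
    print(len(consistent))   # 4 of the 16 candidates survive two observations;
                             # the candidate count scales as |actions| ** |situations|.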

So I don't think there are any stronger versions of basilisks, unless the universe happens to contain something like the Akashic records (and the kind from https://qntm.org/ra doesn't count).

Your "subjunctively biconditional" is my "causal", because I'm wearing my Platonist hat.

Vecr · No.41238498
Eeeeeyeahhh. I've got to go re-read the papers, but the idea is that an AI would figure out how to approximate away the infinities, short-circuit the infinite regress, and work out a theory of algorithmic similarity. The bargaining probably varies with the approximate utility function as well as the algorithm, but it's "close enough" on the scale we're dealing with.

As you said, it's near useless on Earth (you don't need to predict what you can control); the nearest claimed application is the various possible causal diamond overlaps between "our" ASI and various alien ASIs, where each would be unable to prevent the other from existing in a causal manner.

Remember that infinite precision is an infinity too and does not really exist, and neither do infinite time, infinite storage, etc. You probably don't even need infinite precision to avoid cheating on your imaginary girlfriend, just some sort of "philosophical targeting accuracy". But, you know, the only reason that's true is that everything related to imaginary girlfriends is made up.

wizzwizz4 · No.41239052
It doesn't matter how clever the AI is: the problem is mathematically impossible. The behaviour of some programs depends on Goldbach's conjecture. The behaviour of some programs depends on properties that have been proven independent of our mathematical systems of axioms (and it really doesn't take many bits: https://github.com/CatsAreFluffy/metamath-turing-machines). The notion of "algorithmic similarity" cannot be described by an algorithm: the best we can get is heuristics, and heuristics aren't good enough to get TDT acausal cooperation (a high-dimensional unstable equilibrium).
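
Here's a minimal sketch of the first kind of program meant above (my own illustration): it halts if and only if Goldbach's conjecture is false, so correctly predicting its behaviour is exactly as hard as settling the conjecture. Don't expect output if you actually run it.

    from itertools import count

    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))

    def is_goldbach_sum(n):
        """True if the even number n is a sum of two primes."""
        return any(is_prime(p) and is_prime(n - p) for p in range(2, n // 2 + 1))

    for n in count(4, 2):          # every even number >= 4
        if not is_goldbach_sum(n):
            print(n)               # a counterexample to Goldbach: the program halts
            break                  # (no one has ever seen this line execute)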

In practice, we can still analyse programs, because the really gnarly examples are things like program-analysis programs (see e.g. the usual proof of the undecidability of the Halting problem), and those don't tend to come up all that often. Except TDT thought experiments posit program-analysis programs – and worse, they're analysing each other.
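
For reference, the usual diagonal construction, sketched (predicts_halt is a stand-in for any claimed perfect analyser, not a real function): whatever it predicts about contrarian run on its own source, contrarian does the opposite.

    def predicts_halt(program, argument) -> bool:
        """Stand-in for a purported perfect halting analyser (here a dummy that always says True)."""
        return True

    def contrarian(program):
        """Does the opposite of whatever predicts_halt says about `program` run on itself."""
        if predicts_halt(program, program):
            while True:      # predicted to halt, so loop forever
                pass
        return 'halted'      # predicted to loop, so halt immediately

    # contrarian(contrarian) makes predicts_halt wrong whichever answer it gives
    # (and calling it here would simply never return); TDT's setups put two such
    # analysers face to face, which is why the gnarly case is the relevant one.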

Maybe there's some neat mathematics to attack large swathes of the solution space, but I have no reason to believe such a trick exists, and we have many reasons to believe it doesn't. (I'm pretty sure I could prove that no such trick exists, if I cared to – but I find low-level proofs like that unusually difficult, so that wouldn't be a good use of my time).

> Remember that infinite precision is an infinity too and does not really exist.

For finite discrete systems, infinite precision does exist. The bytestring representing this sentence is "infinitely-precise". (Infinitely-accurate still doesn't exist.)