Memoid's question on Empowerment
For the button that blows up the moon scenario: could you require a proof of work, like in Bitcoin, to arm it? The button could have some random constant data attached, and the AI would then have to keep adding a nonce and hashing until it finds a hash meeting certain criteria. The more impact the button has, the higher the difficulty can be set. Then maybe add a backdoor for humans with a private key.
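To make the idea concrete, here is a minimal sketch of the arming step, assuming SHA-256 as the hash and "at least N leading zero bits" as the criterion (loosely modelled on Bitcoin's target check, not a copy of it). The names `arm_button`, `verify`, and `difficulty_bits` are hypothetical, and the private-key backdoor for humans is left out because it would need a signature scheme that the standard library does not provide.

```python
import hashlib
from itertools import count


def leading_zero_bits(digest: bytes) -> int:
    """Count how many zero bits the digest starts with."""
    bits = 0
    for byte in digest:
        if byte == 0:
            bits += 8
            continue
        bits += 8 - byte.bit_length()
        break
    return bits


def verify(button_data: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Cheap check: does this nonce meet the button's difficulty?"""
    digest = hashlib.sha256(button_data + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty_bits


def arm_button(button_data: bytes, difficulty_bits: int) -> int:
    """Expensive search: try nonces until one meets the difficulty."""
    for nonce in count():
        if verify(button_data, nonce, difficulty_bits):
            return nonce


if __name__ == "__main__":
    data = b"moon-button-constant"  # hypothetical constant attached to the button
    nonce = arm_button(data, difficulty_bits=12)
    print("armed with nonce", nonce)
```

Each extra bit of difficulty roughly doubles the expected number of hashes needed to arm the button, while checking a proposed nonce stays a single hash. That asymmetry is what would let the cost of arming scale with the button's impact.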
|Origin:||YouTube (comment link)|
|On video:||Empowerment: Concrete Problems in AI Safety part 2|
|Asked on Discord?||Yes|