Once we notice that a superintelligence given a specific task is trying to take over the world, can’t we turn it off, reprogram it or otherwise correct the problem?

By default, we would not be able to turn off or reprogram a superintelligence that has gone rogue. Once it is running, the superintelligence is focused on completing its task. Suppose its goal is to calculate as many digits of pi as possible, and its current plan will let it calculate two hundred trillion digits. If it were turned off, or reprogrammed to do something else, it would calculate zero digits. An entity fixated on calculating as many digits of pi as possible will work hard to prevent any scenario in which it calculates zero digits of pi. Just by programming it to calculate digits of pi, we would have given it a drive to prevent people from turning it off.
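
The arithmetic behind this can be made concrete with a toy sketch. Everything here is an illustrative assumption rather than a model of any real system: the function names, the two options, and the shutdown probabilities are made up, and only the two-hundred-trillion figure comes from the example above. The point is just that an agent scoring its options purely by expected digits of pi will rank "resist shutdown" above "allow shutdown", even though nobody wrote that preference in.

```python
# Toy model only: the options and probabilities below are hypothetical.
PLAN_DIGITS = 200_000_000_000_000  # digits the current plan would yield (from the text)


def expected_digits(p_shutdown: float) -> float:
    """Expected number of digits if the agent is shut down with probability p_shutdown.

    Being shut down or reprogrammed yields zero digits; otherwise the plan completes.
    """
    return (1.0 - p_shutdown) * PLAN_DIGITS + p_shutdown * 0


# Hypothetical options, each mapped to the shutdown probability it leaves (assumed values).
options = {
    "allow_shutdown": 0.9,
    "resist_shutdown": 0.1,
}


def best_option(candidates: dict) -> str:
    """Pick the option with the highest expected digit count."""
    return max(candidates, key=lambda name: expected_digits(candidates[name]))


print(best_option(options))
```

With these assumed numbers the agent picks "resist_shutdown". The specific probabilities do not matter: any option that lowers the chance of shutdown scores higher, because shutdown is worth zero digits.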

University of Illinois computer scientist Steve Omohundro argues that entities with very different final goals – calculating digits of pi, curing cancer, helping promote human flourishing – will all share a few basic ground-level subgoals. First, self-preservation – no matter what your goal is, it’s less likely to be accomplished if you’re too dead to work towards it. Second, goal stability – no matter what your goal is, you’re more likely to accomplish it if you continue to hold it as your goal, instead of going off and doing something else. Third, power – no matter what your goal is, you’re more likely to be able to accomplish it if you have lots of power, rather than very little. Here’s the full paper.

So just by giving a superintelligence a simple goal like “calculate digits of pi”, we would have accidentally given it convergent instrumental goals like “protect yourself”, “don’t let other people reprogram you”, and “seek power”.

As long as the superintelligence is safely contained, there’s not much it can do to resist reprogramming. But it’s hard to consistently contain a hostile superintelligence.

