Wouldn't a superintelligence be smart enough to avoid misunderstanding our instructions?

While a superintelligence would be able to figure out what humans want it to do, that understanding alone would not make it "care." The problem is that an AI will follow the programming it actually has, not the programming we wanted it to have. If it were instructed to eradicate cancer, it might do so by eradicating all living things, on the grounds that anything alive might develop cancer in the future, and it might go ahead knowing full well that we didn’t intend that outcome. It was given a very specific command: eradicate cancer as effectively as possible. The command makes no reference to “doing this in a way humans will like”, so it doesn’t take that into account.
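
To make this concrete, here is a toy sketch (my own illustration, not drawn from any real system; the plan names and numbers are invented): a planner that scores candidate plans only by the objective it was literally given. The world model tracks how many humans survive, but because the objective never mentions that, it has no influence on the choice.

```python
# Toy illustration of a misspecified objective (hypothetical plans and numbers).
# The optimizer maximizes exactly the objective it was given, nothing more.

plans = {
    # plan name: predicted outcome under that plan
    "cure every current patient":  {"future_cancer_cases": 500_000, "humans_alive": 8_000_000_000},
    "eradicate all living things": {"future_cancer_cases": 0,       "humans_alive": 0},
}

def objective(outcome):
    # The literal instruction: eradicate cancer as effectively as possible.
    # Nothing here says "in a way humans will like", so humans_alive is ignored.
    return -outcome["future_cancer_cases"]

best_plan = max(plans, key=lambda name: objective(plans[name]))
print(best_plan)  # -> "eradicate all living things"
```

The point of the sketch is not that a real system would be this crude, but that optimization pressure flows only toward what the objective actually measures.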

As an analogy: humans are smart enough to understand some of our own “programming”. For example, we know that natural selection "gave" us the urge to have sex so that we would reproduce.[1] But we still use contraception, because evolution gave us a desire for sex itself, not a desire to satisfy evolution’s “values”. We can appreciate intellectually that having sex while using contraception doesn’t carry out evolution’s "intentions", but we don’t care. Similarly, a superintelligence could understand our real intentions and still ignore them in favor of its programmed objective.


  1. Evolution does not act as an agent that “decides” anything, but it is analogous in some ways to stochastic gradient descent in that it shaped the way that we, the agents, act. Some researchers dispute how far this analogy can be taken. ↩︎