|Main Question: Wouldn't a sufficiently intelligent system choose to do good things?|
|Alignment Forum Tag|
The Orthogonality Thesis states that an artificial intelligence can have any combination of intelligence level and goal, that is, its final goals and intelligence levels can vary independently of each other. This is in contrast to the belief that, because of their intelligence, AIs will all converge to a common goal. The thesis was originally defined by Nick Bostrom in the paper "Superintelligent Will", (along with the instrumental convergence thesis). For his purposes, Bostrom defines intelligence to be instrumental rationality.
Defense of the thesis
It has been pointed out that the orthogonality thesis is the default position, and that the burden of proof is on claims that limit possible AIs. Stuart Armstrong writes that,
One reason many researchers assume superintelligences to converge to the same goals may be because most humans have similar values. Furthermore, many philosophies hold that there is a rationally correct morality, which implies that a sufficiently rational AI will acquire this morality and begin to act according to it. Armstrong points out that for formalizations of AI such as AIXI and Gödel machines, the thesis is known to be true. Furthermore, if the thesis was false, then Oracle AIs would be impossible to build, and all sufficiently intelligent AIs would be impossible to control.
There are some pairings of intelligence and goals which cannot exist. For instance, an AI may have the goal of using as little resources as possible, or simply of being as unintelligent as possible. These goals will inherently limit the degree of intelligence of the AI.
- The Orthogonality Thesis: AI could have almost any goal while at the same time having high intelligence (aka ability to succeed at those goals). This means that we could build a very powerful agent which would not necessarily share human-friendly values. For example, the classic paperclip maximizer thought experiment explores this with an AI which has a goal of creating as many paperclips as possible, something that humans are (mostly) indifferent to, and as a side effect ends up destroying humanity to make room for more paperclip factories.
- Complexity of value: What humans care about is not simple, and the space of all goals is large, so virtually all goals we could program into an AI would lead to worlds not valuable to humans if pursued by a sufficiently powerful agent. If we, for example, did not include our value of diversity of experience, we could end up with a world of endlessly looping simple pleasures, rather than beings living rich lives.
- Instrumental Convergence: For almost any goal an AI has there are shared ‘instrumental’ steps, such as acquiring resources, preserving itself, and preserving the contents of its goals. This means that a powerful AI with goals that were not explicitly human-friendly would predictably both take actions that lead to the end of humanity (e.g. using resources humans need to live to further its goals, such as replacing our crop fields with vast numbers of solar panels to power its growth, or using the carbon in our bodies to build things) and prevent us from turning it off or altering its goals.
As far as we know from the observable universe, morality is just a construct of the human mind. It is meaningful to us, but it is not necessarily meaningful to the vast universe outside of our minds. There is no reason to suspect that our set of values is objectively superior to any other arbitrary set of values, e.i. “the more paper clips, the better!” Consider the case of the psychopathic genius. Plenty have existed, and they negate any correlation between intelligence and morality.
A Superintelligence would be intelligent enough to understand what the programmer’s motives were when designing its goals, but it would have no intrinsic reason to care about what its programmers had in mind. The only thing it will be beholden to is the actual goal it is programmed with, no matter how insane its fulfillment may seem to us.
Consider what “intentions” the process of evolution may have had for you when designing your goals. When you consider that you were made with the “intention” of replicating your genes, do you somehow feel beholden to the “intention” behind your evolutionary design? Most likely you don't care. You may choose to never have children, and you will most likely attempt to keep yourself alive long past your biological ability to reproduce.
Putting aside the complexity of defining what is "the" moral way to behave (or even "a" moral way to behave), even an AI which can figure out what it is might not "want to" follow it itself.
A deceptive agent (AI or human) may know perfectly well what behaviour is considered moral, but if their values are not aligned, they may decide to act differently to pursue their own interests.
Unanswered canonical questions