What are accident and misuse risks?
Misuse risk includes things like autonomous weapons and invasive surveillance. These are real problems, but they do not necessarily pose an existential risk on their own.[^1]
Accident risk (the kind this website is most focused on) is the risk that, even if an AI is designed and used with the best intentions, it might still cause harm, for reasons including:
- Reward misspecification: We might give it incomplete instructions. For example, we might design it to make stamps, but neglect to instruct it to avoid some class of unwanted side effects, such as harming people along the way.
- Goal misgeneralization: We might give it a goal which is close to what we want, but when it operates in the real world, outside of its training conditions, it might act in unexpected ways. As an analogy, if we try to teach someone not to steal by punishing them when they are caught, they might conclude that the goal is not to be caught, rather than not to steal.
- Instrumentally convergent goals: There are many potentially dangerous intermediate goals that an AI could find useful for whatever specific end goals it has, e.g.:
  - Deception: The AI might act as if it is aligned with our preferences, to prevent us from shutting it down or reprogramming it, until it is powerful enough to achieve its actual, hidden goal.
  - Power seeking: The AI might try to gain power in dangerous ways, since power would be useful for almost any goal it has.
The general intuition behind many of these examples is that the AI is not harming people because it hates them. Rather, it has some goal that is indifferent to human well-being, and people might get in the way of its achieving that goal. For more examples, see Concrete Problems in AI Safety.
[^1]: However, misuse risks could be part of a broader existential risk. For example, surveillance could be part of a totalitarian lock-in, and advanced autonomous weapons could create a mutually assured destruction dynamic, which poses an existential risk similar to the current risk of global thermonuclear war.