What is artificial general intelligence safety / AI alignment?

From Stampy's Wiki

Canonical Answer

AI alignment is a field focused on ensuring that the goals of future superintelligent artificial systems align with human values, meaning that they would behave in ways compatible with our survival and flourishing. This may be an extremely hard problem, especially with deep learning, and is likely to determine the outcome of the most important century. Alignment research is strongly interdisciplinary, drawing on computer science, mathematics, neuroscience, philosophy, and the social sciences.

AGI safety is a related concept which strongly overlaps with AI alignment. AGI safety is concerned with making sure that building AGI systems doesn't cause things to go badly wrong, and the main way things can go badly wrong is through misalignment. AGI safety also includes policy work that prevents the building of dangerous AGI systems, or reduces misuse risks from AGI systems aligned to actors who don't have humanity's best interests in mind.

Stamps: plex

Non-Canonical Answers

AI safety is a research field that aims to avoid bad outcomes from AI systems.

Work on AI safety can be divided into near-term AI safety and AI existential safety, the latter of which is strongly related to AI alignment:

  • Near-term AI safety is about preventing bad outcomes from current systems. Examples of work on near-term AI safety include:
    • getting content recommender systems to not radicalize their users
    • ensuring autonomous cars don’t kill people
    • advocating strict regulations for lethal autonomous weapons
  • AI existential safety, or AGI safety, is about reducing the existential risk from artificial general intelligence (AGI). Artificial general intelligence is AI that is at least as competent as humans in all skills relevant to making a difference in the world. AGI has not been developed yet, but will likely be developed this century. A central part of AGI safety is ensuring that what AIs do is actually what we want. This is called AI alignment (often just called alignment), because it is about aligning an AI with human values. Alignment is difficult, and building AGI is probably very dangerous, so it is important to mitigate the risks as much as possible. Examples of work on AI existential safety include:
    • trying to get a foundational understanding of what intelligence is, e.g. agent foundations
    • outer and inner alignment: ensuring the objective of the training process is actually what we want, and also ensuring the objective of the resulting system is actually what we want
    • AI policy/strategy: e.g. researching how best to set up institutions and mechanisms that help with safe AGI development, and making sure AI isn't used by bad actors

There are also areas of research which are useful for both near-term and existential safety. For example, robustness to distribution shift and interpretability both help make current systems safer, and are likely to help with AGI safety.

Stamps: Aprillion

Canonical Question Info
(edits welcome)
Asked by: plex
Origin:
Date: 2021/10/03

Related questions