What are the differences between AI safety, AI alignment, AI control, Friendly AI, AI ethics, AI existential safety, and AGI safety?
There are a variety of terms that mean something like "making AI go well." The distinctions between these terms are vague, but loosely speaking, the meanings are as follows:
- AI safety means preventing harm from AI. This often refers to avoiding existential risks, which is how we use it on aisafety.info. It can also encompass smaller-scale risks, like accidents caused by self-driving cars or harmful text produced by language models. People sometimes use “AI existential safety” to refer specifically to risks at the level of human extinction.
- AI alignment means getting AI to pursue the right goals; the problem of accomplishing this is known as the “alignment problem”. "AI alignment" often refers to “intent alignment”, according to which an AI is aligned if it’s trying to do what its operator wants it to do. Others use “AI alignment” for the broader problem of making powerful AI go well, but still emphasize that getting AI “on our side” is the core issue.[1]
- AI ethics broadly refers to the project of making sure that AI systems are designed and used in ethical ways. In practice, the term is associated with concerns about the harmful societal impacts of current-day AI, such as algorithmic bias against marginalized groups, poor treatment of crowd workers used in training AI, the environmental impacts of AI, and artists losing their livelihoods to generative algorithms. The overarching principles guiding this work are fairness, accountability, and transparency. While there is some overlap between AI ethics and AI alignment research, AI ethics researchers have often been critical of AI safety research that focuses on existential risk at the expense of addressing current harms.
- AI governance refers to the institutions and norms that coordinate the development and deployment of AI. Like technical AI safety, AI governance aims to prevent disastrous outcomes from AI, but it focuses on the social context rather than the technical problems, and deals with questions like preventing misuse, implementing good safety practices, and preventing dangerously misaligned systems from being deployed.
Terms that are used less often include:
- AI control (and the “control problem”) is a term that was sometimes used roughly synonymously with “AI alignment” (and the “alignment problem”), though it is less commonly used now. Some people use the term "AI control" to encompass all potential methods of preventing AI systems from behaving dangerously, including incentivizing and constraining them (“capability control”), and use "AI alignment" only to refer to giving AI the right internal values (“motivation selection”).
- Friendly AI (FAI) is a term that was used in early work by MIRI,[2] but has since fallen out of use. It informally referred to AI that acts benevolently toward humans, for example by pursuing “coherent extrapolated volition”, or some other specification of the values of humanity as a whole, as its highest goal.
- AI notkilleveryoneism is a term Eliezer Yudkowsky and others have used facetiously to refer to the project of preventing AI from exterminating humanity, out of a sense that other terms, like “AI alignment” and the others listed above, tend to drift to encompass risks of smaller scope.[3]
1. Why equate making AI go well with AI alignment? Because if we can’t control a superintelligence that is not on our side, then the problem of making AI safe amounts to the problem of getting it on our side. ↩︎
2. Then known as the Singularity Institute. ↩︎
3. For instance, Senator Blumenthal's remark during a Senate hearing with Sam Altman (CEO of OpenAI): "I think you have said, in fact… 'Development of superhuman machine intelligence is probably the greatest threat to the continued existence of humanity.' You may have had in mind the effect on jobs, which is really my biggest nightmare." ↩︎