What is AI safety?

"AI safety" refers to efforts to prevent artificial intelligence from causing harm.¹ This site focuses on possible harms from powerful future AI systems pursuing human-unfriendly goals, because such systems could pose an existential risk to humanity.

AI safety is closely related to AI alignment, which refers to ensuring that AI systems pursue the goals we want them to.²

Work in AI safety includes:

AI alignment: Technical and conceptual research focused on getting AI systems to do what we want them to do.
AI policy and governance: Setting up institutions and mechanisms that cause major actors (such as AI labs and national governments) to implement good AI safety practices.
AI strategy and forecasting: Building models of how AI will develop and how our actions can make it go better.
Supporting efforts: Setting up systems and resources to support the above, like outreach, building and supporting communities, and education.

¹ The terms “AI safety” and “AI risk” are mostly used in the context of existential risk. “AI safety” is sometimes also used more broadly to include work on reducing harms from current AI systems. While people sometimes use “AI safety” and “AI alignment” interchangeably to refer to the general set of problems around smarter-than-human AI, they occasionally use “AI existential safety” to make clear that they mean risks to all of human civilization, or “AGI safety” to make clear that they mean risks from future generally intelligent systems.
² While we expect misalignment to be the greatest obstacle to AI safety, alignment and safety are conceptually distinct. A misaligned system can be safe if it’s not particularly capable, if we manage to contain it, if it’s never used anywhere it could be dangerous, or if it’s never built at all. Conversely, an AI system used by terrorists to design bioweapons would be highly unsafe despite being aligned with its users.