What are some good books about AI safety?
Light introductions to AI safety
There are a number of excellent introductions to AI safety that assume no prior knowledge:
Uncontrollable (2023) by Darren McKee is the latest layman's introduction to AI x-risk (the risk of human extinction or the destruction of humanity's long-term potential).
Human Compatible (2019) by Stuart Russell explains the problem of making powerful AI systems that are compatible with humans. The book discusses potential solutions, with an emphasis on approaches from Professor Russell's lab, the Center for Human-Compatible AI. One such approach is cooperative inverse reinforcement learning: where standard reinforcement learning adjusts a machine to favor actions that earned it high reward, the inverse problem is to infer what reward function a human is pursuing from their observed behavior (a toy sketch of that inference follows).
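To make that concrete, here is a minimal, hypothetical sketch of the inference step at the heart of inverse reinforcement learning: a robot keeps a belief over candidate reward functions and does a Bayesian update after watching a noisily rational human act. It illustrates the general idea only, not Russell's actual formulation of CIRL; all names and the toy setup are invented.

```python
import math

# Toy setup (hypothetical): three candidate reward functions the human
# might have, each mapping an action to a reward.
candidate_rewards = {
    "likes_tea":    {"make_tea": 1.0, "make_coffee": 0.0, "do_nothing": 0.1},
    "likes_coffee": {"make_tea": 0.0, "make_coffee": 1.0, "do_nothing": 0.1},
    "likes_quiet":  {"make_tea": 0.2, "make_coffee": 0.2, "do_nothing": 1.0},
}

# The robot starts out maximally uncertain about which reward is the human's.
belief = {name: 1 / len(candidate_rewards) for name in candidate_rewards}

def likelihood(action, rewards, rationality=5.0):
    """Probability that a noisily rational human picks `action` under a
    given reward function (a softmax / Boltzmann choice model)."""
    exps = {a: math.exp(rationality * r) for a, r in rewards.items()}
    return exps[action] / sum(exps.values())

def observe(action):
    """Bayesian update of the robot's belief after seeing the human act."""
    global belief
    unnormalized = {
        name: belief[name] * likelihood(action, rewards)
        for name, rewards in candidate_rewards.items()
    }
    total = sum(unnormalized.values())
    belief = {name: p / total for name, p in unnormalized.items()}

# Watching the human make tea twice shifts belief sharply toward "likes_tea".
observe("make_tea")
observe("make_tea")
print(belief)
```

In full CIRL the human and robot act in a shared environment and the human may even behave pedagogically, but the update above captures the core mechanism: preferences are learned from behavior rather than specified up front.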
The Alignment Problem (2020) by Brian Christian traces the history of machine learning (an approach to AI in which, instead of designing an algorithm directly, we have the system search through possible algorithms based on how well they do on some training data) and connects present-day failures of bias and misbehavior in deployed systems to the long-term challenge of aligning advanced AI with human values.
The AI Does Not Hate You (2019) by Tom Chivers is an entertaining and accessible outline of the core ideas around AI existential risk, along with an exploration of the community and culture of AI safety researchers.
Other accessible introductions include Toby Ord's The Precipice (2020), Max Tegmark's Life 3.0 (2017), Yuval Noah Harari's Homo Deus (2016), Stuart Armstrong's Smarter Than Us (2014), and Luke Muehlhauser's Facing the Intelligence Explosion (2013).
More involved reads
For those who want to get into the weeds of AI safety, here are some books that require considerably more time and effort to read.
The book that first made the case to the public is Nick Bostrom's Superintelligence (2014). It gives an excellent overview of the state of the field as it was in 2014 and makes a strong case for why AI safety is important. Its arguments for AI posing an existential risk remain influential, to the point that they could be viewed as the "classical" arguments for concern over AI. However, the book was written before the dominance of deep learning, so it doesn't cover newer developments such as large language models (models that take in some text and predict how the text is most likely to continue).
Rationality: From AI to Zombies (2015) is a compendium of essays written by Eliezer Yudkowsky, an early researcher on preventing AI x-risks. Book 3, "The Machine in the Ghost", covers what an optimizer is: roughly, something that can improve a process or physical artifact so that it is fit for a certain purpose or fulfills some set of requirements (a toy example follows).
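As a concrete illustration of that definition (not anything from the book itself), here is a minimal hill-climbing optimizer: it repeatedly tweaks a candidate solution and keeps any tweak that better fulfills a toy requirement. The objective and numbers are made up.

```python
import random

def objective(x):
    # Toy requirement (hypothetical): we want x to be close to 3.
    return -(x - 3.0) ** 2

def hill_climb(x=0.0, steps=1000, step_size=0.1):
    """A minimal optimizer: propose a small random change, keep it if it
    improves the objective, discard it otherwise."""
    for _ in range(steps):
        candidate = x + random.uniform(-step_size, step_size)
        if objective(candidate) > objective(x):
            x = candidate
    return x

print(hill_climb())  # converges near 3.0
```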
Introduction to AI Safety, Ethics, and Society (2024) is a textbook written by Dan Hendrycks, director of the Center for AI Safety. It approaches AI safety as a societal challenge and covers the basics of modern AI, the technical challenges of AI safety, collective action problems, and the challenges of governing AI.
Novels
There are many works of fiction that illustrate AI misalignment. Most are short stories, some quite detailed, but a few novels place AI existential risk front and center.
The Crystal Trilogy (2019), written by AI safety researcher Max Harms, is set in 2039 and takes the perspective of a collective AI housed in a single body. The story focuses on the conflicts between the AI and humanity, and then amongst the AI themselves.
The Number (2022) takes the perspective of an AI whose sole goal is to make a number go up. Naturally, this involves taking over the world. The novel illustrates how competitive pressures can lead to the creation of a deceptively aligned AI: one that acts as if it were aligned while in training, but turns out not to be aligned once deployed.
A Fire Upon the Deep (1992), by Vernor Vinge, greatly influenced the pioneers of AI safety. It depicts a galactic conflict between two superintelligences that is played out through biological proxies. Perhaps most influential is his depiction of superintelligences as something truly beyond human capacities to outwit or outmaneuver.1
Which is why "A Fire Upon the Deep" has a plot device requiring superhuman intelligences to stay on the edges of the galaxy. ↩︎