What should I read to learn about decision theory?

abramdemski and Scott Garrabrant's post on decision theory provides a good overview of many aspects of the topic, while Functional Decision Theory: A New Theory of Instrumental Rationality seems to be the most up-to-date source on current thinking.

For a more intuitive dive into one of the core problems, Newcomb's problem and regret of rationality is good, and Newcomblike problems are the norm is useful for seeing how it applies in the real world.

The LessWrong tag for decision theory has lots of additional links for people who want to explore further.

Where can I learn about AI alignment?

If you like interactive FAQs, you're in the right place already! Joking aside, some great entry points are the AI alignment playlist on YouTube, the “The Road to Superintelligence” and “Our Immortality or Extinction” posts on Wait But Why for a fun, accessible introduction, and Vox's “The case for taking AI seriously as a threat to humanity” as a high-quality mainstream explainer piece.

The free online Cambridge course on AGI Safety Fundamentals provides a strong grounding in much of the field and a cohort + mentor to learn with. There's even an Anki deck for people who like spaced repetition!

There are many resources in this post on Levelling Up in AI Safety Research Engineering, which ends with a list of other guides. There is also a Twitter thread here with some programs for upskilling and some for safety-specific learning.

The Alignment Newsletter (podcast), Alignment Forum, and AGI Control Problem Subreddit are great for keeping up with the latest developments.

I’d like to get deeper into the AI alignment literature. Where should I look?

The AGI Safety Fundamentals Course is arguably the best way to get up to speed on alignment. You can sign up to go through it alongside many other people, with mentorship, or read the materials independently.

Other great ways to explore include:

You might also want to consider reading Rationality: A-Z, which covers a lot of skills that are valuable for people trying to think about large and complex issues, with The Rationalist's Guide to the Galaxy available as a shorter and more accessible AI-focused option.

What are some good resources on AI alignment?

What are some good books about AGI safety?

The Alignment Problem (2020) by Brian Christian is the most recent in-depth guide to the field.

The book which first made the case to the public is Nick Bostrom’s Superintelligence (2014). It gives an excellent overview of the state of the field (as it was then) and makes a strong case for the subject being important, as well as exploring many fascinating adjacent topics. However, it does not cover newer developments, such as mesa-optimizers or language models.

There's also Human Compatible (2019) by Stuart Russell, which gives a more up-to-date review of developments, with an emphasis on the approaches that the Center for Human-Compatible AI is working on, such as cooperative inverse reinforcement learning. There's a good review/summary on Slate Star Codex.

Although not limited to AI safety, The AI Does Not Hate You (2020) is an entertaining and accessible outline of the core issues, as well as an exploration of some of the community and culture of the people working on them.

Various other books explore the issues in an informed way, such as Toby Ord’s The Precipice (2020), Max Tegmark’s Life 3.0 (2017), Yuval Noah Harari’s Homo Deus (2016), Stuart Armstrong’s Smarter Than Us (2014), and Luke Muehlhauser’s Facing the Intelligence Explosion (2013).

