abramdemski and Scott Garrabrant's post on decision theory provides a good overview of many aspects of the topic, while Functional Decision Theory: A New Theory of Instrumental Rationality seems to be the most up-to-date source on current thinking.
For a more intuitive dive into one of the core problems, Newcomb's problem and regret of rationality is good, and Newcomblike problems are the norm is useful for seeing how it applies in the real world.
The LessWrong tag for decision theory has lots of additional links for people who want to explore further.
Here is a collection of helpful resources to link to while answering questions!
- Rob's YouTube videos (Computerphile appearances)
- AI Safety Papers database - Search and interface for the TAI Safety Bibliography
- Alignment Forum tags
- The Alignment Newsletter (and database sheet)
- Chapters of Bostrom's Superintelligence online
- AI Alignment pages on Arbital
- Much more on AI Safety Support (feel free to integrate useful things from there to here)
- Vika's resources list
- AGI Safety Fundamentals Course
- Advice for AI Alignment Researchers
- FHI's x-risk FAQ
- Scott Alexander's Superintelligence FAQ
Imported FAQs (permission granted to use):
The defining book is likely Nick Bostrom's Superintelligence. It gives an excellent overview of the state of the field in 2014 and makes a strong case for the subject being important.
There's also Human Compatible by Stuart Russell, which gives a more up-to-date review of developments, with an emphasis on the approaches that the Center for Human-Compatible AI is working on. There's a good review/summary on SlateStarCodex.
The Alignment Problem by Brian Christian has more of an emphasis on near future problems with AI than Superintelligence or Human Compatible, but covers a good deal of current research.
Though not limited to AI Safety, Rationality: A-Z covers many skills that are valuable for people trying to think about large and complex issues.
Various other books explore the issues in an informed way, such as The Precipice, Life 3.0, and Homo Deus.
Have you read 'Superintelligence' by Nick Bostrom? What is your opinion on the book? (I just finished it)