Canonical Answer

Dylan's PhD thesis makes three main claims (paraphrased):

  1. Outer alignment failures are a problem.
  2. We can mitigate this problem by making the AI uncertain about its objective.
  3. We can model this as Cooperative Inverse Reinforcement Learning (CIRL).

His motivation thus seems to be modeling AGI as arriving in some multi-agent form and as being closely connected with human operators.

We're not certain what he is currently working on, but some recent alignment-relevant papers that he has published include:

Dylan has also published a number of articles that seem less directly relevant to alignment.

