What is Dylan Hadfield-Menell's thesis on?

From Stampy's Wiki

Canonical Answer

Dylan Hadfield-Menell's PhD thesis makes three main claims (paraphrased):

  1. Outer alignment failures are a problem.
  2. We can mitigate this problem by making the AI system uncertain about its objective.
  3. We can model this as Cooperative Inverse Reinforcement Learning (CIRL).

Thus, his motivation seems to be modeling AGI as arising in a multi-agent setting in which it remains closely connected to human operators.

We're not certain what he is currently working on, but some recent alignment-relevant papers that he has published include:

Dylan has also published a number of articles that seem less directly relevant for alignment.

Canonical Question Info
Asked by: RoseMcClelland
Date: 2022/09/13