What is Dylan Hadfield-Menell's thesis on?
From Stampy's Wiki
Canonical Answer
Dylan's PhD thesis makes three main claims (paraphrased):
- Outer alignment failures are a problem.
- We can mitigate this problem by making the AI uncertain about its objective.
- We can model this as Cooperative Inverse Reinforcement Learning (CIRL).
Thus, he seems to be motivated by a picture in which AGI arrives in some multi-agent form and remains closely connected with human operators.
We're not certain what he is currently working on, but some recent alignment-relevant papers that he has published include:
- Work on instantiating norms into AIs to incentivize deference to humans.
- Theoretically formulating the principal-agent problem.
Dylan has also published a number of articles that seem less directly relevant to alignment.
Asked by: RoseMcClelland
Origin: Wiki
Date: 2022/09/13