What is Dylan Hadfield-Menell's thesis on?
Hadfield-Menell's PhD thesis argues for three main claims (paraphrased):
- Outer alignment failures, where an AI system optimizes a misspecified objective, are a real problem.
- We can mitigate this problem by making the system uncertain about its true objective.
- We can formalize this human-AI interaction as Cooperative Inverse Reinforcement Learning (CIRL); a sketch of the formal game is given below.
Thus, he seems motivated by a picture of AGI arriving in some multi-agent form while remaining heavily connected to human operators.
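For concreteness, here is a minimal sketch of the CIRL formalism, based on Hadfield-Menell et al.'s 2016 paper "Cooperative Inverse Reinforcement Learning" rather than the thesis itself, so the notation may differ slightly from what the thesis uses. A CIRL game is a two-player Markov game with identical payoffs between a human H and a robot R:

```latex
% Two-player Markov game with identical payoffs (human H, robot R).
% S: world states; A^H, A^R: action sets; T: transition distribution;
% Theta: reward parameters; R: shared reward; P_0: prior over (s_0, theta); gamma: discount.
M = \big\langle \mathcal{S},\ \{\mathcal{A}^{H}, \mathcal{A}^{R}\},\ T(s' \mid s, a^{H}, a^{R}),\ \Theta,\ R(s, a^{H}, a^{R}; \theta),\ P_0(s_0, \theta),\ \gamma \big\rangle
```

The key asymmetry is that the human observes the reward parameters θ while the robot does not; because both players maximize the same expected discounted reward, the robot is incentivized to infer θ from the human's behavior and to defer to the human where it remains uncertain. This is how the "add uncertainty about the objective" claim gets cashed out formally.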
Some recent alignment-relevant papers that he has published include: