What is DeepMind's safety team working on?
DeepMind has both a machine learning safety team focused on near-term risks and an alignment team working on risks from artificial general intelligence.
Their work includes:
- Engaging with recent arguments from the Machine Intelligence Research Institute.
- The Alignment Newsletter and podcast, which were produced by Rohin Shah.
- Research like the Goal Misgeneralization paper (goal misgeneralization is when a system pursues a different goal during deployment than the goal it was intended to learn in training; see the sketch after this list).
- Geoffrey Irving’s work on debate as an alignment strategy.
- “Discovering Agents”, which introduces a causal definition of agents (systems that can be understood as taking actions towards achieving a goal) and an algorithm for finding agents from empirical data.
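To make the goal misgeneralization idea concrete, here is a minimal toy sketch (our own illustration under simplified assumptions, not code from the paper): an agent whose learned policy ("always move right") matches the intended goal ("reach the coin") everywhere in training, but pursues the wrong goal once the coin's location shifts at deployment.

```python
import random

def run_episode(coin, policy, size=5, max_steps=20):
    """Return 1 if the agent steps onto the coin cell in a small 2-D gridworld, else 0."""
    x, y = 0, 0
    for _ in range(max_steps):
        if (x, y) == coin:
            return 1
        dx, dy = policy(x, y)
        x = max(0, min(size - 1, x + dx))
        y = max(0, min(size - 1, y + dy))
    return 1 if (x, y) == coin else 0

# Proxy policy the agent actually learned: "always move right".
# The intended goal was "go to the coin", but the two are indistinguishable in training.
walk_right = lambda x, y: (1, 0)

# Training distribution: the coin always sits at the far end of the starting row.
train = sum(run_episode(coin=(4, 0), policy=walk_right) for _ in range(100)) / 100

# Deployment distribution: the coin appears at cells off that path.
random.seed(0)
deploy = sum(
    run_episode(coin=(random.randint(0, 4), random.randint(1, 4)), policy=walk_right)
    for _ in range(100)
) / 100

print(f"training success:   {train:.2f}")   # 1.00: the proxy goal looks correct
print(f"deployment success: {deploy:.2f}")  # 0.00: the learned goal was the wrong one
```

On the training distribution the proxy goal and the intended goal are indistinguishable, so high training reward does not reveal which goal was actually learned; the difference only becomes visible under distribution shift.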
See Shah's comment for more on their research, including descriptions of some work that is currently unpublished.