What technical problems is MIRI working on?
“Aligning smarter-than-human AI with human interests” is a vague goal. To approach this problem productively, MIRI attempts to factorize it into several subproblems. As a starting point, they ask: “What aspects of this problem would we still be unable to solve even if the problem were much easier?”
In order to achieve real-world goals more effectively than a human, an artificial general intelligence
Highly reliable agent A system that can be understood as taking actions towards achieving a goal.
The formal agent AIXI is an attempt to define what we mean by “optimal behavior” in the case of a reinforcement learner. A simple AIXI-like equation is lacking, however, for defining what we mean by “good behavior” if the goal is to change something about the external world (and not just to maximize a pre-specified reward number). In order for the agent to evaluate its world-models to count the number of diamonds, as opposed to having a privileged reward channel, what general formal properties must its world-models possess? If the system updates its hypotheses (e.g., discovers that string theory is true and quantum physics is false) in a way its programmers didn’t expect, how does it identify “diamonds” in the new model? The question is a very basic one, yet the relevant theory is currently missing.
We can distinguish the challenge of highly reliable agent design from the problem of value specification: “Once we understand how to design an autonomous AI system that promotes a goal, how do we ensure its goal actually matches what we want?” Since human error is inevitable and we will need to be able to safely supervise and redesign AI algorithms even as they approach human equivalence in cognitive tasks, MIRI also works on formalizing error-tolerant agent properties. Artificial Intelligence: A Modern Approach, the standard textbook in AI, summarizes the challenge:
Yudkowsky
Co-founder of MIRI, known for his early pioneering work in AI alignment and his predictions that AI will probably cause human extinction.
Computer science professor at UC Berkeley, founder of CHAI, and co-author of the textbook Artificial Intelligence: A Modern Approach.
MIRI’s technical agenda describes these open problems in more detail, and their research guide collects online resources for learning more.