|Main Question: What is MIRI’s mission? (edit) (edit answer)|
|Alignment Forum Tag|
The Machine Intelligence Research Institute, formerly known as the Singularity Institute for Artificial Intelligence (not to be confused with Singularity University) is a non-profit research organization devoted to reducing existential risk from unfriendly artificial intelligence and understanding problems related to friendly artificial intelligence. Eliezer Yudkowsky was one of the early founders and continues to work there as a Research Fellow. The Machine Intelligence Research Institute created and currently owns the LessWrong domain.
What is MIRI’s mission? What is MIRI trying to do? What is MIRI working on?
MIRI's mission statement is to “ensure that the creation of smarter-than-human artificial intelligence has a positive impact.” This is an ambitious goal, but they believe that some early progress is possible, and they believe that the goal’s importance and difficulty makes it prudent to begin work at an early date.
Their two main research agendas, “Agent Foundations for Aligning Machine Intelligence with Human Interests” and “Value Alignment for Advanced Machine Learning Systems,” focus on three groups of technical problems:
- highly reliable agent design — learning how to specify highly autonomous systems that reliably pursue some fixed goal;
- value specification — supplying autonomous systems with the intended goals; and
- error tolerance — making such systems robust to programmer error.
That being said, MIRI recently published an update stating that they were moving away from research directions in unpublished works that they were pursuing since 2017.
They publish new mathematical results (although their work is non-disclosed by default), host workshops, attend conferences, and fund outside researchers who are interested in investigating these problems. They also host a blog and an online research forum.
“Aligning smarter-than-human AI with human interests” is an extremely vague goal. To approach this problem productively, we attempt to factorize it into several subproblems. As a starting point, we ask: “What aspects of this problem would we still be unable to solve even if the problem were much easier?”
In order to achieve real-world goals more effectively than a human, a general AI system will need to be able to learn its environment over time and decide between possible proposals or actions. A simplified version of the alignment problem, then, would be to ask how we could construct a system that learns its environment and has a very crude decision criterion, like “Select the policy that maximizes the expected number of diamonds in the world.”
Highly reliable agent design is the technical challenge of formally specifying a software system that can be relied upon to pursue some preselected toy goal. An example of a subproblem in this space is ontology identification: how do we formalize the goal of “maximizing diamonds” in full generality, allowing that a fully autonomous agent may end up in unexpected environments and may construct unanticipated hypotheses and policies? Even if we had unbounded computational power and all the time in the world, we don’t currently know how to solve this problem. This suggests that we’re not only missing practical algorithms but also a basic theoretical framework through which to understand the problem.
The formal agent AIXI is an attempt to define what we mean by “optimal behavior” in the case of a reinforcement learner. A simple AIXI-like equation is lacking, however, for defining what we mean by “good behavior” if the goal is to change something about the external world (and not just to maximize a pre-specified reward number). In order for the agent to evaluate its world-models to count the number of diamonds, as opposed to having a privileged reward channel, what general formal properties must its world-models possess? If the system updates its hypotheses (e.g., discovers that string theory is true and quantum physics is false) in a way its programmers didn’t expect, how does it identify “diamonds” in the new model? The question is a very basic one, yet the relevant theory is currently missing.
We can distinguish highly reliable agent design from the problem of value specification: “Once we understand how to design an autonomous AI system that promotes a goal, how do we ensure its goal actually matches what we want?” Since human error is inevitable and we will need to be able to safely supervise and redesign AI algorithms even as they approach human equivalence in cognitive tasks, MIRI also works on formalizing error-tolerant agent properties. Artificial Intelligence: A Modern Approach, the standard textbook in AI, summarizes the challenge:
Yudkowsky […] asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design — to design a mechanism for evolving AI under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.
-Russell and Norvig (2009). Artificial Intelligence: A Modern Approach.
Our technical agenda describes these open problems in more detail, and our research guide collects online resources for learning more.