coherent extrapolated volition
Coherent Extrapolated Volition (CEV) is a term coined by Eliezer Yudkowsky in his work on Friendly AI. The idea is that it would not be sufficient to explicitly program what we think our desires and motivations are into an AI; instead, we should find a way to program it so that it acts in our best interests – doing what we want it to do rather than what we literally tell it to do.
In calculating CEV, an AI would predict what an idealized version of us would want, "if we knew more, thought faster, were more the people we wished we were, had grown up farther together". It would recursively iterate this prediction for humanity as a whole, and determine the desires which converge. This initial dynamic would be used to generate the AI's utility function.
Often, CEV is used more generally to refer to what the idealized version of a person would want, separate from the context of building aligned AIs.
What is volition?
As an example of the classical concept of volition, Yudkowsky develops a simple thought experiment: imagine you are facing two boxes, A and B. One of these boxes, and only one, has a diamond in it – box B. You are now asked to guess which box to choose, and you choose to open box A. It was your decision to take box A, but your volition was to choose box B, since you wanted the diamond in the first place.
Now imagine someone else – Fred – is faced with the same task, and you want to help him with his decision. Since you know where the diamond is, simply handing him the box he chose (box A) isn't helping. Instead, you mentally extrapolate a volition for Fred – a version of him that knows where the diamond is – and conclude that he actually wants box B.
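The distinction can be made concrete with a toy sketch. Here is a minimal Python illustration, assuming a trivial belief representation invented purely for this example (the essay contains no code): Fred's decision follows his mistaken beliefs, while his extrapolated volition is what he would choose if he knew what we know.

```python
# Toy illustration of the box example: Fred's stated choice versus his
# extrapolated volition. All names and numbers are hypothetical, made up
# for this sketch; nothing in the essay specifies code.

def stated_choice(beliefs: dict) -> str:
    """Fred picks the box he currently believes is most likely to hold the diamond."""
    return max(beliefs, key=beliefs.get)

def extrapolated_volition(true_state: dict) -> str:
    """What Fred would pick if he knew what we know: where the diamond actually is."""
    return stated_choice(true_state)

freds_beliefs = {"A": 0.6, "B": 0.4}   # Fred guesses the diamond is in A
actual_state  = {"A": 0.0, "B": 1.0}   # we know it is in B

print(stated_choice(freds_beliefs))        # -> "A"  (his decision)
print(extrapolated_volition(actual_state)) # -> "B"  (his volition)
```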
Coherent Extrapolated Volition
In developing Friendly AI – one acting in our best interests – we would have to take care that it implemented, from the beginning, the coherent extrapolated volition of humankind. As described above, the AI would predict what an idealized version of us would want, recursively iterate this prediction for humanity as a whole, and determine the desires which converge; this initial dynamic would be used to generate the AI's utility function.
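As a rough illustration of that extrapolate-then-keep-what-converges loop, here is a deliberately toy Python sketch. The `idealize` step, the set-of-strings preference representation, and the example data are all placeholders invented for this sketch; the essay specifies no algorithm, and the actual extrapolation is exactly the part described below as far beyond ordinary programming ability.

```python
# Toy sketch of the CEV loop shape: extrapolate each person's preferences,
# then keep only the desires on which the extrapolations converge.
from typing import Callable, List, Set

Preferences = Set[str]

def extrapolate(prefs: Preferences,
                idealize: Callable[[Preferences], Preferences],
                rounds: int = 3) -> Preferences:
    """Repeatedly apply an 'if we knew more, thought faster' refinement step."""
    for _ in range(rounds):
        prefs = idealize(prefs)
    return prefs

def coherent_volition(population: List[Preferences],
                      idealize: Callable[[Preferences], Preferences]) -> Preferences:
    """Keep only the desires shared by everyone's extrapolated preferences."""
    extrapolated = [extrapolate(p, idealize) for p in population]
    return set.intersection(*extrapolated) if extrapolated else set()

# Hypothetical example: a trivial 'idealization' that just adds one shared insight.
idealize = lambda prefs: prefs | {"avoid irreversible catastrophe"}
people = [{"novelty", "comfort"}, {"fairness", "comfort"}]

print(coherent_volition(people, idealize))
# -> {'comfort', 'avoid irreversible catastrophe'}  (only converging desires survive)
```

The point of the sketch is only the shape of the procedure – extrapolate, then intersect – not any claim about how idealization or preference aggregation would actually be computed.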
CEV faces two main problems. The first is the great difficulty of implementing such a program – “If one attempted to write an ordinary computer program using ordinary computer programming skills, the task would be a thousand lightyears beyond hopeless.” The second is the possibility that human values may not converge. Yudkowsky considered CEV obsolete almost immediately after its publication in 2004: he states that there is a "principled distinction between discussing CEV as an initial dynamic of Friendliness, and discussing CEV as a Nice Place to Live", and that his essay essentially conflated the two definitions.
Further Reading & References
- Coherent Extrapolated Volition by Eliezer Yudkowsky (2004)
- Coherent Extrapolated Volition: A Meta-Level Approach to Machine Ethics by Nick Tarleton (2010)
- A Short Introduction to Coherent Extrapolated Volition by Michael Anissimov
- Hacking the CEV for Fun and Profit by Wei Dai
- Two questions about CEV that worry me by Vladimir Slepnev
- Beginning resources for CEV research by Luke Muehlhauser
- Cognitive Neuroscience, Arrow's Impossibility Theorem, and Coherent Extrapolated Volition by Luke Muehlhauser
- Objections to Coherent Extrapolated Volition by Alexander Kruel
Eliezer Yudkowsky has proposed Coherent Extrapolated Volition as a solution to at least two problems facing Friendly AI design:
- The fragility of human values: Yudkowsky writes that “any future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals will contain almost nothing of worth.” The problem is that what humans value is complex and subtle, and difficult to specify. Consider the seemingly minor value of novelty. If a human-like value of novelty is not programmed into a superintelligent machine, it might explore the universe for valuable things only up to a certain point, and then maximize the most valuable thing it has found (the exploration-exploitation tradeoff; see the sketch after this list) – tiling the solar system with brains in vats wired into happiness machines, for example. When a superintelligence is in charge, its motivational system has to be exactly right, or the future it shapes will not be one we want.
- The locality of human values: Imagine if the Friendly AI problem had faced the ancient Greeks, and they had programmed it with the most progressive moral values of their time. That would have led the world to a rather horrifying fate. But why should we think that humans have, in the 21st century, arrived at the apex of human morality? We can’t risk programming a superintelligent machine with the moral values we happen to hold today. But then, which moral values do we give it?
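For readers unfamiliar with the exploration-exploitation tradeoff mentioned in the first point, a minimal epsilon-greedy bandit sketch shows the general idea: an agent splits its effort between sampling new options and exploiting the best one found so far. The reward numbers and parameters below are invented for this sketch and are unrelated to the essay itself.

```python
# Minimal epsilon-greedy bandit, only to illustrate exploration vs. exploitation.
import random

true_rewards = [0.2, 0.5, 0.9]   # hidden average value of each option (made up)
estimates = [0.0, 0.0, 0.0]      # agent's running estimates
counts = [0, 0, 0]
epsilon = 0.1                    # fraction of the time spent exploring

for step in range(1000):
    if random.random() < epsilon:
        arm = random.randrange(3)                        # explore: try something new
    else:
        arm = max(range(3), key=lambda i: estimates[i])  # exploit: best known option
    reward = random.gauss(true_rewards[arm], 0.1)
    counts[arm] += 1
    estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean

print(estimates)  # the agent mostly settles on the highest-value option
```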
Yudkowsky suggests that we build a ‘seed AI’ to discover and then extrapolate the ‘coherent extrapolated volition’ of humanity:
> In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
The seed AI would use the results of this examination and extrapolation of human values to program the motivational system of the superintelligence that would determine the fate of the galaxy.
However, some worry that the collective will of humanity won’t converge on a coherent set of goals. Others believe that guaranteed Friendliness is not possible, even by such elaborate and careful means.