Answer to What is Coherent Extrapolated Volition?
Eliezer Yudkowsky has proposed Coherent Extrapolated Volition as a solution to at least two problems facing Friendly AI design:
- The fragility of human values: Yudkowsky writes that “any future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals will contain almost nothing of worth.” The problem is that what humans value is complex, subtle, and difficult to specify. Consider the seemingly minor value of novelty. If a human-like value of novelty is not programmed into a superintelligent machine, it might explore the universe for valuable things only up to a certain point and then maximize the most valuable thing it has found (settling the exploration-exploitation tradeoff in favor of exploitation), for example by tiling the solar system with brains in vats wired into happiness machines. When a superintelligence is in charge, its motivational system has to be exactly right, or the future it creates will be undesirable.
- The locality of human values: Imagine that the ancient Greeks had faced the Friendly AI problem and had programmed a superintelligent machine with the most progressive moral values of their time. That would have led the world to a rather horrifying fate. But why should we think that humans in the 21st century have arrived at the apex of human morality? We can’t risk programming a superintelligent machine with the moral values we happen to hold today. But then, which moral values do we give it?
Yudkowsky suggests that we build a ‘seed AI’ to discover and then extrapolate the ‘coherent extrapolated volition’ of humanity:
> In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.
The seed AI would use the results of this examination and extrapolation of human values to program the motivational system of the superintelligence that would determine the fate of the galaxy.
However, some worry that the collective will of humanity won’t converge on a coherent set of goals. Others believe that guaranteed Friendliness is not possible, even by such elaborate and careful means.
- Yudkowsky, Coherent Extrapolated Volition
Original by: Luke Muehlhauser (edits by plex)
Based on: MIRI's Intelligence Explosion FAQ