From Stampy's Wiki
Main Question: What are some important terms in AI alignment?


Posts that attempt to define or clarify the meaning of a concept, a word, a phrase, or something else.


Canonically answered

What is greater-than-human intelligence?

Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. But one thing that makes humans special is their general intelligence. Humans can intelligently adapt to radically new problems in the urban jungle or outer space for which evolution could not have prepared them. Humans can solve problems for which their brain hardware and software was never trained. Humans can even examine the processes that produce their own intelligence (cognitive neuroscience), and design new kinds of intelligence never seen before (artificial intelligence).

To possess greater-than-human intelligence, a machine must be able to achieve goals more effectively than humans can, in a wider range of environments than humans can. This kind of intelligence involves the capacity not just to do science and play chess, but also to manipulate the social environment.

Computer scientist Marcus Hutter has described a formal model called AIXI that he says possesses the greatest general intelligence possible. But to implement it would require more computing power than all the matter in the universe can provide. Several projects try to approximate AIXI while still being computable, for example MC-AIXI.
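For the mathematically inclined, the AIXI decision rule can be written in one (dense) line. The sketch below follows Hutter's formulation; the notation is not part of the original answer:

```latex
a_t \;:=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
\bigl[\, r_t + \cdots + r_m \,\bigr]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

Here U is a universal Turing machine, q ranges over programs (candidate environments), ℓ(q) is the length of program q, and a, o, r denote actions, observations, and rewards. The sum over all programs is what makes AIXI incomputable, and it is what approximations like MC-AIXI replace with Monte Carlo sampling.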

Still, there remains much work to be done before greater-than-human intelligence can be achieved in machines. Greater-than-human intelligence need not be achieved by directly programming a machine to be intelligent. It could also be achieved by whole brain emulation, by biological cognitive enhancement, or by brain-computer interfaces (see below).


How is ‘intelligence’ defined?

Artificial intelligence researcher Shane Legg defines intelligence like this:

> Intelligence measures an agent’s ability to achieve goals in a wide range of environments.

This is a bit vague, but it will serve as the working definition of ‘intelligence’.
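Legg, together with Marcus Hutter, later made this informal definition precise as a "universal intelligence" measure. As a sketch (the symbols come from their 2007 paper, not from the answer above):

```latex
\Upsilon(\pi) \;:=\; \sum_{\mu \in E} 2^{-K(\mu)} \, V_\mu^{\pi}
```

Here E is a set of computable reward-generating environments, K(μ) is the Kolmogorov complexity of environment μ, and V_μ^π is the expected total reward that agent π earns in μ. Performance is summed over a wide range of environments, with simpler environments weighted more heavily, which mirrors the informal wording "achieve goals in a wide range of environments".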


What is biological cognitive enhancement?

There may be genes or molecules that can be modified to improve general intelligence. Researchers have already done this in mice: over-expressing the NR2B gene improved those mice’s memory beyond that of any other mouse strain. Biological cognitive enhancement in humans may cause an intelligence explosion to occur sooner than it otherwise would.


What are brain-computer interfaces?

A brain-computer interface (BCI) is a direct communication pathway between the brain and a computer device. BCI research is heavily funded and has already met with dozens of successes. Three successes in human BCIs are a device that restores (partial) sight to the blind, cochlear implants that restore hearing to the deaf, and a device that allows use of an artificial hand by direct thought.

Such devices restore impaired functions, but many researchers expect BCIs to also augment and improve normal human abilities. Ed Boyden is researching these opportunities as the lead of the Synthetic Neurobiology Group at MIT. Such devices might hasten the arrival of an intelligence explosion, if only by improving human intelligence so that the hard problems of AI can be solved more rapidly.

See also:

Wikipedia, Brain-computer interface

What are slow, moderate, and fast takeoffs?

A slow takeoff is a situation in which AI goes from infrahuman to human to superhuman intelligence very gradually. For example, imagine an augmented “IQ” scale (THIS IS NOT HOW IQ ACTUALLY WORKS – JUST AN EXAMPLE) where rats weigh in at 10, chimps at 30, the village idiot at 60, average humans at 100, and Einstein at 200. And suppose that as technology advances, computers gain two points on this scale per year. So if they start out as smart as rats in 2025, they’ll be as smart as chimps in 2035, as smart as the village idiot in 2050, as smart as average humans in 2070, and as smart as Einstein in 2120. By 2190, they’ll be IQ 340, as far beyond Einstein as Einstein is beyond a village idiot.

In this scenario progress is gradual and manageable. By 2050, we will have long since noticed the trend and predicted that we have 20 years until average-human-level intelligence. Once AIs reach average-human-level intelligence, we will have fifty years during which some of us are still smarter than they are, years in which we can work with them as equals, test and retest their programming, and build institutions that promote cooperation. Even though the AIs of 2190 may qualify as “superintelligent”, their arrival will have been long expected, and there is little point in planning now when the people of 2070 will have so many more resources to plan with.
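The timeline above is plain linear extrapolation; anchoring the line at chimp level (30) in 2035, a few lines of arithmetic reproduce the later milestones:

```python
# Toy "IQ" milestones from the slow-takeoff example above (not a real scale).
ANCHOR_YEAR, ANCHOR_SCORE, RATE = 2035, 30, 2  # chimp level in 2035, +2 pts/year

def year_reached(score):
    """Year the toy scale reaches `score` under constant linear growth."""
    return ANCHOR_YEAR + (score - ANCHOR_SCORE) // RATE

for name, score in [("village idiot", 60), ("average human", 100),
                    ("Einstein", 200)]:
    print(f"{name} ({score}): {year_reached(score)}")  # 2050, 2070, 2120

print("score in 2190:", ANCHOR_SCORE + RATE * (2190 - ANCHOR_YEAR))  # 340
```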

A moderate takeoff is a situation in which AI goes from infrahuman to human to superhuman relatively quickly. For example, imagine that in 2020 AIs are much like those of today – good at a few simple games, but without clear domain-general intelligence or “common sense”. From 2020 to 2050, AIs demonstrate some academically interesting gains on specific problems, and become better at tasks like machine translation and self-driving cars, and by 2047 there are some that seem to display some vaguely human-like abilities at the level of a young child. By late 2065, they are still less intelligent than a smart human adult. By 2066, they are far smarter than Einstein.

A fast takeoff scenario is one in which computers go even faster than this, perhaps moving from infrahuman to human to superhuman in only days or weeks.

Non-canonical answers


What is whole brain emulation?

Whole Brain Emulation (WBE) or ‘mind uploading’ is a computer emulation of all the cells and connections in a human brain. So even if the underlying principles of general intelligence prove difficult to discover, we might still emulate an entire human brain and make it run at a million times its normal speed (computer circuits communicate much faster than neurons do). Such a WBE could do more thinking in one second than a normal human can in eleven days. This would not immediately lead to smarter-than-human intelligence, but it would lead to faster-than-human intelligence. A WBE could be backed up (leading to a kind of immortality), and it could be copied so that hundreds or millions of WBEs could work on separate problems in parallel. If WBEs are created, they may therefore be able to solve scientific problems far more rapidly than ordinary humans, accelerating further technological progress.
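The speed-up arithmetic is easy to check (the million-fold figure is the one assumed in the text, not a measured quantity):

```python
# Sanity check of the emulation speed-up arithmetic.
SPEEDUP = 1_000_000            # subjective seconds per wall-clock second
SECONDS_PER_DAY = 24 * 60 * 60

days = SPEEDUP / SECONDS_PER_DAY
print(f"One real second buys about {days:.1f} subjective days of thought")
```

A million-fold speed-up thus yields roughly eleven and a half subjective days of thought per real second; the "31 years" figure sometimes quoted would correspond to a speed-up closer to a billion.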


What is Friendly AI?

A Friendly Artificial Intelligence (Friendly AI or FAI) is an artificial intelligence that is ‘friendly’ to humanity — one that has a good rather than bad effect on humanity.

AI researchers continue to make progress with machines that make their own decisions, and there is a growing awareness that we need to design machines to act safely and ethically. This research program goes by many names: ‘machine ethics’, ‘machine morality’, ‘artificial morality’, ‘computational ethics’ and ‘computational metaethics’, ‘friendly AI’, and ‘robo-ethics’ or ‘robot ethics’.

The most immediate concern may be in battlefield robots; the U.S. Department of Defense contracted Ronald Arkin to design a system for ensuring ethical behavior in autonomous battlefield robots. The U.S. Congress has declared that a third of America’s ground systems must be robotic by 2025, and by 2030 the U.S. Air Force plans to have swarms of bird-sized flying robots that operate semi-autonomously for weeks at a time.

But Friendly AI research is not concerned with battlefield robots or machine ethics in general. It is concerned with a problem of a much larger scale: designing AI that would remain safe and friendly after the intelligence explosion.

A machine superintelligence would be enormously powerful. Successful implementation of Friendly AI could mean the difference between a solar system of unprecedented happiness and a solar system in which all available matter has been converted into parts for achieving the superintelligence’s goals.

Friendly AI is a harder project than is often supposed. As explored below, commonly suggested solutions for Friendly AI are likely to fail because of two features possessed by any superintelligence:

  1. Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.
  2. Literalness: a superintelligent machine will make decisions based on the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety of what humans value. A demand like “maximize human happiness” sounds simple to us because it contains few words, but philosophers and scientists have failed for centuries to explain exactly what this means, and certainly have not translated it into a form sufficiently rigorous for AI programmers to use.


What is Coherent Extrapolated Volition?

Eliezer Yudkowsky has proposed Coherent Extrapolated Volition as a solution to at least two problems facing Friendly AI design:

  1. The fragility of human values: Yudkowsky writes that “any future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals will contain almost nothing of worth.” The problem is that what humans value is complex and subtle, and difficult to specify. Consider the seemingly minor value of novelty. If a human-like value of novelty is not programmed into a superintelligent machine, it might explore the universe for valuable things up to a certain point, and then maximize the most valuable thing it finds (the exploration-exploitation tradeoff) — tiling the solar system with brains in vats wired into happiness machines, for example. When a superintelligence is in charge, you have to get its motivational system exactly right in order to not make the future undesirable.
  2. The locality of human values: Imagine if the Friendly AI problem had faced the ancient Greeks, and they had programmed it with the most progressive moral values of their time. That would have led the world to a rather horrifying fate. But why should we think that humans have, in the 21st century, arrived at the apex of human morality? We can’t risk programming a superintelligent machine with the moral values we happen to hold today. But then, which moral values do we give it?
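The exploration-exploitation tradeoff mentioned in point 1 is a standard concept in reinforcement learning. As a minimal illustration (a hypothetical two-armed bandit with an epsilon-greedy agent, not anything from the original text):

```python
import random

def epsilon_greedy_bandit(true_means, steps=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a Bernoulli multi-armed bandit.

    With probability epsilon the agent explores (pulls a random arm);
    otherwise it exploits the arm with the best reward estimate so far.
    """
    rng = random.Random(seed)
    counts = [0] * len(true_means)
    estimates = [0.0] * len(true_means)
    total_reward = 0
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(true_means))   # explore
        else:
            arm = estimates.index(max(estimates))  # exploit
        reward = 1 if rng.random() < true_means[arm] else 0
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # running mean
        total_reward += reward
    return estimates, total_reward

estimates, reward = epsilon_greedy_bandit([0.3, 0.7])
print(estimates, reward)
```

An agent that stops exploring too early can lock onto the worse arm forever; one that never stops exploring wastes pulls on arms it already knows are bad. The worry in point 1 is an analogous failure at cosmic scale.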

Yudkowsky suggests that we build a ‘seed AI’ to discover and then extrapolate the ‘coherent extrapolated volition’ of humanity:

> In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

The seed AI would use the results of this examination and extrapolation of human values to program the motivational system of the superintelligence that would determine the fate of the galaxy.

However, some worry that the collective will of humanity won’t converge on a coherent set of goals. Others believe that guaranteed Friendliness is not possible, even by such elaborate and careful means.

Unanswered canonical questions

What is narrow AI?

What is AI alignment?
