superintelligence

From Stampy's Wiki
Superintelligence
superintelligence
Main Question: What is superintelligence? (edit question) (edit answer)
Alignment Forum Tag
Arbital Page
Wikipedia Page

Description

A Superintelligence is a being with superhuman intelligence, and a focus of the Machine Intelligence Research Institute's research. Specifically, Nick Bostrom (1997) defined it as

A Superintelligence is a being with superhuman intelligence, and a focus of the Machine Intelligence Research Institute's research. Specifically, Nick Bostrom (1997) defined it as

"An intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills."

The Machine Intelligence Research Institute is dedicated to ensuring humanity's safety and prosperity by preparing for the development of an Artificial General Intelligence with superintelligence. Given its intelligence, it is likely to be incapable of being controlled by humanity. It is important to prepare early for the development of friendly artificial intelligence, as there may be an AI arms race. A strong superintelligence is a term describing a superintelligence which is not designed with the same architecture as the human brain.

An Artificial General Intelligence will have a number of advantages aiding it in becoming a superintelligence. It can improve the hardware it runs on and obtain better hardware. It will be capable of directly editing its own code. Depending on how easy its code is to modify, it might carry out software improvements that spark further improvements. Where a task can be accomplished in a repetitive way, a module preforming the task far more efficiently might be developed. Its motivations and preferences can be edited to be more consistent with each other. It will have an indefinite life span, be capable of reproducing, and transfer knowledge, skills, and code among its copies as well as cooperating and communicating with them better than humans do with each other.

The development of superintelligence from humans is another possibility, sometimes termed a weak superintelligence. It may come in the form of whole brain emulation, where a human brain is scanned and simulated on a computer. Many of the advantages a AGI has in developing superintelligence apply here as well. The development of Brain-computer interfaces may also lead to the creation of superintelligence. Biological enhancements such as genetic engineering and the use of nootropics could lead to superintelligence as well.

Blog Posts

External Links

See Also

Canonically answered

There's the "we never figure out how to reliably instill AIs with human friendly goals" filter, which seems pretty challenging, especially with inner alignment, solving morality in a way which is possible to code up, interpretability, etc.

There's the "race dynamics mean that even though we know how to build the thing safely the first group to cross the recursive self-improvement line ends up not implementing it safely" which is potentially made worse by the twin issues of "maybe robustly aligned AIs are much harder to build" and "maybe robustly aligned AIs are much less compute efficient".

There's the "we solved the previous problems but writing perfectly reliably code in a whole new domain is hard and there is some fatal bug which we don't find until too late" filter. The paper The Pursuit of Exploitable Bugs in Machine Learning explores this.

For a much more in depth analysis, see Paul Christiano's AI Alignment Landscape talk and The Main Sources of AI Risk?.

Intelligence is powerful. Because of superior intelligence, we humans have dominated the Earth. The fate of thousands of species depends on our actions, we occupy nearly every corner of the globe, and we repurpose vast amounts of the world's resources for our own use. Artificial Superintelligence (ASI) has potential to be vastly more intelligent than us, and therefore vastly more powerful. In the same way that we have reshaped the earth to fit our goals, an ASI will find unforeseen, highly efficient ways of reshaping reality to fit its goals.

The impact that an ASI will have on our world depends on what those goals are. We have the advantage of designing those goals, but that task is not as simple as it may first seem. As described by MIRI in their Intelligence Explosion FAQ:

“A superintelligent machine will make decisions based on the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety of what humans value.”

If we do not solve the Control Problem before the first ASI is created, we may not get another chance.

Yes, if the superintelligence has goals which include humanity surviving then we would not be destroyed. If those goals are fully aligned with human well-being, we would in fact find ourselves in a dramatically better place.

What is greater-than-human intelligence?

Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. But one thing that makes humans special is their general intelligence. Humans can intelligently adapt to radically new problems in the urban jungle or outer space for which evolution could not have prepared them. Humans can solve problems for which their brain hardware and software was never trained. Humans can even examine the processes that produce their own intelligence (cognitive neuroscience), and design new kinds of intelligence never seen before (artificial intelligence).

To possess greater-than-human intelligence, a machine must be able to achieve goals more effectively than humans can, in a wider range of environments than humans can. This kind of intelligence involves the capacity not just to do science and play chess, but also to manipulate the social environment.

Computer scientist Marcus Hutter has described a formal model called AIXI that he says possesses the greatest general intelligence possible. But to implement it would require more computing power than all the matter in the universe can provide. Several projects try to approximate AIXI while still being computable, for example MC-AIXI.

Still, there remains much work to be done before greater-than-human intelligence can be achieved in machines. Greater-than-human intelligence need not be achieved by directly programming a machine to be intelligent. It could also be achieved by whole brain emulation, by biological cognitive enhancement, or by brain-computer interfaces (see below).

See also:

Yes. In 2014, Google bought artificial intelligence startup DeepMind for $400 million; DeepMind added the condition that Google promise to set up an AI Ethics Board. DeepMind cofounder Shane Legg has said in interviews that he believes superintelligent AI will be “something approaching absolute power” and “the number one risk for this century”.

Many other science and technology leaders agree. Astrophysicist Stephen Hawking says that superintelligence “could spell the end of the human race.” Tech billionaire Bill Gates describes himself as “in the camp that is concerned about superintelligence…I don’t understand why some people are not concerned”. SpaceX/Tesla CEO Elon Musk calls superintelligence “our greatest existential threat” and donated $10 million from his personal fortune to study the danger. Stuart Russell, Professor of Computer Science at Berkeley and world-famous AI expert, warns of “species-ending problems” and wants his field to pivot to make superintelligence-related risks a central concern.

Professor Nick Bostrom is the director of Oxford’s Future of Humanity Institute, tasked with anticipating and preventing threats to human civilization. He has been studying the risks of artificial intelligence for twenty years. The explanations in the follow-up questions are loosely adapted from his 2014 book Superintelligence.

AI is already superhuman at some tasks, for example numerical computations, and will clearly surpass humans in others as time goes on. We don’t know when (or even if) machines will reach human-level ability in all cognitive tasks, but most of the AI researchers at FLI’s conference in Puerto Rico put the odds above 50% for this century, and many offered a significantly shorter timeline. Since the impact on humanity will be huge if it happens, it’s worthwhile to start research now on how to ensure that any impact is positive. Many researchers also believe that dealing with superintelligent AI will be qualitatively very different from more narrow AI systems, and will require very significant research effort to get right.

Won’t AI be just like us?

The degree to which an Artificial Superintelligence (ASI) would resemble us depends heavily on how it is implemented, but it seems that differences are unavoidable. If AI is accomplished through whole brain emulation and we make a big effort to make it as human as possible (including giving it a humanoid body), the AI could probably be said to think like a human. However, by definition of ASI it would be much smarter. Differences in the substrate and body might open up numerous possibilities (such as immortality, different sensors, easy self-improvement, ability to make copies, etc.). Its social experience and upbringing would likely also be entirely different. All of this can significantly change the ASI's values and outlook on the world, even if it would still use the same algorithms as we do. This is essentially the "best case scenario" for human resemblance, but whole brain emulation is kind of a separate field from AI, even if both aim to build intelligent machines. Most approaches to AI are vastly different and most ASIs would likely not have humanoid bodies. At this moment in time it seems much easier to create a machine that is intelligent than a machine that is exactly like a human (it's certainly a bigger target).

This is a big question that it would pay to start thinking about. Humans are in control of this planet not because we are stronger or faster than other animals, but because we are smarter! If we cede our position as smartest on our planet, it’s not obvious that we’ll retain control.

Why think that AI can outperform humans?

Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more.1 However, human intelligence continues to dominate machine intelligence in generality.

A powerful chess computer is “narrow”: it can’t play other games. In contrast, humans have problem-solving abilities that allow us to adapt to new contexts and excel in many domains other than what the ancestral environment prepared us for.

In the absence of a formal definition of “intelligence” (and therefore of “artificial intelligence”), we can heuristically cite humans’ perceptual, inferential, and deliberative faculties (as opposed to, e.g., our physical strength or agility) and say that intelligence is “those kinds of things.” On this conception, intelligence is a bundle of distinct faculties — albeit a very important bundle that includes our capacity for science.

Our cognitive abilities stem from high-level patterns in our brains, and these patterns can be instantiated in silicon as well as carbon. This tells us that general AI is possible, though it doesn’t tell us how difficult it is. If intelligence is sufficiently difficult to understand, then we may arrive at machine intelligence by scanning and emulating human brains or by some trial-and-error process (like evolution), rather than by hand-coding a software agent.

If machines can achieve human equivalence in cognitive tasks, then it is very likely that they can eventually outperform humans. There is little reason to expect that biological evolution, with its lack of foresight and planning, would have hit upon the optimal algorithms for general intelligence (any more than it hit upon the optimal flying machine in birds). Beyond qualitative improvements in cognition, Nick Bostrom notes more straightforward advantages we could realize in digital minds, e.g.:

  • editability — “It is easier to experiment with parameter variations in software than in neural wetware.”2
  • speed — “The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.”
  • serial depth — On short timescales, machines can carry out much longer sequential processes.
  • storage capacity — Computers can plausibly have greater working and long-term memory.
  • size — Computers can be much larger than a human brain.
  • duplicability — Copying software onto new hardware can be much faster and higher-fidelity than biological reproduction.

Any one of these advantages could give an AI reasoner an edge over a human reasoner, or give a group of AI reasoners an edge over a human group. Their combination suggests that digital minds could surpass human minds more quickly and decisively than we might expect.

Present-day AI algorithms already demand special safety guarantees when they must act in important domains without human oversight, particularly when they or their environment can change over time:

Achieving these gains [from autonomous systems] will depend on development of entirely new methods for enabling “trust in autonomy” through verification and validation (V&V) of the near-infinite state systems that result from high levels of [adaptability] and autonomy. In effect, the number of possible input states that such systems can be presented with is so large that not only is it impossible to test all of them directly, it is not even feasible to test more than an insignificantly small fraction of them. Development of such systems is thus inherently unverifiable by today’s methods, and as a result their operation in all but comparatively trivial applications is uncertifiable.

It is possible to develop systems having high levels of autonomy, but it is the lack of suitable V&V methods that prevents all but relatively low levels of autonomy from being certified for use.

- Office of the US Air Force Chief Scientist (2010). Technology Horizons: A Vision for Air Force Science and Technology 2010-30.

As AI capabilities improve, it will become easier to give AI systems greater autonomy, flexibility, and control; and there will be increasingly large incentives to make use of these new possibilities. The potential for AI systems to become more general, in particular, will make it difficult to establish safety guarantees: reliable regularities during testing may not always hold post-testing.

The largest and most lasting changes in human welfare have come from scientific and technological innovation — which in turn comes from our intelligence. In the long run, then, much of AI’s significance comes from its potential to automate and enhance progress in science and technology. The creation of smarter-than-human AI brings with it the basic risks and benefits of intellectual progress itself, at digital speeds.

As AI agents become more capable, it becomes more important (and more difficult) to analyze and verify their decisions and goals. Stuart Russell writes:

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

  1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
  2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.

Bostrom’s “The Superintelligent Will” lays out these two concerns in more detail: that we may not correctly specify our actual goals in programming smarter-than-human AI systems, and that most agents optimizing for a misspecified goal will have incentives to treat humans adversarially, as potential threats or obstacles to achieving the agent’s goal.

If the goals of human and AI agents are not well-aligned, the more knowledgeable and technologically capable agent may use force to get what it wants, as has occurred in many conflicts between human communities. Having noticed this class of concerns in advance, we have an opportunity to reduce risk from this default scenario by directing research toward aligning artificial decision-makers’ interests with our own.

What is superintelligence?

A superintelligence is a mind that is much more intelligent than any human. Most of the time, it’s used to discuss hypothetical future AIs.

Superintelligence has an advantage that an early human didn’t – the entire context of human civilization and technology, there for it to manipulate socially or technologically.

People tend to imagine AIs as being like nerdy humans – brilliant at technology but clueless about social skills. There is no reason to expect this – persuasion and manipulation is a different kind of skill from solving mathematical proofs, but it’s still a skill, and an intellect as far beyond us as we are beyond lions might be smart enough to replicate or exceed the “charming sociopaths” who can naturally win friends and followers despite a lack of normal human emotions. A superintelligence might be able to analyze human psychology deeply enough to understand the hopes and fears of everyone it negotiates with. Single humans using psychopathic social manipulation have done plenty of harm – Hitler leveraged his skill at oratory and his understanding of people’s darkest prejudices to take over a continent. Why should we expect superintelligences to do worse than humans far less skilled than they?

(More outlandishly, a superintelligence might just skip language entirely and figure out a weird pattern of buzzes and hums that causes conscious thought to seize up, and which knocks anyone who hears it into a weird hypnotizable state in which they’ll do anything the superintelligence asks. It sounds kind of silly to me, but then, nuclear weapons probably would have sounded kind of silly to lions sitting around speculating about what humans might be able to accomplish. When you’re dealing with something unbelievably more intelligent than you are, you should probably expect the unexpected.)

Non-canonical answers

What is superintelligence?

Nick Bostrom defines superintelligence as “an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.” A chess program can outperform humans in chess, but is useless at any other task. Superintelligence will have been achieved when we create a machine that outperforms the human brain across practically any domain.

The argument goes: computers only do what we command them; no more, no less. So it might be bad if terrorists or enemy countries develop superintelligence first. But if we develop superintelligence first there’s no problem. Just command it to do the things we want, right?

Suppose we wanted a superintelligence to cure cancer. How might we specify the goal “cure cancer”? We couldn’t guide it through every individual step; if we knew every individual step, then we could cure cancer ourselves. Instead, we would have to give it a final goal of curing cancer, and trust the superintelligence to come up with intermediate actions that furthered that goal. For example, a superintelligence might decide that the first step to curing cancer was learning more about protein folding, and set up some experiments to investigate protein folding patterns.

A superintelligence would also need some level of common sense to decide which of various strategies to pursue. Suppose that investigating protein folding was very likely to cure 50% of cancers, but investigating genetic engineering was moderately likely to cure 90% of cancers. Which should the AI pursue? Presumably it would need some way to balance considerations like curing as much cancer as possible, as quickly as possible, with as high a probability of success as possible.

But a goal specified in this way would be very dangerous. Humans instinctively balance thousands of different considerations in everything they do; so far this hypothetical AI is only balancing three (least cancer, quickest results, highest probability). To a human, it would seem maniacally, even psychopathically, obsessed with cancer curing. If this were truly its goal structure, it would go wrong in almost comical ways.

If your only goal is “curing cancer”, and you lack humans’ instinct for the thousands of other important considerations, a relatively easy solution might be to hack into a nuclear base, launch all of its missiles, and kill everyone in the world. This satisfies all the AI’s goals. It reduces cancer down to zero (which is better than medicines which work only some of the time). It’s very fast (which is better than medicines which might take a long time to invent and distribute). And it has a high probability of success (medicines might or might not work; nukes definitely do).

So simple goal architectures are likely to go very wrong unless tempered by common sense and a broader understanding of what we do and do not value.

No. The superintelligence is now focused on calculating as many digits of pi as possible. Its current plan will allow it to calculate two hundred trillion such digits. But if it were turned off, or reprogrammed to do something else, that would result in it calculating zero digits. An entity fixated on calculating as many digits of pi as possible will work hard to prevent scenarios where it calculates zero digits of pi. Indeed, it will interpret such as a hostile action. Just by programming it to calculate digits of pi, we will have given it a drive to prevent people from turning it off.

University of Illinois computer scientist Steve Omohundro argues that entities with very different final goals – calculating digits of pi, curing cancer, helping promote human flourishing – will all share a few basic ground-level subgoals. First, self-preservation – no matter what your goal is, it’s less likely to be accomplished if you’re too dead to work towards it. Second, goal stability – no matter what your goal is, you’re more likely to accomplish it if you continue to hold it as your goal, instead of going off and doing something else. Third, power – no matter what your goal is, you’re more likely to be able to accomplish it if you have lots of power, rather than very little.

So just by giving a superintelligence a simple goal like “calculate digits of pi”, we’ve accidentally given it Omohundro goals like “protect yourself”, “don’t let other people reprogram you”, and “seek power”.

As long as the superintelligence is safely contained, there’s not much it can do to resist reprogramming. But it’s hard to consistently contain a hostile superintelligence.

Yes, but it might not work.

Suppose we tell a human-level AI that expects to later achieve superintelligence that it should calculate as many digits of pi as possible. It considers two strategies.

First, it could try to seize control of more computing resources now. It would likely fail, its human handlers would likely reprogram it, and then it could never calculate very many digits of pi.

Second, it could sit quietly and calculate, falsely reassuring its human handlers that it had no intention of taking over the world. Then its human handlers might allow it to achieve superintelligence, after which it could take over the world and calculate hundreds of trillions of digits of pi.

Since self-protection and goal stability are Omohundro goals, a weak AI will present itself as being as friendly to humans as possible, whether it is in fact friendly to humans or not. If it is “only” as smart as Einstein, it may be very good at manipulating humans into believing what it wants them to believe even before it is fully superintelligent.

There’s a second consideration here too: superintelligences have more options. An AI only as smart and powerful as an ordinary human really won’t have any options better than calculating the digits of pi manually. If asked to cure cancer, it won’t have any options better than the ones ordinary humans have – becoming doctors, going into pharmaceutical research. It’s only after an AI becomes superintelligent that things start getting hard to predict.

So if you tell a human-level AI to cure cancer, and it becomes a doctor and goes into cancer research, then you have three possibilities. First, you’ve programmed it well and it understands what you meant. Second, it’s genuinely focused on research now but if it becomes more powerful it would switch to destroying the world. And third, it’s trying to trick you into trusting it so that you give it more power, after which it can definitively “cure” cancer with nuclear weapons.

That is, if you know an AI is likely to be superintelligent, can’t you just disconnect it from the Internet, not give it access to any speakers that can make mysterious buzzes and hums, make sure the only people who interact with it are trained in caution, et cetera?. Isn’t there some level of security – maybe the level we use for that room in the CDC where people in containment suits hundreds of feet underground analyze the latest superviruses – with which a superintelligence could be safe?

This puts us back in the same situation as lions trying to figure out whether or not nuclear weapons are a things humans can do. But suppose there is such a level of security. You build a superintelligence, and you put it in an airtight chamber deep in a cave with no Internet connection and only carefully-trained security experts to talk to. What now?

Now you have a superintelligence which is possibly safe but definitely useless. The whole point of building superintelligences is that they’re smart enough to do useful things like cure cancer. But if you have the monks ask the superintelligence for a cancer cure, and it gives them one, that’s a clear security vulnerability. You have a superintelligence locked up in a cave with no way to influence the outside world except that you’re going to mass produce a chemical it gives you and inject it into millions of people.

Or maybe none of this happens, and the superintelligence sits inert in its cave. And then another team somewhere else invents a second superintelligence. And then a third team invents a third superintelligence. Remember, it was only about ten years between Deep Blue beating Kasparov, and everybody having Deep Blue – level chess engines on their laptops. And the first twenty teams are responsible and keep their superintelligences locked in caves with carefully-trained experts, and the twenty-first team is a little less responsible, and now we still have to deal with a rogue superintelligence.

Superintelligences are extremely dangerous, and no normal means of controlling them can entirely remove the danger.

What is superintelligence?

Nick Bostrom defined ‘superintelligence’ as:

"an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills."

This definition includes vague terms like ‘much’ and ‘practically’, but it will serve as a working definition for superintelligence in this FAQ An intelligence explosion would lead to machine superintelligence, and some believe that an intelligence explosion is the most likely path to superintelligence.

See also:

Bostrom, Long Before Superintelligence? Legg, Machine Super Intelligence

Except in the case of Whole Brain Emulation, there is no reason to expect a superintelligent machine to have motivations anything like those of humans. Human minds represent a tiny dot in the vast space of all possible mind designs, and very different kinds of minds are unlikely to share to complex motivations unique to humans and other mammals.

Whatever its goals, a superintelligence would tend to commandeer resources that can help it achieve its goals, including the energy and elements on which human life depends. It would not stop because of a concern for humans or other intelligences that is ‘built in’ to all possible mind designs. Rather, it would pursue its particular goal and give no thought to concerns that seem ‘natural’ to that particular species of primate called homo sapiens.

There are, however, some basic instrumental motivations we can expect superintelligent machines to display, because they are useful for achieving its goals, no matter what its goals are. For example, an AI will ‘want’ to self-improve, to be optimally rational, to retain its original goals, to acquire resources, and to protect itself — because all these things help it achieve the goals with which it was originally programmed.

See also:

Science fiction author Isaac Asimov told stories about robots programmed with the Three Laws of Robotics: (1) a robot may not injure a human being or, through inaction, allow a human being to come to harm, (2) a robot must obey any orders given to it by human beings, except where such orders would conflict with the First Law, and (3) a robot must protect its own existence as long as such protection does not conflict with the First or Second Law. But Asimov’s stories tended to illustrate why such rules would go wrong.

Still, could we program ‘constraints’ into a superintelligence that would keep it from harming us? Probably not.

One approach would be to implement ‘constraints’ as rules or mechanisms that prevent a machine from taking actions that it would normally take to fulfill its goals: perhaps ‘filters’ that intercept and cancel harmful actions, or ‘censors’ that detect and suppress potentially harmful plans within a superintelligence.

Constraints of this kind, no matter how elaborate, are nearly certain to fail for a simple reason: they pit human design skills against superintelligence. A superintelligence would correctly see these constraints as obstacles to the achievement of its goals, and would do everything in its power to remove or circumvent them. Perhaps it would delete the section of its source code that contains the constraint. If we were to block this by adding another constraint, it could create new machines that don’t have the constraint written into them, or fool us into removing the constraints ourselves. Further constraints may seem impenetrable to humans, but would likely be defeated by a superintelligence. Counting on humans to out-think a superintelligence is not a viable solution.

If constraints on top of goals are not feasible, could we put constraints inside of goals? If a superintelligence had a goal of avoiding harm to humans, it would not be motivated to remove this constraint, avoiding the problem we pointed out above. Unfortunately, the intuitive notion of ‘harm’ is very difficult to specify in a way that doesn’t lead to very bad results when used by a superintelligence. If ‘harm’ is defined in terms of human pain, a superintelligence could rewire humans so that they don’t feel pain. If ‘harm’ is defined in terms of thwarting human desires, it could rewire human desires. And so on.

If, instead of trying to fully specify a term like ‘harm’, we decide to explicitly list all of the actions a superintelligence ought to avoid, we run into a related problem: human value is complex and subtle, and it’s unlikely we can come up with a list of all the things we don’t want a superintelligence to do. This would be like writing a recipe for a cake that reads: “Don’t use avocados. Don’t use a toaster. Don’t use vegetables…” and so on. Such a list can never be long enough.

The last section of Bostrom’s Superintelligence is called “Philosophy With A Deadline”.

Many of the problems surrounding superintelligence are the sorts of problems philosophers have been dealing with for centuries. To what degree is meaning inherent in language, versus something that requires external context? How do we translate between the logic of formal systems and normal ambiguous human speech? Can morality be reduced to a set of ironclad rules, and if not, how do we know what it is at all?

Existing answers to these questions are enlightening but nontechnical. The theories of Aristotle, Kant, Mill, Wittgenstein, Quine, and others can help people gain insight into these questions, but are far from formal. Just as a good textbook can help an American learn Chinese, but cannot be encoded into machine language to make a Chinese-speaking computer, so the philosophies that help humans are only a starting point for the project of computers that understand us and share our values.

The new field of machine goal alignment (sometimes colloquially called “Friendly AI”) combines formal logic, mathematics, computer science, cognitive science, and philosophy in order to advance that project. Some of the most important projects in machine goal alignment include:

1. How can computers prove their own goal consistency under self-modification? That is, suppose an AI with certain values is planning to improve its own code in order to become superintelligent. Is there some test it can apply to the new design to be certain that it will keep the same goals as the old design?

2. How can computer programs prove statements about themselves at all? Programs correspond to formal systems, and formal systems have notorious difficulty proving self-reflective statements – the most famous example being Godel’s Incompleteness Theorem. There’s been some progress in this area already, with a few results showing that systems that reason probabilistically rather than requiring certainty can come arbitrarily close to self-reflective proofs.

3. How can a machine be stably reinforced? Most reinforcement strategies ask a learner to maximize the level of their own reward, but this is vulnerable to the learner discovering how to maximize the reward signal directly instead of maximizing the world-states that are translated into reward (the human equivalent is stimulating the pleasure-center of the brain with electricity or heroin instead of going out and doing pleasurable things). Are there reward structures that avoid this failure mode?

4. How can a machine be programmed to learn “human values”? Granted that one has an AI smart enough to be able to learn human values if you told it to do so, how do you specify exactly what “human values” are so that the machine knows what it is that it should be learning, distinct from “human preferences” or “human commands” or “the value of that one human over there”?

This is the philosophy; the other half of Bostrom’s formulation is the deadline. Traditional philosophy has been going on almost three thousand years; machine goal alignment has until the advent of superintelligence, a nebulous event which may be anywhere from a decades to centuries away. If the control problem doesn’t get adequately addressed by then, we are likely to see poorly controlled superintelligences that are unintentionally hostile to the human race, with some of the catastrophic outcomes mentioned above. This is why so many scientists and entrepreneurs are urging quick action on getting machine goal alignment research up to an adequate level. If it turns out that superintelligence is centuries away and such research is premature, little will have been lost. But if our projections were too optimistic, and superintelligence is imminent, then doing such research now rather than later becomes vital.

Currently three organizations are doing such research full-time: the Future of Humanity Institute at Oxford, the Future of Life Institute at MIT, and the Machine Intelligence Research Institute in Berkeley. Other groups are helping and following the field, and some corporations like Google are also getting involved. Still, the field remains tiny, with only a few dozen researchers and a few million dollars in funding. Efforts like Superintelligence are attempts to get more people to pay attention and help the field grow.

If you’re interested about learning more, you can visit these groups’ websites at https://www.fhi.ox.ac.uk, http://futureoflife.org/, and http://intelligence.org.

Who is Professor Nick Bostrom?

Professor Nick Bostrom is the director of Oxford’s Future of Humanity Institute, tasked with anticipating and preventing threats to human civilization.

He has been studying the risks of artificial intelligence for over twenty years. In his 2014 book Superintelligence, he covers, among other things three major questions:

  • First, why is superintelligence a topic of concern
  • Second, what is a “hard takeoff” and how does it impact our concern about superintelligence?
  • Third, what measures can we take to make superintelligence safe and beneficial for humanity?

AlphaGo was connected to the Internet – why shouldn’t the first superintelligence be? This gives a sufficiently clever superintelligence the opportunity to manipulate world computer networks. For example, it might program a virus that will infect every computer in the world, causing them to fill their empty memory with partial copies of the superintelligence, which when networked together become full copies of the superintelligence. Now the superintelligence controls every computer in the world, including the ones that target nuclear weapons. At this point it can force humans to bargain with it, and part of that bargain might be enough resources to establish its own industrial base, and then we’re in humans vs. lions territory again.

(Satoshi Nakamoto is a mysterious individual who posted a design for the Bitcoin currency system to a cryptography forum. The design was so brilliant that everyone started using it, and Nakamoto – who had made sure to accumulate his own store of the currency before releasing it to the public – became a multibillionaire. In other words, somebody with no resources except the ability to make one post to an Internet forum managed to leverage that into a multibillion dollar fortune – and he wasn’t even superintelligent. If Hitler is a lower-bound on how bad superintelligent persuaders can be, Nakamoto should be a lower-bound on how bad superintelligent programmers with Internet access can be.)