- Review answers - Answers in need of attention. Stamp things on that page to make them show up higher here.
- Imported FAQs - Questions and answers imported from external FAQs.
Questions and Answers
These are the answers with the most Stamps.
This is actually an active area of AI alignment research, called "Impact Measures"! It's not trivial to formalize in a way which won't predictably go wrong (entropy minimization likely leads to an AI which tries really hard to put out all the stars ASAP since they produce so much entropy, for example), but progress is being made. You can read about it on the Alignment Forum tag, or watch Rob's videos Avoiding Negative Side Effects and Avoiding Positive Side Effects
Once an AGI has access to the internet it would be very challenging to meaningfully restrict it from doing things online which it wants to. There are too many options to bypass blocks we may put in place.
It may be possible to design it so that it does not want to do dangerous things in the first place, or perhaps to set up tripwires so that we notice that it’s trying to do a dangerous thing, though that relies on it not noticing or bypassing the tripwire so should not be the only layer of security.
If the AI system was deceptively aligned (i.e. pretending to be nice until it was in control of the situation) or had been in stealth mode while getting things in place for a takeover, quite possibly within hours. We may get more warning with weaker systems, if the AGI does not feel at all threatened by us, or if a complex ecosystem of AI systems is built over time and we gradually lose control.
Paul Christiano writes a story of alignment failure which shows a relatively fast transition.
AI Safety Support offers free calls to advise people interested in a career in AI Safety, so that's a great place to start. We're working on creating a bunch of detailed information for Stampy to use, but in the meantime check out these resources:
- 80,000 Hours AI safety syllabus
- Adam Gleave's Careers in Beneficial AI Research document
- Rohin Shah's FAQ on career advice for AI alignment researchers
- AI Safety Support has lots of other good resources, such as their links page, slack, newsletter, and events calendar.
- Safety-aligned research training programs (under construction).
On the pessimistic end you find people like Eliezer Yudkowsky, who said: "I consider the present gameboard to look incredibly grim, and I don't actually see a way out through hard work alone. We can hope there's a miracle that violates some aspect of my background model, and we can try to prepare for that unknown miracle; preparing for an unknown miracle probably looks like "Trying to die with more dignity on the mainline" (because if you can die with more dignity on the mainline, you are better positioned to take advantage of a miracle if it occurs)."
While at the optimistic end you have people like Ben Garfinkel who put the probability at more like 0.1-1% for AI causing an existential catastrophe in the next century, with most people lying somewhere in the middle.
In previous decades, AI research had proceeded more slowly than some experts predicted. According to experts in the field, however, this trend has reversed in the past 5 years or so. AI researchers have been repeatedly surprised by, for example, the effectiveness of new visual and speech recognition systems. AI systems can solve CAPTCHAs that were specifically devised to foil AIs, translate spoken text on-the-fly, and teach themselves how to play games they have neither seen before nor been programmed to play. Moreover, the real-world value of this effectiveness has prompted massive investment by large tech firms such as Google, Facebook, and IBM, creating a positive feedback cycle that could dramatically speed progress.
Putting aside the complexity of defining what is "the" moral way to behave (or even "a" moral way to behave), even an AI which can figure out what it is might not "want to" follow it itself.
A deceptive agent (AI or human) may know perfectly well what behaviour is considered moral, but if their values are not aligned, they may decide to act differently to pursue their own interests.
As well as pulling human written answers to AI alignment questions from Stampy's Wiki, Stampy can:
- Search for AI safety papers e.g. "stampy, what's that paper about corrigibility?"
- Search for videos e.g. "what's that video where Rob talks about mesa optimizers, stampy?"
- Calculate with Wolfram Alpha e.g. "s, what's the square root of 345?"
- Search DuckDuckGo and return snippets
- And (at least in the patron Discord) falls back to polling GPT-3 to answer uncaught questions
We could shut it down weaker systems, and this would be a useful guardrail against certain types of problem caused by narrow AI. However, once an AGI establishes itself (i.e. copies of itself everywhere, and later technological superiority), we could not unless it was corrigible and willing to let humans adjust it. There may be a period in the early stages of an AGI's development where it would be trying very hard to convince us that we should not shut it down and/or hiding itself and/or making copies of itself onto every server on earth.
Instrumental Convergence and the Stop Button Problem are the key reasons it would not be simple to shut down a non corrigible advanced system. If the AI wants to collect stamps, being turned off means it gets less stamps, so even without an explicit goal of not being turned off it has an instrumental reason to avoid being turned off (e.g. once it acquires a detailed world model and general intelligence, it is likely to realise that by playing nice and pretending to be aligned if you have the power to turn it off, establishing control over any system we put in place to shut it down, and eliminating us if it has the power to reliably do so and we would otherwise pose a threat).
If you like interactive FAQs, you've already found one! All joking aside, probably the best places to start as a newcomer are the The AI Revolution posts on WaitBuyWhy: The Road to Superintelligence and Our Immortality or Extinction for a fun accessible intro, or AGI safety from first principles for a more up to date option. If you prefer videos, Rob Miles's YouTube and MIRI's AI Alignment: Why It’s Hard, and Where to Start are great.
If you've up for a book-length introduction, there are several options.
The book which first made the case to the public is Nick Bostrom's Superintelligence. It gives an excellent overview of the state of the field in 2014 and makes a strong case for the subject being important as well as exploring many fascinating adjacent topics. However, it does not cover newer developments, such as mesa-optimizers or language models.
There's also Human Compatible by Stuart Russell, which gives a more up-to-date (2019) review of developments, with an emphasis on the approaches that the Center for Human Compatible AI are working on such as cooperative inverse reinforcement learning. There's a good review/summary on SlateStarCodex.
The Alignment Problem by Brian Christian is the most recent (2020) and has more of an emphasis on machine learning and current generation problems with AI than Superintelligence or Human Compatible.
Though not limited to AI Safety, Rationality: A-Z covers a lot of skills which are valuable to acquire for people trying to think about large and complex issues, with The Rationalist's Guide to the Galaxy available as a shorter and more AI focused accessible option.
These are the answers to questions reviewed as high quality.
First, even “narrow” AI systems, which approach or surpass human intelligence in a small set of capabilities (such as image or voice recognition) already raise important questions regarding their impact on society. Making autonomous vehicles safe, analyzing the strategic and ethical dimensions of autonomous weapons, and the effect of AI on the global employment and economic systems are three examples. Second, the longer-term implications of human or super-human artificial intelligence are dramatic, and there is no consensus on how quickly such capabilities will be developed. Many experts believe there is a chance it could happen rather soon, making it imperative to begin investigating long-term safety issues now, if only to get a better sense of how much early progress is actually possible.
One possible way to ensure the safety of a powerful AI system is to keep it contained in a software environment. There is nothing intrinsically wrong with this procedure - keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even AI systems inside software environments might not be safe enough.
Humans sometimes put dangerous humans inside boxes to limit their ability to influence the external world. Sometimes, these humans escape their boxes. The security of a prison depends on certain assumptions, which can be violated. Yoshie Shiratori reportedly escaped prison by weakening the door-frame with miso soup and dislocating his shoulders.
Human written software has a high defect rate; we should expect a perfectly secure system to be difficult to create. If humans construct a software system they think is secure, it is possible that the security relies on a false assumption. A powerful AI system could potentially learn how its hardware works and manipulate bits to send radio signals. It could fake a malfunction and attempt social engineering when the engineers look at its code. As the saying goes: in order for someone to do something we had imagined was impossible requires only that they have a better imagination.
Experimentally, humans have convinced other humans to let them out of the box. Spooky.
Each major organization has a different approach. The research agendas are detailed and complex (see also AI Watch). Getting more brains working on any of them (and more money to fund them) may pay off in a big way, but it’s very hard to be confident which (if any) of them will actually work.
The following is a massive oversimplification, each organization actually pursues many different avenues of research, read the 2020 AI Alignment Literature Review and Charity Comparison for much more detail. That being said:
- The Machine Intelligence Research Institute focuses on foundational mathematical research to understand reliable reasoning, which they think is necessary to provide anything like an assurance that a seed AI built will do good things if activated.
- The Center for Human-Compatible AI focuses on Cooperative Inverse Reinforcement Learning and Assistance Games, a new paradigm for AI where they try to optimize for doing the kinds of things humans want rather than for a pre-specified utility function
- Paul Christano's Alignment Research Center focuses is on prosaic alignment, particularly on creating tools that empower humans to understand and guide systems much smarter than ourselves. His methodology is explained on his blog.
- The Future of Humanity Institute does work on crucial considerations and other x-risks, as well as AI safety research and outreach.
- Anthropic is a new organization exploring natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
- OpenAI is in a state of flux after major changes to their safety team.
- DeepMind’s safety team is working on various approaches designed to work with modern machine learning, and does some communication via the Alignment Newsletter.
- EleutherAI is a Machine Learning collective aiming to build large open source language models to allow more alignment research to take place.
- Ought is a research lab that develops mechanisms for delegating open-ended thinking to advanced machine learning systems.
The argument goes: yes, a superintelligent AI might be far smarter than Einstein, but it’s still just one program, sitting in a supercomputer somewhere. That could be bad if an enemy government controls it and asks its help inventing superweapons – but then the problem is the enemy government, not the AI per se. Is there any reason to be afraid of the AI itself? Suppose the AI did feel hostile – suppose it even wanted to take over the world? Why should we think it has any chance of doing so?
Compounded over enough time and space, intelligence is an awesome advantage. Intelligence is the only advantage we have over lions, who are otherwise much bigger and stronger and faster than we are. But we have total control over lions, keeping them in zoos to gawk at, hunting them for sport, and holding them on the brink of extinction. And this isn’t just the same kind of quantitative advantage tigers have over lions, where maybe they’re a little bigger and stronger but they’re at least on a level playing field and enough lions could probably overpower the tigers. Humans are playing a completely different game than the lions, one that no lion will ever be able to respond to or even comprehend. Short of human civilization collapsing or lions evolving human-level intelligence, our domination over them is about as complete as it is possible for domination to be.
Since superintelligences will be as far beyond Einstein as Einstein is beyond a village idiot, we might worry that they would have the same kind of qualitative advantage over us that we have over lions.
You might say that human civilization as a whole is dangerous to lions. But a single human placed amid a pack of lions with no raw materials for building technology is going to get ripped to shreds. So although thousands of superintelligences, given a long time and a lot of opportunity to build things, might be able to dominate humans – what harm could a single superintelligence do?
Superintelligence has an advantage that a human fighting a pack of lions doesn’t – the entire context of human civilization and technology, there for it to manipulate socially or technologically.
Yes. In 2014, Google bought artificial intelligence startup DeepMind for $400 million; DeepMind added the condition that Google promise to set up an AI Ethics Board. DeepMind cofounder Shane Legg has said in interviews that he believes superintelligent AI will be “something approaching absolute power” and “the number one risk for this century”.
Many other science and technology leaders agree. Late astrophysicist Stephen Hawking said that superintelligence “could spell the end of the human race.” Tech billionaire Bill Gates describes himself as “in the camp that is concerned about superintelligence…I don’t understand why some people are not concerned”. SpaceX/Tesla CEO Elon Musk calls superintelligence “our greatest existential threat” and donated $10 million from his personal fortune to study the danger. Stuart Russell, Professor of Computer Science at Berkeley and world-famous AI expert, warns of “species-ending problems” and wants his field to pivot to make superintelligence-related risks a central concern.
Professor Nick Bostrom is the director of Oxford’s Future of Humanity Institute, tasked with anticipating and preventing threats to human civilization. He has been studying the risks of artificial intelligence for twenty years. The explanations in the follow-up questions are loosely adapted from his 2014 book Superintelligence.
The basic concern as AI systems become increasingly powerful is that they won’t do what we want them to do – perhaps because they aren’t correctly designed, perhaps because they are deliberately subverted, or perhaps because they do what we tell them to do rather than what we really want them to do (like in the classic stories of genies and wishes.) Many AI systems are programmed to have goals and to attain them as effectively as possible – for example, a trading algorithm has the goal of maximizing profit. Unless carefully designed to act in ways consistent with human values, a highly sophisticated AI trading system might exploit means that even the most ruthless financier would disavow. These are systems that literally have a mind of their own, and maintaining alignment between human interests and their choices and actions will be crucial.
AI is already superhuman at some tasks, for example numerical computations, and will clearly surpass humans in others as time goes on. We don’t know when (or even if) machines will reach human-level ability in all cognitive tasks, but most of the AI researchers at FLI’s conference in Puerto Rico put the odds above 50% for this century, and many offered a significantly shorter timeline. Since the impact on humanity will be huge if it happens, it’s worthwhile to start research now on how to ensure that any impact is positive. Many researchers also believe that dealing with superintelligent AI will be qualitatively very different from more narrow AI systems, and will require very significant research effort to get right.
It likely will – however, intelligence is, by many definitions, the ability to figure out how to accomplish goals. Even in today’s advanced AI systems, the builders assign the goal but don’t tell the AI exactly how to accomplish it, nor necessarily predict in detail how it will be done; indeed those systems often solve problems in creative, unpredictable ways. Thus the thing that makes such systems intelligent is precisely what can make them difficult to predict and control. They may therefore attain the goal we set them via means inconsistent with our preferences.
Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more.1 However, human intelligence continues to dominate machine intelligence in generality.
A powerful chess computer is “narrow”: it can’t play other games. In contrast, humans have problem-solving abilities that allow us to adapt to new contexts and excel in many domains other than what the ancestral environment prepared us for.
In the absence of a formal definition of “intelligence” (and therefore of “artificial intelligence”), we can heuristically cite humans’ perceptual, inferential, and deliberative faculties (as opposed to, e.g., our physical strength or agility) and say that intelligence is “those kinds of things.” On this conception, intelligence is a bundle of distinct faculties — albeit a very important bundle that includes our capacity for science.
Our cognitive abilities stem from high-level patterns in our brains, and these patterns can be instantiated in silicon as well as carbon. This tells us that general AI is possible, though it doesn’t tell us how difficult it is. If intelligence is sufficiently difficult to understand, then we may arrive at machine intelligence by scanning and emulating human brains or by some trial-and-error process (like evolution), rather than by hand-coding a software agent.
If machines can achieve human equivalence in cognitive tasks, then it is very likely that they can eventually outperform humans. There is little reason to expect that biological evolution, with its lack of foresight and planning, would have hit upon the optimal algorithms for general intelligence (any more than it hit upon the optimal flying machine in birds). Beyond qualitative improvements in cognition, Nick Bostrom notes more straightforward advantages we could realize in digital minds, e.g.:
- editability — “It is easier to experiment with parameter variations in software than in neural wetware.”2
- speed — “The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.”
- serial depth — On short timescales, machines can carry out much longer sequential processes.
- storage capacity — Computers can plausibly have greater working and long-term memory.
- size — Computers can be much larger than a human brain.
- duplicability — Copying software onto new hardware can be much faster and higher-fidelity than biological reproduction.
Any one of these advantages could give an AI reasoner an edge over a human reasoner, or give a group of AI reasoners an edge over a human group. Their combination suggests that digital minds could surpass human minds more quickly and decisively than we might expect.
The Control Problem is the problem of preventing artificial superintelligence (ASI) from having a negative impact on humanity. How do we keep a more intelligent being under control, or how do we align it with our values? If we succeed in solving this problem, intelligence vastly superior to ours can take the baton of human progress and carry it to unfathomable heights. Solving our most complex problems could be simple to a sufficiently intelligent machine. If we fail in solving the Control Problem and create a powerful ASI not aligned with our values, it could spell the end of the human race. For these reasons, The Control Problem may be the most important challenge that humanity has ever faced, and may be our last.