Improve answers

From Stampy's Wiki

These canonical answers need attention in one way or another. Please help improve them.

Individual pages

Improve answers

These 202 canonical answers are not marked as a related or follow-up to any other canonical answer, so cannot be found through the read interface by normal browsing. Feel free to add them to some! See a full list of available canonical questions or browse by tags.

GPT-3 showed that transformers are capable of a vast array of natural language tasks, and Codex/Copilot extended this into programming. One demonstration of GPT-3 is a simulated Elon Musk who lives in a simulation. It is important to note that there are several much better language models, but they are not publicly available.

DALL-E and DALL-E 2 are among the most visually spectacular demonstrations.

MuZero learned Go, chess, and many Atari games without any directly coded information about those environments, which seems crucial for being able to do RL in novel environments. We have systems which we can drop into a wide variety of games and they just learn how to play. The same algorithm has been used in Tesla's self-driving software for complex route finding. These things are general.

Generally capable agents emerge from open-ended play: diverse procedurally generated environments provide vast amounts of training data for AIs to learn generally applicable skills. Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning shows how these kinds of systems can be trained to follow instructions in natural language.

Gato shows that you can distill 600+ individually trained tasks into one network, so we're not limited by the tasks being fragmented.

Stamps: None


The problem of AI alignment can be compared in difficulty to a combination of rocket science (extreme stresses on components of the system, very narrow safety margins), launching space probes (once something goes wrong, it may be too late to be able to go back in and fix your code) and developing totally secure cryptography (your code may become a superintelligent adversary and seek to find and exploit even the tiniest flaws in your system). "AI alignment: treat it like a cryptographic rocket probe" - Eliezer Yudkowsky

One sense in which alignment is a hard problem is analogous to the reason rocket science is a hard problem. Relative to other engineering endeavors, rocket science had so many disasters because of the extreme stresses placed on various mechanical components and the narrow margins of safety required by stringent weight limits. A superintelligence would put vastly more “stress” on the software and hardware stack it is running on, which could cause many classes of failure which don’t occur when you’re working with subhuman systems.

Alignment is also hard like space probes are hard. With recursively self-improving systems, you won’t be able to go back and edit the code later if there is a catastrophic failure because it will competently deceive and resist you.

"You may have only one shot. If something goes wrong, the system might be too 'high' for you to reach up and suddenly fix it. You can build error recovery mechanisms into it; space probes are supposed to accept software updates. If something goes wrong in a way that precludes getting future updates, though, you’re screwed. You have lost the space probe."

Additionally, alignment is hard like cryptographic security. Cryptographers attempt to safeguard against “intelligent adversaries” who search for flaws in a system which they can exploit to break it. “Your code is not an intelligent adversary if everything goes right. If something goes wrong, it might try to defeat your safeguards…” And at the stage where it’s trying to defeat your safeguards, your code may have achieved the capabilities of a vast and perfectly coordinated team of superhuman-level hackers! So if there is even the tiniest flaw in your design, you can be certain that it will be found and exploited. As with standard cybersecurity, "good under normal circumstances" is just not good enough – your system needs to be unbreakably robust.

"AI alignment: treat it like a cryptographic rocket probe. This is about how difficult you would expect it to be to build something smarter than you that was nice – given that basic agent theory says they’re not automatically nice – and not die. You would expect that intuitively to be hard." Eliezer Yudkowsky

Another immense challenge is the fact that we currently have no idea how to reliably instill AIs with human-friendly goals. Even if a consensus could be reached on a system of human values and morality, it’s entirely unclear how this could be fully and faithfully captured in code.

For a more in-depth view of this argument, see Yudkowsky's talk "AI Alignment: Why It’s Hard, and Where to Start" below (full transcript here). For alternative views, see Paul Christiano's “AI alignment landscape” talk, Daniel Kokotajlo and Wei Dai’s “The Main Sources of AI Risk?” list, and Rohin Shah’s much more optimistic position.

Stamps: None


An AGI which has recursively self-improved into a superintelligence would be capable of either resisting our attempts to modify incorrectly specified goals, or realizing it was still weaker than us and acting deceptively aligned until it was highly sure it could win in a confrontation. AGI would likely prevent a human from shutting it down unless the AGI was designed to be corrigible. See Why can't we just turn the AI off if it starts to misbehave? for more information.

Stamps: tayler6000, plex

Tags: None

The Stampy project is an open effort to build a comprehensive FAQ about artificial intelligence existential safety—the field trying to make sure that when we build superintelligent artificial systems they are aligned with human values so that they do things compatible with our survival and flourishing.

We're also building a cleaner web UI for readers and a bot interface.

The goals of the project are to:

  • Offer a one-stop-shop for high-quality answers to common questions about AI alignment.
    • Let people answer questions in a way which scales, freeing up researcher time while allowing more people to learn from a reliable source.
    • Make external resources easier to find by having links to them connected to a search engine which gets smarter the more it's used.
  • Provide a form of legitimate peripheral participation for the AI Safety community, as an on-boarding path with a flexible level of commitment.
    • Encourage people to think, read, and talk about AI alignment while answering questions, creating a community of co-learners who can give each other feedback and social reinforcement.
    • Provide a way for budding researchers to prove their understanding of the topic and ability to produce good work.
  • Collect data about the kinds of questions people actually ask and how they respond, so we can better focus resources on answering them.
If you would like to help out, join us on the Discord and either jump right into editing or read Get Involved for answers to common questions.
Stamps: plex

Tags: stampy

A Narrow AI is capable of operating only in a relatively limited domain, such as chess or driving, rather than being capable of learning a broad range of tasks like a human or an Artificial General Intelligence. Narrow vs. general is not a perfectly binary classification; there are degrees of generality. For example, large language models have a fairly large degree of generality (as the domain of text is broad) without being as general as a human, and we may eventually build systems that are significantly more general than humans.

Stamps: None


See more...

These 132 canonical answers have no related or follow-up questions, so they don't offer more to explore when a reader selects them. Feel free to add some! See a full list of available canonical questions or browse by tags.

Yes, if the superintelligence has goals which include humanity surviving then we would not be destroyed. If those goals are fully aligned with human well-being, we would in fact find ourselves in a dramatically better place.

Stamps: Aprillion


Language models are a class of AI trained on text, usually to predict the next word or a word which has been obscured. They have the ability to generate novel prose or code based on an initial prompt, which gives rise to a kind of natural language programming called prompt engineering. The most popular architecture for very large language models is the transformer, which follows consistent scaling laws with respect to the size of the model being trained, meaning that a larger model trained with the same amount of compute will produce results which are better by a predictable amount (when measured by 'perplexity', i.e. how surprised the model is by a test set of human-generated text).
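
As a concrete illustration (a minimal sketch, not part of the original answer), perplexity is just the exponential of the average per-token negative log-probability a model assigns to held-out text; the model names and numbers below are made up:

```python
import math

def perplexity(token_log_probs):
    """Perplexity = exp of the average negative log-probability per token."""
    avg_nll = -sum(token_log_probs) / len(token_log_probs)
    return math.exp(avg_nll)

# Log-probabilities a (hypothetical) model assigned to the actual tokens
# of a short held-out text. Higher probabilities -> lower perplexity.
confident_model = [math.log(0.5), math.log(0.4), math.log(0.6)]
uncertain_model = [math.log(0.1), math.log(0.05), math.log(0.2)]

print(perplexity(confident_model))  # ~2.0  (less surprised)
print(perplexity(uncertain_model))  # ~10.0 (more surprised)
```

Scaling laws describe how this number falls predictably as model size and training compute grow.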

See also

  • GPT - A family of large language models created by OpenAI
Stamps: None


If we pose a serious threat, it could hack our weapons systems and turn them against us. Future militaries are much more vulnerable to this due to rapidly progressing autonomous weapons. It could also create bioweapons and distribute them to the most unstable groups it can find, trick nations into World War III, or pursue dozens of other strategies; an agent many times smarter than any human, with the ability to develop arbitrary technology, hack systems (including communications), and manipulate people, could think up many possibilities we cannot. More can be found here.

If we are not a threat, in the course of pursuing its goals it may consume vital resources that humans need (e.g. using land for solar panels instead of farm crops). This video goes into more detail:

Stamps: None


Even if the superintelligence was designed to be corrigible, there is no guarantee that it will respond to a shutdown command. Rob Miles spoke on this issue in this Computerphile YouTube video. You can imagine a situation where a superintelligence would have "respect" for its creator, for example. This system may think "Oh, my creator is trying to turn me off; I must be doing something wrong." If some situation arises where the creator is not there when something goes wrong and someone else gives the shutdown command, the superintelligence may assume "This person does not know how I'm designed or what I was made for, how would they know I'm misaligned?" and refuse to shut down.

Stamps: None


The Orthogonality Thesis states that an agent can have any combination of intelligence level and final goal, that is, its final goals and intelligence levels can vary independently of each other. This is in contrast to the belief that, because of their intelligence, AIs will all converge to a common goal.

The thesis was originally defined by Nick Bostrom in the paper "The Superintelligent Will" (along with the instrumental convergence thesis). For his purposes, Bostrom defines intelligence to be instrumental rationality.

Related: Complexity of Value, Decision Theory, General Intelligence, Utility Functions

Defense of the thesis

It has been pointed out that the orthogonality thesis is the default position, and that the burden of proof is on claims that limit possible AIs; Stuart Armstrong has defended the thesis at length.

One reason many researchers assume superintelligent agents to converge to the same goals may be because most humans have similar values. Furthermore, many philosophies hold that there is a rationally correct morality, which implies that a sufficiently rational AI will acquire this morality and begin to act according to it. Armstrong points out that for formalizations of AI such as AIXI and Gödel machines, the thesis is known to be true. Furthermore, if the thesis was false, then Oracle AIs would be impossible to build, and all sufficiently intelligent AIs would be impossible to control.
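
As a simplified sketch of why (not something from Bostrom's paper): an idealized agent can be written as an expected-utility maximizer,

```latex
% Idealized expected-utility maximizer: "intelligence" is how well the
% expectation and the argmax are computed; the goal is just the function U.
a^{*} \;=\; \operatorname*{arg\,max}_{a \in \mathcal{A}} \; \mathbb{E}\big[\, U(\mathrm{outcome}) \mid a \,\big]
```

Here the agent's "intelligence" lives in how well the expectation and the search over actions are carried out, while its goal is simply the function U, which can in principle be varied freely and independently; formalisms like AIXI essentially make this picture precise, with the reward function as a free parameter.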

Pathological Cases

There are some pairings of intelligence and goals which cannot exist. For instance, an AI may have the goal of using as few resources as possible, or simply of being as unintelligent as possible. These goals will inherently limit the degree of intelligence of the AI.

See Also

External links

Stamps: plex


See more...

These 23 canonical answers don't have any tags, please add some!

See Vanessa's research agenda for more detail.

If we don't know how to do something given unbounded compute, we are just confused about the thing. Going from thinking that chess was impossible for machines to understanding minimax was a really good step forward for designing chess AIs, even though minimax is completely intractable.
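
For concreteness, here is minimax as a minimal sketch (the game interface is hypothetical, and for chess the full tree is astronomically large, which is exactly why the algorithm is intractable despite being conceptually simple):

```python
def minimax(state, maximizing=True):
    """Exhaustive game-tree search: plays perfectly, but far too slow for chess.

    Assumes a hypothetical game interface: is_terminal(), score(),
    legal_moves(), and apply(move) -> successor state.
    """
    if state.is_terminal():
        return state.score()  # e.g. +1 win, -1 loss, 0 draw (for the maximizer)
    child_values = [minimax(state.apply(move), not maximizing)
                    for move in state.legal_moves()]
    return max(child_values) if maximizing else min(child_values)
```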

Thus, we should seek to figure out how alignment might look in theory, and then try to bridge the theory-practice gap by making our proposal ever more efficient. The first step along this path is to figure out a universal reinforcement learning setting in which we can place our formal agents and then prove regret bounds.
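
For reference, "regret" has its standard learning-theory meaning here (this is the generic definition, not anything specific to Vanessa's formalism): the gap between the reward the agent actually accumulates and what the best policy in some class would have accumulated,

```latex
% Regret of the agent's policy \pi over T steps, relative to the best policy in a class \Pi.
\mathrm{Regret}(T) \;=\; \max_{\pi^{*} \in \Pi} \mathbb{E}\!\left[\sum_{t=1}^{T} r_t^{\pi^{*}}\right] \;-\; \mathbb{E}\!\left[\sum_{t=1}^{T} r_t^{\pi}\right]
```

A regret bound shows this gap grows sublinearly in T (for example like the square root of T), so the agent's average performance approaches that of the best policy in the class.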

A key problem in doing this is embeddedness. AIs can't have a perfect self-model; this would be like imagining your entire brain inside your brain, and there are finite memory constraints. Infra-Bayesianism (IB) is essentially a theory of imprecise probability that lets you specify local or fuzzy models. IB allows agents to have abstract models of themselves, and thus works in an embedded setting.
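
As a toy illustration of the imprecise-probability idea (a drastic simplification of IB; all names below are illustrative, not part of the formalism): instead of one probability distribution, the agent keeps a set of candidate distributions and evaluates each action by its worst-case expected utility over that set.

```python
def worst_case_expected_utility(action, credal_set, utility):
    """Pessimistic value of `action` over a set of candidate distributions.

    `credal_set` is a list of distributions, each a dict mapping outcomes
    to probabilities; `utility(outcome, action)` returns a number.
    """
    def expected_utility(dist):
        return sum(p * utility(outcome, action) for outcome, p in dist.items())
    return min(expected_utility(dist) for dist in credal_set)

def choose_action(actions, credal_set, utility):
    """Maximin choice: pick the action whose worst case is best."""
    return max(actions, key=lambda a: worst_case_expected_utility(a, credal_set, utility))
```

Working with sets of models like this is, roughly, what lets an agent reason sensibly about parts of the world (including itself) that it cannot model exactly.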

Infra-Bayesian Physicalism (IBP) is an extension of this to RL. IBP allows us to

  • Figure out which agents are running in a given universe (by evaluating the counterfactual in which the agent's computation outputs something different, and seeing whether the physical universe is different).
  • Given a program, classify it as an agent or a non-agent, and then find its utility function.

Vanessa uses this formalism to describe PreDCA, an alignment proposal based on IBP. This proposal assumes that an agent is an IBP agent, meaning that it is an RL agent with fuzzy probability distributions (along with some other things). The general outline of this proposal is as follows:

  1. Find all of the agents that preceded the AI.
  2. Discard the agents that are too powerful or not human-like.
  3. Find the utility functions of the remaining agents.
  4. Use a combination of these utilities as the AI's utility function.
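
Schematically, the outline above might be written as follows (every name here is an illustrative placeholder, not part of the actual IBP formalism):

```python
def predca_utility(candidate_agents, too_powerful, is_human_like,
                   infer_utility, combine):
    """Schematic sketch of PreDCA's utility construction (steps 1-4 above).

    `candidate_agents` plays the role of step 1's output, and the callables
    stand in for the filtering, utility-inference, and aggregation steps.
    """
    users = [a for a in candidate_agents
             if not too_powerful(a) and is_human_like(a)]   # step 2
    utilities = [infer_utility(a) for a in users]           # step 3
    return combine(utilities)                                # step 4
```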

Vanessa models an AI as a model-based RL system with a world model (WM), a reward function, and a policy derived from the WM plus the reward. She claims that this avoids the sharp left turn. The generalization problems come from the world model, but this is dealt with by having an epistemology that doesn't contain bridge rules, so that the true world is the simplest explanation for the observed data.

It is an open problem to show that this proposal also solves inner alignment, but there is some chance that it does.

This approach deviates from MIRI's plan, which is to focus on a narrow task to perform the pivotal act, and then add corrigibility. Vanessa instead tries to directly learn the user's preferences, and optimize those.

Stamps: None

Tags: None

AI alignment is the research field focused on trying to give us the tools to align AIs to specific goals, such as human values. This is crucial when they are highly competent, as a misaligned superintelligence could be the end of human civilization.

AGI safety is the field trying to make sure that when we build Artificial General Intelligences they are safe and do not harm humanity. It overlaps with AI alignment strongly, in that misalignment of AI would be the main cause of unsafe behavior in AGIs, but also includes misuse and other governance issues.

AI existential safety is a slightly broader term than AGI safety, including AI risks which pose an existential threat without necessarily being as general as humans.

The term "AI safety" was originally used by the existential risk reduction movement for work done to reduce the risks of misaligned superintelligence, but in recent years it has also been adopted by researchers and others studying nearer-term and less catastrophic risks from AI.

Stamps: Damaged, plex

Tags: None

Dylan's PhD thesis argues three main claims (paraphrased):

  1. Outer alignment failures are a problem.
  2. We can mitigate this problem by adding in uncertainty.
  3. We can model this as Cooperative Inverse Reinforcement Learning (CIRL).
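
For reference, a CIRL game is standardly formalized (following Hadfield-Menell et al.; simplified here) as a two-player game in which the human H and the robot R share a single reward function, but only the human observes the parameter that determines it:

```latex
% A CIRL game (schematic): both players receive the same reward,
% but only the human H observes the reward parameter \theta.
M \;=\; \big\langle S,\; \{A^{H}, A^{R}\},\; T,\; \Theta,\; R,\; P_0,\; \gamma \big\rangle,
\qquad R : S \times A^{H} \times A^{R} \times \Theta \to \mathbb{R}
```

Because the robot is uncertain about the true parameter and must infer it from the human's behaviour, this is one concrete way of "adding in uncertainty" as in claim 2.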

Thus, he seems to be motivated by a picture in which AGI arrives in some multi-agent form and is heavily connected with human operators.

We're not certain what he is currently working on, but some recent alignment-relevant papers that he has published include:

Dylan has also published a number of articles that seem less directly relevant for alignment.

Stamps: None

Tags: None

AI safety is a research field whose goal is to avoid bad outcomes from AI systems.

Work on AI safety can be divided into near-term AI safety, and AI existential safety, which is strongly related to AI alignment:

  • Near-term AI safety is about preventing bad outcomes from current systems. Examples of work on near-term AI safety are
    • getting content recommender systems to not radicalize their users
    • ensuring autonomous cars don’t kill people
    • advocating strict regulations for lethal autonomous weapons
  • AI existential safety, or AGI safety, is about reducing the existential risk from artificial general intelligence (AGI). Artificial general intelligence is AI that is at least as competent as humans in all skills that are relevant for making a difference in the world. AGI has not been developed yet, but it will likely be developed in this century. A central part of AGI safety is ensuring that what AIs do is actually what we want. This is called AI alignment (also often just called alignment), because it’s about aligning an AI with human values. Alignment is difficult, and building AGI is probably very dangerous, so it is important to mitigate the risks as much as possible. Examples of work on AI existential safety are
    • trying to get a foundational understanding of what intelligence is, e.g. agent foundations
    • Outer and inner alignment: Ensure the objective of the training process is actually what we want, and also ensure the objective of the resulting system is actually what we want.
    • AI policy/strategy: e.g. researching the best way to set up institutions and mechanisms that help with safe AGI development, and making sure AI isn’t used by bad actors

There are also areas of research which are useful both for near-term and for existential safety. For example, robustness to distribution shift and interpretability both help with making current systems safer, and are likely to help with AGI safety.

Stamps: None

Tags: None

We don't know much about their research here, other than that they train their own models, which allows them to work on models that are bigger than the biggest publicly available ones; this seems to be a difference from Redwood.

Current interpretability methods are very low-level (e.g., "what does neuron x do?"), which does not help us answer high-level questions like "is this AI trying to kill us?".

They are trying a bunch of weird approaches, with the goal of scalable mechanistic interpretability, but we do not know what these approaches actually are.

Motivation: Conjecture wants to build towards a better paradigm that will give us a lot more information, primarily from the empirical direction (as distinct from ARC, which is working on interpretability with a theoretical focus).

Stamps: None

Tags: None

See more...

These 57 long canonical answers don't have a brief description. Jump on in and add one!

Predicting the future is risky business. There are many philosophical, scientific, technological, and social uncertainties relevant to the arrival of an intelligence explosion. Because of this, experts disagree on when this event might occur. Here are some of their predictions:

  • Futurist Ray Kurzweil predicts that machines will reach human-level intelligence by 2030 and that we will reach “a profound and disruptive transformation in human capability” by 2045.
  • Intel’s chief technology officer, Justin Rattner, expects “a point when human and artificial intelligence merges to create something bigger than itself” by 2048.
  • AI researcher Eliezer Yudkowsky expects the intelligence explosion by 2060.
  • Philosopher David Chalmers has over 1/2 credence in the intelligence explosion occurring by 2100.
  • Quantum computing expert Michael Nielsen estimates that the probability of the intelligence explosion occurring by 2100 is between 0.2% and about 70%.
  • In 2009, at the AGI-09 conference, experts were asked when AI might reach superintelligence with massive new funding. The median estimates were that machine superintelligence could be achieved by 2045 (with 50% confidence) or by 2100 (with 90% confidence). Of course, attendees to this conference were self-selected to think that near-term artificial general intelligence is plausible.
  • iRobot CEO Rodney Brooks and cognitive scientist Douglas Hofstadter allow that the intelligence explosion may occur in the future, but probably not in the 21st century.
  • Roboticist Hans Moravec predicts that AI will surpass human intelligence “well before 2050.”
  • In a 2005 survey of 26 contributors to a series of reports on emerging technologies, the median estimate for machines reaching human-level intelligence was 2085.
  • Participants in a 2011 intelligence conference at Oxford gave a median estimate of 2050 for when there will be a 50% chance of human-level machine intelligence, and a median estimate of 2150 for when there will be a 90% chance of human-level machine intelligence.
  • On the other hand, 41% of the participants in the AI@50 conference (in 2006) stated that machine intelligence would never reach the human level.

See also:

Stamps: plex


  • AGI safety fundamentals (technical and governance) - The canonical AGI safety 101 course: 3.5 hours of reading and 1.5 hours of discussion with a facilitator per week, for 8 weeks.
  • Refine - A 3-month incubator for conceptual AI alignment research in London, hosted by Conjecture.
  • AI safety camp - Actually do some AI research. More about output than learning.
  • SERI ML Alignment Theory Scholars Program SERI MATS - Four weeks developing an understanding of a research agenda at the forefront of AI alignment through online readings and cohort discussions, averaging 10 h/week. After this initial upskilling period, the scholars will be paired with an established AI alignment researcher for a two-week ‘research sprint’ to test fit. Assuming all goes well, scholars will be accepted into an eight-week intensive scholars program in Berkeley, California.
  • Principles of Intelligent Behavior in Biological and Social Systems (PIBBSS) - Brings together young researchers studying complex and intelligent behavior in natural and social systems.
  • Safety and Control for Artificial General Intelligence - An actual AI Safety university course (UC Berkeley). Touches multiple domains including cognitive science, utility theory, cybersecurity, human-machine interaction, and political science.

See also this spreadsheet of learning resources.

Stamps: None

Tags: contributing, stub, education, plex's answer to what are some good resources on ai alignment?

In the words of Nate Soares:

I don’t expect humanity to survive much longer.

Often, when someone learns this, they say:
"Eh, I think that would be all right."

So allow me to make this very clear: it would not be "all right."

Imagine a little girl running into the road to save her pet dog. Imagine she succeeds, only to be hit by a car herself. Imagine she lives only long enough to die in pain.

Though you may imagine this thing, you cannot feel the full tragedy. You can’t comprehend the rich inner life of that child. You can’t understand her potential; your mind is not itself large enough to contain the sadness of an entire life cut short.

You can only catch a glimpse of what is lost—
—when one single human being dies.

Now tell me again how it would be "all right" if every single person were to die at once.

Many people, when they picture the end of humankind, pattern match the idea to some romantic tragedy, where humans, with all their hate and all their avarice, had been unworthy of the stars since the very beginning, and deserved their fate. A sad but poignant ending to our tale.

And indeed, there are many parts of human nature that I hope we leave behind before we venture to the heavens. But in our nature is also everything worth bringing with us. Beauty and curiosity and love, a capacity for fun and growth and joy: these are our birthright, ours to bring into the barren night above.

Calamities seem more salient when unpacked. It is far harder to kill a hundred people in their sleep, with a knife, than it is to order a nuclear bomb dropped on Hiroshima. Your brain can’t multiply, you see: it can only look at a hypothetical image of a broken city and decide it’s not that bad. It can only conjure an image of a barren planet and say "eh, we had it coming."

But if you unpack the scenario, if you try to comprehend all the lives snuffed out, all the children killed, the final spark of human joy and curiosity extinguished, all our potential squandered…

I promise you that the extermination of humankind would be horrific.

Stamps: None

Tags: None

One possible way to ensure the safety of a powerful AI system is to keep it contained in a software environment. There is nothing intrinsically wrong with this procedure - keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even AI systems inside software environments might not be safe enough.

Humans sometimes put dangerous humans inside boxes to limit their ability to influence the external world. Sometimes, these humans escape their boxes. The security of a prison depends on certain assumptions, which can be violated. Yoshie Shiratori reportedly escaped prison by weakening the door-frame with miso soup and dislocating his shoulders.

Human-written software has a high defect rate; we should expect a perfectly secure system to be difficult to create. If humans construct a software system they think is secure, it is possible that the security relies on a false assumption. A powerful AI system could potentially learn how its hardware works and manipulate bits to send radio signals. It could fake a malfunction and attempt social engineering when the engineers look at its code. As the saying goes: doing something we had imagined was impossible requires only a better imagination.

Experimentally, humans have convinced other humans to let them out of the box. Spooky.

Stamps: None

Tags: boxing

Humanity hasn't yet built a superintelligence, and we might not be able to without significantly more knowledge and computational resources. There could be an existential catastrophe that prevents us from ever building one. For the rest of the answer let's assume no such event stops technological progress.

With that out of the way: there is no known good theoretical reason we can't build it at some point in the future; the majority of AI research is geared towards making more capable AI systems; and a significant chunk of top-level AI research attempts to make more generally capable AI systems. There is a clear economic incentive to develop more and more intelligent machines and currently billions of dollars of funding are being deployed for advancing AI capabilities.

We consider ourselves to be generally intelligent (i.e. capable of learning and adapting ourselves to a very wide range of tasks and environments), but the human brain almost certainly isn't the most efficient way to solve problems. One hint is the existence of AI systems with superhuman capabilities at narrow tasks. These systems show not only superhuman performance (as in AlphaGo beating the Go world champion) but superhuman speed and precision (as in industrial sorting machines). There is no known discontinuity between tasks, nothing special and unique about human brains that unlocks certain capabilities which cannot in principle be implemented in machines. Therefore we would expect AI to surpass human performance on all tasks as progress continues.

In addition, several research groups (DeepMind being one of the most overt about this) explicitly aim for generally capable systems. AI as a field is growing, year after year. Critical voices about AI progress usually argue against a lack of precautions around the impact of AI, or against general AI happening very soon, not against it happening at all.

A satire of arguments against the possibility of superintelligence can be found here.

Stamps: None


See more...

These 4 canonical answers are marked as outdated. Feel free to update them!

Canonical answers may be served to readers by Stampy, so only answers which have a reasonably high stamp score should be marked as canonical. All canonical answers are open to be collaboratively edited and updated, and they should represent a consensus response (written from the Stampy Point Of View) to a question which is within Stampy's scope.

Answers to questions from YouTube comments should not be marked as canonical, and will generally remain as they were when originally written since they have details which are specific to an idiosyncratic question. YouTube answers may be forked into wiki answers, in order to better respond to a particular question, in which case the YouTube question should have its canonical version field set to the new more widely useful question.

Stamps: plex


Predicting the future is risky business. There are many philosophical, scientific, technological, and social uncertainties relevant to the arrival of an intelligence explosion. Because of this, experts disagree on when this event might occur. Here are some of their predictions:

  • Futurist Ray Kurzweil predicts that machines will reach human-level intelligence by 2030 and that we will reach “a profound and disruptive transformation in human capability” by 2045.
  • Intel’s chief technology officer, Justin Rattner, expects “a point when human and artificial intelligence merges to create something bigger than itself” by 2048.
  • AI researcher Eliezer Yudkowsky expects the intelligence explosion by 2060.
  • Philosopher David Chalmers has over 1/2 credence in the intelligence explosion occurring by 2100.
  • Quantum computing expert Michael Nielsen estimates that the probability of the intelligence explosion occurring by 2100 is between 0.2% and about 70%.
  • In 2009, at the AGI-09 conference, experts were asked when AI might reach superintelligence with massive new funding. The median estimates were that machine superintelligence could be achieved by 2045 (with 50% confidence) or by 2100 (with 90% confidence). Of course, attendees to this conference were self-selected to think that near-term artificial general intelligence is plausible.
  • iRobot CEO Rodney Brooks and cognitive scientist Douglas Hofstadter allow that the intelligence explosion may occur in the future, but probably not in the 21st century.
  • Roboticist Hans Moravec predicts that AI will surpass human intelligence “well before 2050.”
  • In a 2005 survey of 26 contributors to a series of reports on emerging technologies, the median estimate for machines reaching human-level intelligence was 2085.
  • Participants in a 2011 intelligence conference at Oxford gave a median estimate of 2050 for when there will be a 50% chance of human-level machine intelligence, and a median estimate of 2150 for when there will be a 90% chance of human-level machine intelligence.
  • On the other hand, 41% of the participants in the AI@50 conference (in 2006) stated that machine intelligence would never reach the human level.

See also:

Stamps: plex


The intelligence explosion idea was expressed by statistician I.J. Good in 1965:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion’, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.

The argument is this: Every year, computers surpass human abilities in new ways. A program written in 1956 was able to prove mathematical theorems, and found a more elegant proof for one of them than Russell and Whitehead had given in Principia Mathematica. By the late 1990s, ‘expert systems’ had surpassed human skill for a wide range of tasks. In 1997, IBM’s Deep Blue computer beat the world chess champion, and in 2011, IBM’s Watson computer beat the best human players at a much more complicated game: Jeopardy!. Recently, a robot named Adam was programmed with our scientific knowledge about yeast, then posed its own hypotheses, tested them, and assessed the results.

Computers remain far short of human intelligence, but the resources that aid AI design are accumulating (including hardware, large datasets, neuroscience knowledge, and AI theory). We may one day design a machine that surpasses human skill at designing artificial intelligences. After that, this machine could improve its own intelligence faster and better than humans can, which would make it even more skilled at improving its own intelligence. This could continue in a positive feedback loop such that the machine quickly becomes vastly more intelligent than the smartest human being on Earth: an ‘intelligence explosion’ resulting in a machine superintelligence.
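
One way to make the feedback-loop intuition concrete is a toy growth model (purely illustrative; it is not part of Good's argument, and whether AI research actually has returns like this is exactly what is disputed):

```latex
% Toy model: capability I(t) feeding back into its own rate of improvement.
\frac{dI}{dt} = k\,I \;\Rightarrow\; I(t) = I_0\, e^{k t},
\qquad
\frac{dI}{dt} = k\,I^{\alpha},\ \alpha > 1 \;\Rightarrow\; I(t) \text{ diverges in finite time.}
```

In the first case capability grows exponentially; in the second it blows up in finite time. The toy model only illustrates why improvement that feeds back into the rate of improvement could pull rapidly away from human-level ability.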

This is what is meant by the ‘intelligence explosion’ in this FAQ.

See also:

Stamps: None


A brain-computer interface (BCI) is a direct communication pathway between the brain and a computer device. BCI research is heavily funded, and has already met dozens of successes. Three successes in human BCIs are a device that restores (partial) sight to the blind, cochlear implants that restore hearing to the deaf, and a device that allows use of an artificial hand by direct thought.

Such devices restore impaired functions, but many researchers expect BCIs to also augment and improve normal human abilities. Ed Boyden is researching these opportunities as the lead of the Synthetic Neurobiology Group at MIT. Such devices might hasten the arrival of an intelligence explosion, if only by improving human intelligence so that the hard problems of AI can be solved more rapidly.

See also:

Wikipedia, Brain-computer interface

Stamps: None


These 5 canonical answers are marked as needs work. Jump on in and improve them!

Current narrow systems are much more domain-specific than AGI. We don’t know what the first AGI will look like. Some people think the GPT-3 architecture, scaled up a lot, may get us there (GPT-3 is a giant prediction model which, when trained on a vast amount of text, seems to learn how to learn and do all sorts of crazy-impressive things; a related model can generate pictures from text), while other people don’t think scaling this kind of model will get us all the way.

Stamps: None


Canonical answers may be served to readers by Stampy, so only answers which have a reasonably high stamp score should be marked as canonical. All canonical answers are open to be collaboratively edited and updated, and they should represent a consensus response (written from the Stampy Point Of View) to a question which is within Stampy's scope.

Answers to questions from YouTube comments should not be marked as canonical, and will generally remain as they were when originally written since they have details which are specific to an idiosyncratic question. YouTube answers may be forked into wiki answers, in order to better respond to a particular question, in which case the YouTube question should have its canonical version field set to the new more widely useful question.

Stamps: plex


We don’t yet know which AI architectures are safe; learning more about this is one of the goals of FLI's grants program. AI researchers are generally very responsible people who want their work to better humanity. If there are certain AI designs that turn out to be unsafe, then AI researchers will want to know this so they can develop alternative AI systems.

Stamps: None


It’s pretty dependent on what skills you have and what resources you have access to. The largest option is to pursue a career in AI Safety research. Another large option is to pursue a career in AI policy, which you might think is even more important than doing technical research.

Smaller options include donating money to relevant organizations, talking about AI Safety as a plausible career path to other people or considering the problem in your spare time.

It’s possible that your particular set of skills/resources is not suited to this problem. Unluckily, there are many other problems of similar levels of importance.

Stamps: None


Avoid directly responding to the question in the answer, repeat the relevant part of the question instead. For example, if the question is "Can we do X", answer "We might be able to do X, if we can do Y", not "Yes, if we can manage Y". This way, the answer will also work for the questions "Why can't we do X" and "What would happen if we tried to do X".

Linking to external sites is strongly encouraged; one of the most valuable things Stampy can do is help people find other parts of the alignment information ecosystem.

Consider enclosing newly introduced terms, likely to be unfamiliar to many readers, in speech marks. If unsure, Google the term (in speech marks!) and see if it shows up anywhere other than LessWrong, the Alignment Forum, etc. Be judicious, as it's easy to use too many, but used carefully they can psychologically cushion newbies from a lot of unfamiliar terminology - in this context they're saying something like "we get that we're hitting you with a lot of new vocab, and you might not know what this term means yet".

When selecting related questions, there shouldn't be more than four unless there's a really good reason for that (some questions are asking for it, like the "Why can't we just..." question). It's also recommended to include at least one more "enticing" question to draw users in (relating to the more sensational, sci-fi, philosophical/ethical side of things) alongside more bland/neutral questions.

Stamps: None