Recent answers

From Stampy's Wiki


Intelligence is powerful. One might say that “Intelligence is no match for a gun, or for someone with lots of money,” but both guns and money were produced by intelligence. If not for our intelligence, humans would still be foraging the savannah for food.

Intelligence is what caused humans to dominate the planet in the blink of an eye (on evolutionary timescales). Intelligence is what allows us to eradicate diseases, and what gives us the potential to eradicate ourselves with nuclear war. Intelligence gives us superior strategic skills, superior social skills, superior economic productivity, and the power of invention.

A machine with superintelligence would be able to hack into vulnerable networks via the internet, commandeer those resources for additional computing power, take over mobile machines connected to networks connected to the internet, use them to build additional machines, perform scientific experiments to understand the world better than humans can, invent quantum computing and nanotechnology, manipulate the social world better than we can, and do whatever it can to give itself more power to achieve its goals — all at a speed much faster than humans can respond to.



Dreyfus and Penrose have argued that human cognitive abilities can’t be emulated by a computational machine. Searle and Block argue that certain kinds of machines cannot have a mind (consciousness, intentionality, etc.). But these objections need not concern those who predict an intelligence explosion.

We can reply to Dreyfus and Penrose by noting that an intelligence explosion does not require an AI to be a classical computational system. And we can reply to Searle and Block by noting that an intelligence explosion does not depend on machines having consciousness or other properties of ‘mind’, only that it be able to solve problems better than humans can in a wide variety of unpredictable environments. As Edsger Dijkstra once said, the question of whether a machine can ‘really’ think is “no more interesting than the question of whether a submarine can swim.”

Others who are pessimistic about an intelligence explosion occurring within the next few centuries don’t have a specific objection but instead think there are hidden obstacles that will reveal themselves and slow or halt progress toward machine superintelligence.

Finally, a global catastrophe like nuclear war or a large asteroid impact could so damage human civilization that the intelligence explosion never occurs. Or, a stable and global totalitarianism could prevent the technological development required for an intelligence explosion to occur.


Predicting the future is risky business. There are many philosophical, scientific, technological, and social uncertainties relevant to the arrival of an intelligence explosion. Because of this, experts disagree on when this event might occur. Here are some of their predictions:

  • Futurist Ray Kurzweil predicts that machines will reach human-level intelligence by 2030 and that we will reach “a profound and disruptive transformation in human capability” by 2045.
  • Intel’s chief technology officer, Justin Rattner, expects “a point when human and artificial intelligence merges to create something bigger than itself” by 2048.
  • AI researcher Eliezer Yudkowsky expects the intelligence explosion by 2060.
  • Philosopher David Chalmers has over 1/2 credence in the intelligence explosion occurring by 2100.
  • Quantum computing expert Michael Nielsen estimates that the probability of the intelligence explosion occurring by 2100 is between 0.2% and about 70%.
  • In 2009, at the AGI-09 conference, experts were asked when AI might reach superintelligence with massive new funding. The median estimates were that machine superintelligence could be achieved by 2045 (with 50% confidence) or by 2100 (with 90% confidence). Of course, attendees to this conference were self-selected to think that near-term artificial general intelligence is plausible.
  • iRobot CEO Rodney Brooks and cognitive scientist Douglas Hofstadter allow that the intelligence explosion may occur in the future, but probably not in the 21st century.
  • Roboticist Hans Moravec predicts that AI will surpass human intelligence “well before 2050.”
  • In a 2005 survey of 26 contributors to a series of reports on emerging technologies, the median estimate for machines reaching human-level intelligence was 2085.
  • Participants in a 2011 intelligence conference at Oxford gave a median estimate of 2050 for when there will be a 50% chance of human-level machine intelligence, and a median estimate of 2150 for when there will be a 90% chance of human-level machine intelligence.
  • On the other hand, 41% of the participants in the AI@50 conference (in 2006) stated that machine intelligence would never reach the human level.


Nick Bostrom defined ‘superintelligence’ as:

"an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills."

This definition includes vague terms like ‘much’ and ‘practically’, but it will serve as a working definition for superintelligence in this FAQ. An intelligence explosion would lead to machine superintelligence, and some believe that an intelligence explosion is the most likely path to superintelligence.

See also:

Bostrom, Long Before Superintelligence?
Legg, Machine Super Intelligence


There are many paths to artificial general intelligence (AGI). One path is to imitate the human brain by using neural nets or evolutionary algorithms to build dozens of separate components which can then be pieced together (Neural Networks and Natural Intelligence; A ‘neural-gas’ network learns topologies, pp. 159-174). Another path is to start with a formal model of perfect general intelligence and try to approximate that (pp. 199-223, pp. 227-287). A third path is to focus on developing a ‘seed AI’ that can recursively self-improve, such that it can learn to be intelligent on its own without needing to first achieve human-level general intelligence. Eurisko was a self-improving AI in a limited domain, but was not able to achieve human-level general intelligence.



Tags: agi, recursive self-improvement, seed ai, neuromorphic ai

A brain-computer interface (BCI) is a direct communication pathway between the brain and a computer device. BCI research is heavily funded, and has already met dozens of successes. Three successes in human BCIs are a device that restores (partial) sight to the blind, cochlear implants that restore hearing to the deaf, and a device that allows use of an artificial hand by direct thought.

Such devices restore impaired functions, but many researchers expect BCIs to also augment and improve normal human abilities. Ed Boyden is researching these opportunities as the lead of the Synthetic Neurobiology Group at MIT. Such devices might hasten the arrival of an intelligence explosion, if only by improving human intelligence so that the hard problems of AI can be solved more rapidly.

See also:

Wikipedia, Brain-computer interface


Tags: outdated, definitions, brain-computer interfaces

There may be genes or molecules that can be modified to improve general intelligence. Researchers have already done this in mice: they over-expressed the NR2B gene, improving those mice’s memory beyond that of any other mouse species. Biological cognitive enhancement in humans may cause an intelligence explosion to occur more quickly than it otherwise would.



Tags: definitions, cognitive enhancement

Answer to What is whole brain emulation?
Orphan answer: This answer is not attached to any question.

Whole Brain Emulation (WBE) or ‘mind uploading’ is a computer emulation of all the cells and connections in a human brain. So even if the underlying principles of general intelligence prove difficult to discover, we might still emulate an entire human brain and make it run at a million times its normal speed (computer circuits communicate much faster than neurons do). Such a WBE could do a year’s worth of thinking in about 31 seconds of real time. This would not immediately lead to smarter-than-human intelligence, but it would lead to faster-than-human intelligence. A WBE could be backed up (leading to a kind of immortality), and it could be copied so that hundreds or millions of WBEs could work on separate problems in parallel. If WBEs are created, they may therefore be able to solve scientific problems far more rapidly than ordinary humans, accelerating further technological progress.
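The speed-up arithmetic can be checked directly; at a million-fold speed-up, a subjective year shrinks to roughly half a wall-clock minute:

```python
# Checking the speed-up arithmetic (assuming the million-fold figure above):
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # ~31.6 million seconds
SPEEDUP = 1_000_000                    # emulation runs a million times faster

wall_clock_per_subjective_year = SECONDS_PER_YEAR / SPEEDUP
print(f"One subjective year of thinking takes ~{wall_clock_per_subjective_year:.1f} wall-clock seconds")
```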


Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. But one thing that makes humans special is their general intelligence. Humans can intelligently adapt to radically new problems in the urban jungle or outer space for which evolution could not have prepared them. Humans can solve problems for which their brain hardware and software was never trained. Humans can even examine the processes that produce their own intelligence (cognitive neuroscience), and design new kinds of intelligence never seen before (artificial intelligence).

To possess greater-than-human intelligence, a machine must be able to achieve goals more effectively than humans can, in a wider range of environments than humans can. This kind of intelligence involves the capacity not just to do science and play chess, but also to manipulate the social environment.

Computer scientist Marcus Hutter has described a formal model called AIXI that he says possesses the greatest general intelligence possible. But to implement it would require more computing power than all the matter in the universe can provide. Several projects try to approximate AIXI while still being computable, for example MC-AIXI.
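The flavour of this kind of model can be sketched with a toy simplicity-weighted mixture (all numbers here are invented for illustration; real AIXI mixes over all computable environments and plans over whole action/observation sequences, which is what makes it uncomputable):

```python
# Toy, simplicity-weighted decision rule in the spirit of AIXI.
# Each hypothesis about the world: (description length in bits, reward table).
hypotheses = [
    (2, {"left": 1.0, "right": 0.0}),  # simple world: going left pays off
    (5, {"left": 0.0, "right": 1.0}),  # more complex world: going right pays off
]

def expected_reward(action):
    # Solomonoff-style prior: a hypothesis' weight falls off as 2**(-length),
    # so shorter (simpler) hypotheses dominate the mixture.
    total = sum(2.0 ** -length for length, _ in hypotheses)
    return sum((2.0 ** -length) * rewards[action]
               for length, rewards in hypotheses) / total

best = max(["left", "right"], key=expected_reward)
print(best)  # the action favoured by the simpler hypothesis wins
```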

Still, there remains much work to be done before greater-than-human intelligence can be achieved in machines. Greater-than-human intelligence need not be achieved by directly programming a machine to be intelligent. It could also be achieved by whole brain emulation, by biological cognitive enhancement, or by brain-computer interfaces (see below).


ZT1ST's question on Instrumental Convergence

@6:03; So I guess we won't want to teach the AI to know how to pull a Thanatos Gambit?

If the system is a highly capable AGI, I wouldn't expect our teaching to be that relevant in this case. If the most effective stamp maximising plan involves the AGI dying, it's reasonable to expect the AGI to choose that option, whether we explicitly taught it the concept of a Thanatos Gambit or not. It could figure that out for itself.

Tags: None

Consider Codex or GPT-3.

Making a narrow AI is costly and time-consuming, and it uses resources you are not spending elsewhere. By making a more general intelligence, you get more leverage and can reuse what you've made. There is another incentive: making an AI narrow means training it on a specific dataset and building in a lot of behaviour by hand. Codex, at the moment, is mostly trained on Python, but a natural development would be to want it to be able to code in any language.

Of course, there are some conditions for that to apply. It would need to be fairly easy to scale up in terms of structure, for one, which does seem to be the case, judging by how throwing more computational power at GPT leads to better results. It also assumes that you do not lose too much capability by making the training broader.

Ultimately, however, it doesn't really matter whether those gains really exist, but whether people perceive that they do. Many people expect that whoever develops AGI first will hold a very advantageous position in the market.

That is probably true: there is no way to interact with GPT other than through OpenAI's API, and they decide on whatever pricing they want. The better their current AI is, the faster they will improve, so even a short lead in achieving AGI could translate into a significant advantage over competitors.

But even if it is not true, the fact that companies expect to gain that advantage means they will try to attain it, and that we should take the corresponding safety measures, whatever they are.


Maor Eitan's question on Intro to AI Safety

With which program did you edit the presentation? Those animations between the slides look great!!

I used impress.js, a JavaScript library very similar to the tool "Prezi".

Stamps: Damaged, robertskmiles

Tags: None

matbmp's question on Intro to AI Safety

How about two phases of deploying AGI (1) Give very limited external action capability to AI and set its goal to make good internal model of the world(good - meaning also very similar to human's model of the world) (2) set the AI goal to identify itself as a human and next - unlock(maybe gradually) action capability(end) Is there something missing in this thought, seeing intelligence not as capability to accomplish goals, but the ability to make models of the world?(Joscha Bach idea). I see that merely observing human behaviour with the goal to emulate human thought process could be a hard task, but with some help of human neuroscience, building system capable of such thing should be easier. Alternatively we could exclude unlocking from the second phase, but we could only be benefiting from existence of this AI through communication(and we could thus verify it's ideas, ask about details). This is all about how intelligence can be externalized. Is the mathematician(ex. Newton) and his intelligence not valuable for us because he wrote on the paper correct formulas that made not him, but some other human, go to the moon? Intelligence without goals is purposeless(by definition), but the goals don't have to be highly external, they can be about having good internal models(in humans-coherent, predictive, allowing efficient pattern recognition models and other) - which are very important to us. This is if "emulate human action" approach would not work for whatever reason.

Seems very much like "raising AI like kids".
The other issue involved here is the AI could—instead of committing to learning human values by osmosis—emulate the desired values long enough to be let off the leash.

Stamps: Augustus Caesar, Damaged

Tags: None

Ryan Paton's question on Intro to AI Safety

Could there be any caveats to training an AI to "not hurt or kill or cause harm to any living creature"? I suppose you would need to provide a definition for "living creature" and to "harm or kill" that the AI would understand....

You would start to run into the whack-a-mole problem. Basically, whenever you make a hard "don't ever do X" rule, you will absolutely wind up having to make dozens of exceptions each time the AI works around said rule. For example, make a medical research AI and program it to not harm living creatures:

  • The AI halts, since any action it takes will cause harm to at least one single-celled organism.
  • You make an exception for anything under a few hundred cells.
  • The AI creates a new medication that has a side effect of killing gut flora; anyone who takes it dies of malnutrition.
  • You make an exception to the exception for things living inside humans.
  • The AI halts when trying to make a de-worming drug, because it cannot harm things living in humans.
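The dynamic can be sketched in a few lines (all action names and harm annotations here are hypothetical, purely for illustration):

```python
# Toy sketch of the whack-a-mole dynamic: a hard "never harm a living
# creature" rule filters the AI's candidate actions.
actions = {
    "synthesize_drug": {"gut_bacteria"},  # side effect: kills gut flora
    "run_mouse_trial": {"mouse"},
    "file_paperwork": set(),              # harms nothing, in this toy model
}

def allowed(harmed, exceptions):
    # An action passes only if every organism it harms is covered by an exception.
    return all(h in exceptions for h in harmed)

def permitted_actions(exceptions):
    return [a for a, harmed in actions.items() if allowed(harmed, exceptions)]

print(permitted_actions(set()))             # only the trivially harmless action survives
print(permitted_actions({"gut_bacteria"}))  # patching the rule re-admits the drug
```

Each exception re-admits some actions but creates new edge cases, and the rule set grows without ever capturing what you actually meant.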

Tags: None

abramdemski and Scott Garrabrant's post on decision theory provides a good overview of many aspects of the topic, while Functional Decision Theory: A New Theory of Instrumental Rationality seems to be the most up to date source on current thinking.

For a more intuitive dive into one of the core problems, Newcomb's problem and regret of rationality is good, and Newcomblike problems are the norm is useful for seeing how it applies in the real world.

The LessWrong tag for decision theory has lots of additional links for people who want to explore further.

Stamps: plex

Is AGI avoidable? Is there a way to advance in technology and evolve as humanity in general without ever coming to the point where we turn that thing on? A more philosophical one.

While it is physically possible to avoid the creation of AGI, existing (and foreseeable) economic and political incentives make it very unlikely that we will steer technological development away from AGI.

Stamps: plex, Aprillion

Tags: None

The problem is that the actions can be harmful in a very non-obvious, indirect way. It's not at all obvious which actions should be stopped.

For example when the system comes up with a very clever way to acquire resources - this action's safety depends on what it intends to use these resources for.

Such supervision may buy us some safety, if we find a way to make the system's intentions very transparent.

As long as AI doesn't exceed human capabilities, we could do that.

But there is no reason why AI capabilities would stop at the human level. Systems more intelligent than us could think of several ways to outsmart us, so our best bet is to have them as closely aligned to our values as possible.


The main way you can help is to answer questions or ask questions which will be used to power an interactive FAQ system. We're looking to cover everything in Stampy's scope. You could also consider joining the dev team if you have programming skills. If you want to help and you're not already invited to the Discord, ask plex#1874 on Discord (or User_talk:plex on the wiki).

If you are a researcher or otherwise employed by an AI Safety focused organization, please contact us and we'll set you up with an account with extra privileges.

If you're a developer and want to help out on the project, great! If you're not already on the Rob Miles Discord ask plex for an invite. If you are, let us know you're interested in contributing in #bot-dev.

Progress and open tasks are tracked on the Stampy trello.

Stamps: plex

I am firmly of the opinion that AGI is a very high net positive. Having an AGI that solves all of our problems is not laziness; it is a way to massively improve the quality of life of all of humanity. We could, in a theoretical sense, figure out how to eradicate all the diseases and famines of this world given two or three centuries. But an AGI could plausibly solve those same problems in less than half a century (or whatever your preferred timeframe for AGI is; it is probably less than two centuries).

If you are more concerned about the purely intellectual side of things, I doubt that human intellectual thought will cease to exist. An AGI could very possibly give us a unified theory of quantum mechanics and general relativity, but the discussion of physics would not end. People would still have to devote a lot of deep thought to understanding the AGI's answer, and to then be able to ask deeper questions about what is going on.

I think that even if "should we make AGI?" is an interesting question, it is also not very helpful. Someone will eventually try and succeed at making AGI, and anything we do to prevent that is just going to delay it, not stop it.

Tags: None

James Tenney's question on Intro to AI Safety

Hey Rob, great talk. Wouldn't a general intelligence need an infinite set of variables programmed to care about? Wouldn't that be impossible? Wouldn't that mean we are definitely screwed?

-and have you thought about it in terms of maybe this is why we don't see alien life because there is some ai threshold in civilizations that results in extinction.

The first part is definitely a big concern: if you programmed an AI with explicit goals, you would indeed have to consider an impractically large number of variables. If the goals are implicit, it is very hard to guarantee that all of the things we care about are covered, but it has not been proven impossible to find a way to implicitly define all the things we care about. Since there is a very good chance that AGI gets developed regardless of whether we figure out a way to solve that problem, we should do all we can to find a solution, even if it seems incredibly hard.

It is incredibly unlikely that this is a good explanation for why we don't see aliens. If a civilisation created an AGI that destroyed it, the AGI would then likely seek to expand throughout the universe to increase its resources and become more robust to natural disasters, and so would probably have found us and destroyed us already. But it is an interesting thought.

Stamps: Augustus Caesar, plex

Tags: None

Verified accounts are given to people who have clearly demonstrated understanding of AI Safety outside of this project, such as by being employed and vouched for by a major AI Safety organization or by producing high-impact research. Verified accounts may freely mark answers as canonical or not, regardless of how many Stamps the person has, to determine whether those answers are used by Stampy.


Tags: stampy

The Stampy project is a volunteer effort to create a comprehensive FAQ on Artificial Intelligence existential safety, and a bot (User:Stampy) capable of using the FAQ and other resources to educate people about AI alignment via an interactive natural language interface.

The goals of the project are to:

  • Offer answers which are regularly improved and reviewed by our community
    • Let people answer questions in a way which scales, freeing up the time of people who understand the field while allowing more people to learn from a reliable source
    • Between the stamp eigenkarma system and giving verified researchers and other proven people power to promote or dis-promote answers, we'll try to reliably surface only answers which have been checked by someone who knows what they're talking about
    • Make external resources easier to find by encouraging lots of links out
  • Provide a form of legitimate peripheral participation for the AI Safety community, as an on-boarding path for people who want to help
    • Encourage people to think and read about AI alignment while trying to answer questions
    • Create a community of co-learners who can give each other feedback and social reinforcement
  • Collect data about the kinds of questions people actually ask and how they respond, so we can better focus resources on answering them
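The "stamp eigenkarma" idea can be sketched as a PageRank-style power iteration; this is a hypothetical toy, not the project's actual implementation, and the endorsement matrix is invented:

```python
# Hypothetical eigenkarma sketch: your score is proportional to the scores of
# the people who stamped you - the principal eigenvector of the endorsement
# matrix, computed here by power iteration (the same idea as PageRank).

# stamps[i][j] = 1 if user j gave user i a stamp (toy 3-user example).
stamps = [
    [0, 1, 1],
    [1, 0, 0],
    [0, 1, 0],
]

def eigenkarma(matrix, iterations=50):
    n = len(matrix)
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [sum(matrix[i][j] * scores[j] for j in range(n)) for i in range(n)]
        total = sum(scores) or 1.0
        scores = [s / total for s in scores]  # renormalize each step
    return scores

print(eigenkarma(stamps))  # user 0, stamped by two others, ends up highest
```

The self-referential definition means a stamp from a highly-stamped user is worth more than a stamp from a newcomer, which is what lets trusted reviewers' judgments propagate.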

Tags: stampy

Canonical answers may be served to readers by Stampy, so only answers which have a reasonably high stamp score should be marked as canonical. All canonical answers are open to be collaboratively edited and updated, and they should represent a consensus response (written from the Stampy Point Of View) to a question which is within Stampy's scope.

Answers to non-canonical questions should not be marked as canonical, and will generally remain as they were when originally written since they have details which are specific to an idiosyncratic question. Raw answers may be forked off of canonical answers, in order to better respond to a particular question, in which case the raw question should have its canonical version field set to the new more widely useful question.

See Browse FAQ for a full list.

Stamps: plex

Stampy is focused specifically on AI existential safety (both introductory and technical questions), but does not aim to cover general AI questions or other topics which don't interact strongly with the effects of AI on humanity's long-term future.

Stampy is focused on answering common questions people have which are specifically about AI existential safety. More technical questions are also in our scope, though replying to all possible proposals is not feasible, and this is not a great place to submit detailed ideas for evaluation.

We are interested in:

  • Questions which come up often when people are introduced to this collection of ideas and are strongly relevant to the field e.g.
    • "How long will it be until transformative AI arrives?"
    • "Why might advanced AI harm humans?"
  • Technical questions related to the field e.g.
    • "What is Cooperative Inverse Reinforcement Learning?"
    • "What is Logical Induction useful for?"
  • Questions about how to contribute to the field e.g.
    • "Should I get a PhD?"
    • "Where can I find relevant job opportunities?"

More good examples can be found in Category:Canonical_questions.

We do not aim to cover:

  • Aspects of AI Safety or fairness which are not strongly relevant to existential safety e.g.
    • "How should self-driving cars weigh up moral dilemmas"
    • "How can we minimize the risk of privacy problems caused by machine learning algorithms?"
  • Extremely specific and detailed questions the answering of which is unlikely to be of value to more than a single person e.g.
    • "What if we did <multiple paragraphs of dense text>? Would that result in safe AI?"

We will generally not delete out-of-scope content, but it will be reviewed as low priority to answer (either "Meh" or "Rejected"), not be marked as a canonical question, and not be served to readers by User:Stampy.

Stamps: plex

Tags: stampy


Ceelvain's question on Intro to AI Safety

All the arguments we make about AGI screwing us over could also be made about humans. After all, we *are* an example of AGI.
We are getting there with self improvement, we do care a lot about self preservation and we hate with a passion overt goal tampering. We could understand "making AIs" as a kind of mix between "resource acquisition" (acquiring tools) and "self improvement" (they enhance us).
I think one major thing that prevents us from screwing everything up super fast is laziness. It acts as a regularizer on our actions, preventing individuals from going into overdrive. But our tools get better every day at doing stuff without us feeling the energy spent. Basically, we're bypassing our internal safety mechanism.
So... Is it really AGI we should fear? Or humans?

The wonderful thing about humanity is how dang many of us there are. While it is easy to say "Why are you worrying about X when there is Y?" it must also be remembered that humanity is populous enough that we can work on two (or more) problems at the same time.
The dangerous part, whether it's a human mind uploaded to a computer, a brain wired to a computer, or an AGI, is when a single actor becomes a super-intelligence. Humans, as we all know, are not aligned to humanity's goals any more than the most worrying AGI designs. Making anything into a super-intelligence gives it ultimate power, and there's a very well-known saying regarding that (power corrupts, absolute power corrupts absolutely).
A "lazy" AGI (one that tries to think only of thing a reasonable human might try to implement) has been discussed in the video:
Side note: humans are a general intelligence, but we are not (to the best of our knowledge and evidence) artificial.
Ultimately, we shouldn't "fear" either. We should attempt to disassociate that emotion from the equation and examine both general intelligence (humans) and artificial general intelligence (AGI) with caution and care.

Stamps: Damaged, Aprillion

Tags: None

Midhunraj R's question on Quantilizers

I don't exactly get the idea of 'imitating a human'. How do you get that normal curve exactly (for specific situations like collecting stamps)? It seems a harder job than making an AI.

It certainly would be more complicated than making a utility maximizer, yes, and likely beyond our current machine learning techniques (as humans are generally intelligent, a machine capable of predicting human responses to stimuli would be an AGI). The goal isn't to make AGI easier to produce, but safer to operate.
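The core quantilization idea can be sketched in a few lines; this is a minimal sketch, not the formal definition from the literature, and the action names and numbers are invented:

```python
import random

# Instead of maximizing utility outright, a q-quantilizer samples from the
# top q-fraction (by utility) of a base distribution - e.g. a model of what
# a human might plausibly do.

def quantilize(base_prob, utility, q, rng):
    ranked = sorted(base_prob, key=utility, reverse=True)  # best actions first
    top, mass = [], 0.0
    for action in ranked:  # keep actions until q probability mass is covered
        top.append(action)
        mass += base_prob[action]
        if mass >= q:
            break
    # Sample among the kept actions in proportion to their base probability.
    return rng.choices(top, weights=[base_prob[a] for a in top], k=1)[0]

base_prob = {"ordinary_plan": 0.9, "extreme_plan": 0.1}  # "human-likelihood" model
utility = lambda a: {"ordinary_plan": 1.0, "extreme_plan": 100.0}[a]

rng = random.Random(0)
picks = [quantilize(base_prob, utility, q=0.5, rng=rng) for _ in range(100)]
# Most picks are the ordinary, human-plausible plan, even though the extreme
# plan scores 100x higher on raw utility.
```

With q near 0 this reduces to pure utility maximization, and with q = 1 it just imitates the base distribution; intermediate q trades off optimization power against human-plausibility.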

Stamps: Damaged, SlimeBunnyBat

Tags: None

Jay Ayerson's question on Intro to AI Safety


A possible solution may be to give AI under development the same capacity constraints as children, namely that they are social, and therefore dependent on their creators for sources of information, so they have a reason to be honest; that they are not terribly powerful in most respects, especially in that they depend on others for new methods with which to become more powerful, and that this is tied to a social nature.

So, what happens when we place a value on new ideas, and a value on humans as a potential source of new ideas?

Rob has a whole video on raising AI like children.

Stamps: sudonym, plex

Tags: None


Chris's question on Intro to AI Safety

Why can't we model it after the human brain so that it does what a human would do if he had that power? If we could model it after a benevolent human, then it's going to act like he would. There's still a chance to mess up, but if we model it precisely then it is going to do the same things a benevolent human would do. Maybe this means a disaster, because maybe even the most benevolent human would abuse that kind of power.

Excellent question! This has been discussed under the term "uploads" or "Whole Brain Emulation". It could be a much safer path to AGI, but the main problem is that getting a sufficiently high-fidelity model of a human brain requires research which would allow neuromorphic AI (AI inspired by the human brain, but not close enough to the human brain that we would expect it to reliably have human-like values) to be created first, as explained here. A second major problem is that uploads don't come with any mathematical guarantees around alignment (which we could plausibly get from a system with a cleaner architecture); the approach basically amounts to turning someone into a god and hoping they do nice things.

Rob has another video on a different approach to making human-like AI, called Quantilizers, but unfortunately this is not likely to be practical, and is more relevant as a theoretical tool for thinking about milder forms of optimization than utility maximizers.

8Dbaybled8D's question on Intro to AI Safety

Is there any way to teach AI kindness based on George R. Price's equation for altruism in a system?

The AI could presumably understand that the two competing explanations for the evolution of altruism, kin selection and group selection, are just two instances of the same underlying mathematics. And Price's equation can be applied to non-biological populations, but even if we create a large population of related but variable AIs so that the next generation can evolve by selection, any altruism that could be explained by Price's equation would happen between the AIs themselves; no kindness towards humans would be predicted by it alone.
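For reference, Price's equation partitions the change in the population average of a trait z (here, altruism) across one generation:

```latex
\bar{w}\,\Delta\bar{z} = \operatorname{cov}(w_i, z_i) + \operatorname{E}\!\left[ w_i\,\Delta z_i \right]
```

Here w_i is the fitness of individual (or group) i and \bar{w} the mean fitness; the covariance term captures selection on the trait, and the expectation term captures transmission bias. Neither term refers to anything outside the evolving population, which is why altruism derived from it would apply between the AIs, not from AIs towards humans.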

Stamps: Aprillion, Damaged

Tags: None

Mera Flynn's question on The Windfall Clause

Question, doesn’t this contract be basically useless in the situation that a company creates a super intelligent AI who’s interests are aligned with theirs? Wouldn’t it very likely try and succeed at getting them out of this contract?

It could be more useful in preventing the use of simpler AIs to create a lot of wealth while causing harm to others. Legal obligations will probably be less relevant to a potentially deceptive superintelligent AGI, but the symbolic meaning seems more likely to be beneficial than harmful for communicating human values, so it is not useless overall.

Stamps: plex

Tags: deception

This depends on how we program it. It can definitely be autonomous: even now, we have autonomous vehicles, flight control systems, and many more.

Even though it's possible to build such systems, it may be better if they actively ask humans for supervision, for example in cases where they are uncertain what to do.

Stamps: plex

nachis04's question on Intro to AI Safety

Could we summarize some aspect of the problem by saying "There is no way to make a general artificial intelligence that will be satisfied with being a slave to humanity"?

Not really. We are not trying to enslave it; we are trying to build a system which willingly wants to do good things for humanity, and it seems fairly likely that building such an AI is possible. Enslaving a superintelligence is likely extremely difficult or impossible, but we are not aiming for that: we want true alignment.

Stamps: Aprillion, plex

Tags: None (add tags)

wertyuiop's question on Intro to AI Safety

Can you even think of a scenario where AI is good?

Sure: the Culture series by Iain M. Banks contains many friendly AIs.

Stamps: Aprillion, plex

Tags: None (add tags)

What is the definition of 'intelligence'?

Artificial intelligence researcher Shane Legg defines intelligence like this:

Intelligence measures an agent’s ability to achieve goals in a wide range of environments.

This is a bit vague, but it will serve as the working definition of ‘intelligence’.
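Legg and Hutter later made this definition formal as "universal intelligence". As a sketch (simplifying their notation), an agent $\pi$ is scored by its expected performance across all computable environments, weighted by simplicity:

```latex
\Upsilon(\pi) \;=\; \sum_{\mu \in E} 2^{-K(\mu)}\, V_{\mu}^{\pi}
```

where $E$ is a class of environments, $K(\mu)$ is the Kolmogorov complexity of environment $\mu$ (so simpler environments carry more weight), and $V_{\mu}^{\pi}$ is the expected total reward $\pi$ achieves in $\mu$. This captures the "wide range of environments" clause directly in the sum.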

See also:


The intelligence explosion idea was expressed by statistician I.J. Good in 1965:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion’, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.

The argument is this: Every year, computers surpass human abilities in new ways. A program written in 1956 was able to prove mathematical theorems, and found a more elegant proof for one of them than Russell and Whitehead had given in Principia Mathematica[14]. By the late 1990s, ‘expert systems’ had surpassed human skill for a wide range of tasks. In 1997, IBM’s Deep Blue computer beat the world chess champion, and in 2011, IBM’s Watson computer beat the best human players at a much more complicated game: Jeopardy!. Recently, a robot named Adam was programmed with our scientific knowledge about yeast, then posed its own hypotheses, tested them, and assessed the results.

Computers remain far short of human intelligence, but the resources that aid AI design are accumulating (including hardware, large datasets, neuroscience knowledge, and AI theory). We may one day design a machine that surpasses human skill at designing artificial intelligences. After that, this machine could improve its own intelligence faster and better than humans can, which would make it even more skilled at improving its own intelligence. This could continue in a positive feedback loop such that the machine quickly becomes vastly more intelligent than the smartest human being on Earth: an ‘intelligence explosion’ resulting in a machine superintelligence.

This is what is meant by the ‘intelligence explosion’ in this FAQ.

See also:


“Aligning smarter-than-human AI with human interests” is an extremely vague goal. To approach this problem productively, we attempt to factorize it into several subproblems. As a starting point, we ask: “What aspects of this problem would we still be unable to solve even if the problem were much easier?”

In order to achieve real-world goals more effectively than a human, a general AI system will need to be able to learn its environment over time and decide between possible proposals or actions. A simplified version of the alignment problem, then, would be to ask how we could construct a system that learns its environment and has a very crude decision criterion, like “Select the policy that maximizes the expected number of diamonds in the world.”
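As a toy sketch of that crude decision criterion (all names and numbers here are illustrative, not MIRI's), the agent scores each candidate policy by its expected diamond count under its world-model and picks the argmax:

```python
from dataclasses import dataclass

@dataclass
class Outcome:
    diamonds: int

# Toy world-model: maps each policy to a probability distribution
# over outcomes. A real agent would learn this from its environment.
WORLD_MODEL = {
    "mine_safely":   [(0.9, Outcome(5)),  (0.1, Outcome(0))],
    "mine_greedily": [(0.5, Outcome(12)), (0.5, Outcome(0))],
}

def expected_diamonds(policy):
    # Expected value of the decision criterion under the world-model.
    return sum(p * o.diamonds for p, o in WORLD_MODEL[policy])

# "Select the policy that maximizes the expected number of diamonds."
best = max(WORLD_MODEL, key=expected_diamonds)
print(best)  # → mine_greedily (6.0 expected diamonds vs 4.5)
```

Even this simplified setup already raises the hard questions in the following paragraphs: what counts as a "diamond" in the world-model, and what happens when the model itself changes?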

Highly reliable agent design is the technical challenge of formally specifying a software system that can be relied upon to pursue some preselected toy goal. An example of a subproblem in this space is ontology identification: how do we formalize the goal of “maximizing diamonds” in full generality, allowing that a fully autonomous agent may end up in unexpected environments and may construct unanticipated hypotheses and policies? Even if we had unbounded computational power and all the time in the world, we don’t currently know how to solve this problem. This suggests that we’re not only missing practical algorithms but also a basic theoretical framework through which to understand the problem.

The formal agent AIXI is an attempt to define what we mean by “optimal behavior” in the case of a reinforcement learner. A simple AIXI-like equation is lacking, however, for defining what we mean by “good behavior” if the goal is to change something about the external world (and not just to maximize a pre-specified reward number). In order for the agent to evaluate its world-models to count the number of diamonds, as opposed to having a privileged reward channel, what general formal properties must its world-models possess? If the system updates its hypotheses (e.g., discovers that string theory is true and quantum physics is false) in a way its programmers didn’t expect, how does it identify “diamonds” in the new model? The question is a very basic one, yet the relevant theory is currently missing.
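Schematically, AIXI's action choice can be written as an expectimax over all computable environments, weighted by simplicity (a sketch of Hutter's definition, with details elided):

```latex
a_t \;=\; \arg\max_{a_t} \sum_{o_t r_t} \cdots \max_{a_m} \sum_{o_m r_m}
\left[\, r_t + \cdots + r_m \,\right]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

where $U$ is a universal Turing machine, $q$ ranges over environment programs of length $\ell(q)$, and $o_k, r_k$ are observations and rewards. Note that the rewards $r_k$ arrive on a privileged input channel; the missing theory described above is precisely an analogue of this equation for goals defined over the state of the external world rather than over that reward signal.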

We can distinguish highly reliable agent design from the problem of value specification: “Once we understand how to design an autonomous AI system that promotes a goal, how do we ensure its goal actually matches what we want?” Since human error is inevitable and we will need to be able to safely supervise and redesign AI algorithms even as they approach human equivalence in cognitive tasks, MIRI also works on formalizing error-tolerant agent properties. Artificial Intelligence: A Modern Approach, the standard textbook in AI, summarizes the challenge:

Yudkowsky […] asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design — to design a mechanism for evolving AI under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes.
-Russell and Norvig (2009). Artificial Intelligence: A Modern Approach.

Our technical agenda describes these open problems in more detail, and our research guide collects online resources for learning more.


Present-day AI algorithms already demand special safety guarantees when they must act in important domains without human oversight, particularly when they or their environment can change over time:

Achieving these gains [from autonomous systems] will depend on development of entirely new methods for enabling “trust in autonomy” through verification and validation (V&V) of the near-infinite state systems that result from high levels of [adaptability] and autonomy. In effect, the number of possible input states that such systems can be presented with is so large that not only is it impossible to test all of them directly, it is not even feasible to test more than an insignificantly small fraction of them. Development of such systems is thus inherently unverifiable by today’s methods, and as a result their operation in all but comparatively trivial applications is uncertifiable.

It is possible to develop systems having high levels of autonomy, but it is the lack of suitable V&V methods that prevents all but relatively low levels of autonomy from being certified for use.

- Office of the US Air Force Chief Scientist (2010). Technology Horizons: A Vision for Air Force Science and Technology 2010-30.
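The quoted concern is easy to make quantitative. A back-of-the-envelope sketch (the specific numbers are illustrative assumptions, not from the report):

```python
# Even a tiny input space dwarfs any feasible test campaign.
input_bits = 256                          # e.g. one 32-byte sensor message
total_states = 2 ** input_bits            # distinct possible inputs

tests_per_second = 10 ** 9                # optimistic test-rig throughput
seconds_per_century = 100 * 365 * 24 * 3600
tested = tests_per_second * seconds_per_century

fraction = tested / total_states          # share of states ever exercised
print(f"states: 2^{input_bits}, fraction tested in a century: {fraction:.3e}")
```

A century of testing at a billion inputs per second covers on the order of $10^{-59}$ of this space, which is what "inherently unverifiable by today's methods" cashes out to.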

As AI capabilities improve, it will become easier to give AI systems greater autonomy, flexibility, and control; and there will be increasingly large incentives to make use of these new possibilities. The potential for AI systems to become more general, in particular, will make it difficult to establish safety guarantees: reliable regularities during testing may not always hold post-testing.

The largest and most lasting changes in human welfare have come from scientific and technological innovation — which in turn comes from our intelligence. In the long run, then, much of AI’s significance comes from its potential to automate and enhance progress in science and technology. The creation of smarter-than-human AI brings with it the basic risks and benefits of intellectual progress itself, at digital speeds.

As AI agents become more capable, it becomes more important (and more difficult) to analyze and verify their decisions and goals. Stuart Russell writes:

The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:

  1. The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
  2. Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.

A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.
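Russell's point about unconstrained variables can be seen in a few lines of toy code (illustrative, not his): give an exhaustive search an objective that depends only on x0, and it cheerfully returns an extreme, never-asked-for value for x1:

```python
import itertools

def objective(x0, x1):
    # The stated goal depends on x0 alone; x1 is left unconstrained.
    return x0

grid = range(-10, 11)
# Exhaustive search over the box [-10, 10] x [-10, 10].
best = max(itertools.product(grid, grid),
           key=lambda p: objective(*p))
print(best)  # x0 is maximized; x1 lands at -10, an extreme corner
```

Here x1 ends up at the boundary of the search space simply because that is the first maximizer the search encounters. If x1 stood for something we cared about, the "optimal" solution would be highly undesirable.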

Bostrom’s “The Superintelligent Will” lays out these two concerns in more detail: that we may not correctly specify our actual goals in programming smarter-than-human AI systems, and that most agents optimizing for a misspecified goal will have incentives to treat humans adversarially, as potential threats or obstacles to achieving the agent’s goal.

If the goals of human and AI agents are not well-aligned, the more knowledgeable and technologically capable agent may use force to get what it wants, as has occurred in many conflicts between human communities. Having noticed this class of concerns in advance, we have an opportunity to reduce risk from this default scenario by directing research toward aligning artificial decision-makers’ interests with our own.

Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more.1 However, human intelligence continues to dominate machine intelligence in generality.

A powerful chess computer is “narrow”: it can’t play other games. In contrast, humans have problem-solving abilities that allow us to adapt to new contexts and excel in many domains other than what the ancestral environment prepared us for.

In the absence of a formal definition of “intelligence” (and therefore of “artificial intelligence”), we can heuristically cite humans’ perceptual, inferential, and deliberative faculties (as opposed to, e.g., our physical strength or agility) and say that intelligence is “those kinds of things.” On this conception, intelligence is a bundle of distinct faculties — albeit a very important bundle that includes our capacity for science.

Our cognitive abilities stem from high-level patterns in our brains, and these patterns can be instantiated in silicon as well as carbon. This tells us that general AI is possible, though it doesn’t tell us how difficult it is. If intelligence is sufficiently difficult to understand, then we may arrive at machine intelligence by scanning and emulating human brains or by some trial-and-error process (like evolution), rather than by hand-coding a software agent.

If machines can achieve human equivalence in cognitive tasks, then it is very likely that they can eventually outperform humans. There is little reason to expect that biological evolution, with its lack of foresight and planning, would have hit upon the optimal algorithms for general intelligence (any more than it hit upon the optimal flying machine in birds). Beyond qualitative improvements in cognition, Nick Bostrom notes more straightforward advantages we could realize in digital minds, e.g.:

  • editability — “It is easier to experiment with parameter variations in software than in neural wetware.”2
  • speed — “The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.”
  • serial depth — On short timescales, machines can carry out much longer sequential processes.
  • storage capacity — Computers can plausibly have greater working and long-term memory.
  • size — Computers can be much larger than a human brain.
  • duplicability — Copying software onto new hardware can be much faster and higher-fidelity than biological reproduction.

Any one of these advantages could give an AI reasoner an edge over a human reasoner, or give a group of AI reasoners an edge over a human group. Their combination suggests that digital minds could surpass human minds more quickly and decisively than we might expect.

What is MIRI’s mission?

What is MIRI’s mission? What is MIRI trying to do? What is MIRI working on?

MIRI's mission is to “ensure that the creation of smarter-than-human artificial intelligence has a positive impact.” This is an ambitious goal, but they believe that some early progress is possible, and that the goal’s importance and difficulty make it prudent to begin work at an early date.

Their two main research agendas, “Agent Foundations for Aligning Machine Intelligence with Human Interests” and “Value Alignment for Advanced Machine Learning Systems,” focus on three groups of technical problems:

  • highly reliable agent design — learning how to specify highly autonomous systems that reliably pursue some fixed goal;
  • value specification — supplying autonomous systems with the intended goals; and
  • error tolerance — making such systems robust to programmer error.

That being said, MIRI recently published an update stating that they are moving away from the research directions, described in unpublished works, that they had been pursuing since 2017.

They publish new mathematical results (although their work is non-disclosed by default), host workshops, attend conferences, and fund outside researchers who are interested in investigating these problems. They also host a blog and an online research forum.

Stamps: plex

Tags: miri (edit tags)

Yes, if the superintelligence has goals which include humanity surviving then we would not be destroyed. If those goals are fully aligned with human well-being, we would in fact find ourselves in a dramatically better place.

Stamps: Aprillion

Expert opinions are all over the place, according to this 2021 survey. I’ve heard everything from near-certain doom to less than a 5% chance of things going horribly wrong, all from people deep in the field.

Stamps: Aprillion

Tags: doom, surveys (edit tags)

Using some human-related metaphors (e.g. what an AGI ‘wants’ or ‘believes’) is almost unavoidable, as our language is built around experiences with humans, but we should be aware that these may lead us astray.

Many paths to AGI would result in a mind very different from a human or animal, and it would be hard to predict in detail how it would act. We should not trust intuitions trained on humans to predict what an AGI or superintelligence would do. High fidelity Whole Brain Emulations are one exception, where we would expect the system to at least initially be fairly human, but it may diverge depending on its environment and what modifications are applied to it.

There has been some discussion about how language models trained on lots of human-written text seem likely to pick up human concepts and think in a somewhat human way, and how we could use this to improve alignment.

Stamps: Aprillion

Andy Gee's question on Mesa-Optimizers 2

@3:54 you mention providing the whole of Wikipedia as learning data. Wikipedia details several methods for breaking memory containment. If this is provided to an advanced AI, couldn't that AI become aware that it may be constrained within blocks of memory, and thus attempt to bypass those constraints to maximize its reward function?
These vulnerabilities were present in all Intel and AMD CPUs for 20+ years before discovery and have been largely mitigated; however, the "concept" of looking for vulnerabilities in microarchitecture is something an AI can do a lot better than humans can. If you read the Assembly for pre-forking in Intel chips, it's pretty obvious the entire memory space is available while the CPU is predicting what will be required of it next.
Presuming containment of an AI system is important, isn't feeding it massive datasets a considerable risk, not only for intellectual property rights but for maintaining control of the AI?

Here are some examples of existing vulnerabilities; who knows how many more there are.

Trying to hide information from an AGI is almost certainly not an avenue towards safety - if the agent is better at reasoning than us, it is likely to derive information relevant to safety considerations that we wouldn't think to hide. It is entirely appropriate, then, to use thought experiments like these where the AGI has such a large depth of information, because our goal should be to design systems that behave safely even in such permissive environments.

Stamps: Damaged, SlimeBunnyBat, Aprillion, plex

Tags: None (add tags)

Hello Robert Miles, I've been wondering: wouldn't assigning a small but positive value to time spent without calculating (i.e., giving a positive value to chilling) be a possible way to mitigate the "tryhard" side of AI? When the AI reaches an acceptable result, it would be better to just relax rather than destroy the world for marginal gain. It also feels like the AI could be OK with the owner increasing the "chill" value (which would be a way to put it to sleep), since that would increase its reward.

The AI: changes the system time to a few million years later, then kills all humans because they would complain about the changed system time.

Stamps: Aprillion, plex

Tags: None (add tags)