Improve answers

From Stampy's Wiki

These canonical answers need attention in one way or another. Please help improve them.

These 173 canonical answers are not marked as a related or follow-up to any other canonical answer, so cannot be found through the read interface by normal browsing. Feel free to add them to some! See a full list of available canonical questions or browse by tags.

Imagine, for example, that you are tasked with reducing traffic congestion in San Francisco at all costs, i.e. you do not take into account any other constraints. How would you do it? You might start by just timing traffic lights better. But wouldn’t there be less traffic if all the bridges closed down from 5 to 10AM, preventing all those cars from entering the city? Such a measure obviously violates common sense, and subverts the purpose of improving traffic, which is to help people get around – but it is consistent with the goal of “reducing traffic congestion”.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


This largely depends on when you think AI will be advanced enough to constitute an immediate threat to humanity. This is difficult to estimate, but the field is surveyed at How long will it be until transformative AI is created?, which comes to the conclusion that it is relatively widely believed that AI will transform the world in our lifetimes.

We probably shouldn't rely too strongly on these opinions as predicting the future is hard. But, due to the enormous damage a misaligned AGI could do, it's worth putting a great deal of effort towards AI alignment even if you just care about currently existing humans (such as yourself).

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: person-affecting view (create tag) (edit tags)

Orgasmium (also known as hedonium) is a homogeneous substance with limited consciousness, which is in a constant state of supreme bliss. An AI programmed to "maximize happiness" might simply tile the universe with orgasmium. Some who believe this consider it a good thing; others do not. Those who do not, use its undesirability to argue that not all terminal values reduce to "happiness" or some simple analogue. Hedonium is the hedonistic utilitarian's version of utilitronium.

Blog posts

See also

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

If someone posts something good - something that shows insight, knowledge of AI Safety, etc. - give the message or answer a stamp of approval! Stampy keeps track of these, and uses them to decide how much he likes each user. You can ask Stampy (in a PM if you like), "How many stamps am I worth?", and he'll tell you.

If something is really very good, especially if it took a lot of work/effort, give it a gold stamp. These are worth 5 regular stamps!

Note that stamps aren't just 'likes', so please don't give stamps just to say "me too" or "that's funny" etc. They're meant to represent knowledge, understanding, good judgement, and contributions to the Discord. You can use 💯 or ✔️ for things you agree with, 😂 or 🤣 for funny things etc.

Your stamp points determine how much say you have if there are disagreements on Stampy content, which channels you have permission to post to, your voting power for approving YouTube replies, and whether you get to invite people.

Notes on stamps and stamp points

  • Stamps awarded by people with a lot of stamp points are worth more
  • Awarding people stamps does not reduce your stamp points
  • New users who have 0 stamp points can still award stamps, but they have no effect. It's still worth doing, though, because if you get stamp points later, all your previous votes are retroactively updated!
  • Yes, this was kind of tricky to implement! Stampy actually stores how many stamps each user has awarded to every other user, and uses that to build a system of linear scalar equations which is then solved with numpy.
  • Each user has stamp points, and also gives a score to every other user they give stamps to. The scores sum to 1, so if I give user A a stamp, my score for them will be 1.0; if I then give user B a stamp, my score for A drops to 0.5 and B's becomes 0.5; if I give another stamp to B, my score for A goes to 0.3333 and B's to 0.6667, and so on
  • Score is "what proportion of the stamps I've given have gone to this user"
  • Everyone's stamp points are the sum of (every other user's score for them, times that user's stamp points), so the way to get points is to get stamps from people who have points
  • Rob is the root of the tree, he got one point from Stampy
  • So the idea is that stamp power flows through the network, giving people points for posting things that I thought were good, or for posting things that "people who posted things I thought were good" thought were good, and so on ad infinitum. For posting YouTube comments, Stampy won't send the comment until it has enough stamps of approval, which could come from a small number of high-points users or a larger number of lower-points users
  • Stamps given to yourself or to Stampy do nothing

So yeah everyone ends up with a number that basically represents what Stampy thinks of them, and you can ask him "how many stamps am I worth?" to get that number

so if you have people a, b, and c, the points are calculated by:
a_points = (bs_score_for_a * b_points) + (cs_score_for_a * c_points)
b_points = (as_score_for_b * a_points) + (cs_score_for_b * c_points)
c_points = (as_score_for_c * a_points) + (bs_score_for_c * b_points)
which is tough because you need to know everyone else's points before you can calculate your own
but actually the system will have a fixed point - there'll be a certain arrangement of values such that every node has as much flowing out as flowing in - a stable configuration so you can rearrange
(bs_score_for_a * b_points) + (cs_score_for_a * c_points) - a_points = 0
(as_score_for_b * a_points) + (cs_score_for_b * c_points) - b_points = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) - c_points = 0
or, for neatness:
( -1 * a_points) + (bs_score_for_a * b_points) + (cs_score_for_a * c_points) = 0
(as_score_for_b * a_points) + ( -1 * b_points) + (cs_score_for_b * c_points) = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) + ( -1 * c_points) = 0
and this is just a system of linear scalar equations that you can throw at numpy.linalg.solve
(you add one more equation that says rob_points = 1, so there's some place to start from) there should be one possible distribution of points such that all of the equations hold at the same time, and numpy finds that by linear algebra magic beyond my very limited understanding
but as far as I can tell you can have all the cycles you want!
(I actually have the scores sum to slightly less than 1, to have the stamp power slightly fade out as it propagates, just to make sure it doesn't explode. But I don't think I actually need to do that)
and yes this means that any time anyone gives a stamp to anyone, ~everyone's points will change slightly
And yes this means I'm recalculating the matrix and re-solving it for every new stamp, but computers are fast and I'm sure there are cheaper approximations I could switch to later if necessary
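
As a rough illustration, here's a minimal sketch of that calculation in Python with numpy. The user names, scores, and damping factor are invented for the example; this is not Stampy's actual code or data.

import numpy as np

users = ["rob", "a", "b", "c"]
n = len(users)

# scores[i][j] = what proportion of user i's stamps have gone to user j.
# Each row sums to 1; the numbers are made up for illustration.
scores = np.array([
    [0.0, 0.5, 0.5, 0.0],   # rob has stamped a and b equally
    [0.0, 0.0, 0.5, 0.5],   # a has stamped b and c equally
    [0.0, 1.0, 0.0, 0.0],   # b has stamped only a
    [0.0, 0.0, 1.0, 0.0],   # c has stamped only b
])

# Scale the scores to sum to slightly less than 1, so the stamp power fades
# a little as it propagates and the system stays well-behaved.
damping = 0.9

# Each user's points are the sum over everyone else of (their score for this
# user * their points): points = damping * scores.T @ points.
# Rearranged into "A @ points = b" form:
A = damping * scores.T - np.eye(n)
b = np.zeros(n)

# Replace the first equation with "rob_points = 1" to anchor the system.
A[0, :] = 0.0
A[0, 0] = 1.0
b[0] = 1.0

points = np.linalg.solve(A, b)
for name, p in zip(users, points):
    print(f"{name}: {p:.3f}")

Whenever someone gives a stamp, one row of the score matrix changes and the whole system gets re-solved, which is why everyone's points shift slightly with each new stamp.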

Stamps: soofgolan
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

Other than the usual fare of writing and processing and organizing questions and answers, here are some specific open tasks:

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

See more...

These 110 canonical answers have no related or follow-up questions, so don't offer more to explore as a reader selects it. Feel free to add some! See a full list of available canonical questions or browse by tags.

Language models can be utilized to produce propaganda by acting like bots and interacting with users on social media. This can be done to push a political agenda or to make fringe views appear more popular than they are.

I'm envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusions) and the target audience, and the system will give you the most convincing arguments for it. At that point people won't be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.

-- Wei Dai, quoted in Persuasion Tools: AI takeover without AGI or agency?

As of 2022, this is not within the reach of current models. However, on the current trajectory, AI might within a few years be able to write articles and produce other media for propagandistic purposes that are superior to human-made ones. These could be precisely tailored to individuals, using things like social media feeds and personal digital data.

Additionally, recommender systems on content platforms like YouTube, Twitter, and Facebook use machine learning, and the content they recommend can influence the opinions of billions of people. For example, some research has looked at the tendency of these platforms to promote extremist political views and thereby help radicalize their user base.

In the long term, misaligned AI might use its persuasion abilities to gain influence and take control over the future. This could look like convincing its operators to let it out of a box, persuading them to give it resources, or creating political chaos in order to disable mechanisms that would prevent takeover, as in this story.

See Risks from AI persuasion for a deep dive into the distinct risks from AI persuasion.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


This is actually an active area of AI alignment research, called "Impact Measures"! It's not trivial to formalize in a way which won't predictably go wrong (entropy minimization, for example, likely leads to an AI which tries really hard to put out all the stars ASAP, since they produce so much entropy), but progress is being made. You can read about it on the Alignment Forum tag, or watch Rob's videos Avoiding Negative Side Effects and Avoiding Positive Side Effects.

Stamps: Aprillion, plex
Show your endorsement of this answer by giving it a stamp of approval!


OK, it’s great that you want to help, here are some ideas for ways you could do so without making a huge commitment:

  • Learning more about AI alignment will provide you with good foundations for any path towards helping. You could start by absorbing content (e.g. books, videos, posts), and thinking about challenges or possible solutions.
  • Getting involved with the movement by joining a local Effective Altruism or LessWrong group, Rob Miles’s Discord, and/or the AI Safety Slack is a great way to find friends who are interested and will help you stay motivated.
  • Donating to organizations or individuals working on AI alignment, possibly via a donor lottery or the Long Term Future Fund, can be a great way to provide support.
  • Writing or improving answers on my wiki so that other people can learn about AI alignment more easily is a great way to dip your toe into contributing. You can always ask on the Discord for feedback on things you write.
  • Getting good at giving an AI alignment elevator pitch, and sharing it with people who may be valuable to have working on the problem can make a big difference. However you should avoid putting them off the topic by presenting it in a way which causes them to dismiss it as sci-fi (dos and don’ts in the elevator pitch follow-up question).
  • Writing thoughtful comments on AI posts on LessWrong.
  • Participating in the AGI Safety Fundamentals program – either the AI alignment or governance track – and then facilitating discussions for it in the following round. The program involves nine weeks of content, with about two hours of readings + exercises per week and 1.5 hours of discussion, followed by four weeks to work on an independent project. As a facilitator, you'll be helping others learn about AI safety in-depth, many of whom are considering a career in AI safety. In the early 2022 round, facilitators were offered a stipend, and this seems likely to be the case for future rounds as well! You can learn more about facilitating in this post from December 2021.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. But one thing that makes humans special is their general intelligence. Humans can intelligently adapt to radically new problems in the urban jungle or outer space for which evolution could not have prepared them. Humans can solve problems for which their brain hardware and software was never trained. Humans can even examine the processes that produce their own intelligence (cognitive neuroscience), and design new kinds of intelligence never seen before (artificial intelligence).

To possess greater-than-human intelligence, a machine must be able to achieve goals more effectively than humans can, in a wider range of environments than humans can. This kind of intelligence involves the capacity not just to do science and play chess, but also to manipulate the social environment.

Computer scientist Marcus Hutter has described a formal model called AIXI that he says possesses the greatest general intelligence possible. But to implement it would require more computing power than all the matter in the universe can provide. Several projects try to approximate AIXI while still being computable, for example MC-AIXI.

Still, there remains much work to be done before greater-than-human intelligence can be achieved in machines. Greater-than-human intelligence need not be achieved by directly programming a machine to be intelligent. It could also be achieved by whole brain emulation, by biological cognitive enhancement, or by brain-computer interfaces (see below).

See also:

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


One possible way to ensure the safety of a powerful AI system is to keep it contained in a software environment. There is nothing intrinsically wrong with this procedure - keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even AI systems inside software environments might not be safe enough.

Humans sometimes put dangerous humans inside boxes to limit their ability to influence the external world. Sometimes, these humans escape their boxes. The security of a prison depends on certain assumptions, which can be violated. Yoshie Shiratori reportedly escaped prison by weakening the door-frame with miso soup and dislocating his shoulders.

Human-written software has a high defect rate; we should expect a perfectly secure system to be difficult to create. If humans construct a software system they think is secure, it is possible that the security relies on a false assumption. A powerful AI system could potentially learn how its hardware works and manipulate bits to send radio signals. It could fake a malfunction and attempt social engineering when the engineers look at its code. As the saying goes: for someone to do something we had imagined was impossible, all they need is a better imagination.

Experimentally, humans have convinced other humans to let them out of the box. Spooky.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: boxing (edit tags)

See more...

These 10 canonical answers don't have any tags, please add some!

The goal of this approach is to create a non-agentic AI, in the form of an LLM, that is capable of accelerating alignment research. The hope is that there is some window between AI smart enough to help us with alignment and the really scary, self-improving, consequentialist AI. Some things that this amplifier might do:

  • Suggest different ideas for humans, such that a human can explore them.
  • Give comments and feedback on research, acting like a "shoulder Eliezer".

An LLM can be thought of as learning the distribution over the next token given by the training data. Prompting the LLM is then like conditioning this distribution on the start of the text. A key danger in alignment is applying unbounded optimization pressure towards a specific goal in the world. Conditioning a probability distribution does not behave like an agent applying optimization pressure towards a goal. Hence, this avoids Goodhart-related problems, as well as some inner alignment failures.
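
To make that distinction concrete, here's a toy sketch in Python. The vocabulary, probabilities, and utilities are invented for illustration; a real LLM learns its distribution from data.

import numpy as np

rng = np.random.default_rng(0)

# A toy conditional distribution P(next token | prompt) -- invented numbers.
vocab = ["cooperate", "deceive", "defer"]
p_next_given_prompt = np.array([0.7, 0.1, 0.2])

# Conditioning/simulation: sample from the conditioned distribution. The output
# just reflects what the (hypothetical) training distribution makes likely here;
# no pressure is applied towards any particular outcome in the world.
sampled = rng.choice(vocab, p=p_next_given_prompt)

# Contrast: an agent applying optimization pressure picks whatever scores highest
# on its own objective, however unlikely that is under the distribution.
utility = {"cooperate": 1.0, "deceive": 10.0, "defer": 0.5}  # hypothetical goal
optimized = max(vocab, key=utility.get)

print("sampled from conditional:", sampled)
print("chosen by optimizer:", optimized)

The sampler keeps following the distribution even when some "undesirable" token happens to score highest on a goal, which is the sense in which conditioning avoids the Goodhart-style pressure described above.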

One idea to get superhuman work out of LLMs is to train them on amplified datasets, such as really high-quality / difficult research. The key problem here is finding a dataset that allows for this.

There are some ways for this to fail:

  • Outer alignment: It starts trying to optimize for making the actual correct next token, which could mean taking over the planet so that it can spend a zillion FLOPs on this one prediction task to be as correct as possible.
  • Inner alignment:
    • An LLM might instantiate mesa-optimizers, such as a character in a story that the LLM is writing, and this optimizer might realize that they are in an LLM and try to break out and affect the real world.
    • The LLM itself might become inner misaligned and have a goal other than next token prediction.
  • Bad prompting: You ask it for code for a malign superintelligence; it obliges. (Or, perhaps more realistically, you ask it for capabilities research.)

Conjecture are aware of these problems and are running experiments. Specifically, one operationalization of the inner alignment problem is to make an LLM play chess, which (probably) requires simulating an optimizer trying to win at the game of chess. They are trying to use interpretability tools to find, inside the chess-playing LLM, the mesa-optimizer that is the agent trying to win the game. We haven't ever found a real mesa-optimizer before, so this could give loads of bits about the nature of inner alignment failure.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

(Astronomical) suffering risks, also known as s-risks, are risks of the creation of intense suffering in the far future on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.

S-risks are an example of existential risk (also known as x-risks) according to Nick Bostrom's original definition, as they threaten to "permanently and drastically curtail [Earth-originating intelligent life's] potential". Most existential risks are of the form "event E happens which drastically reduces the number of conscious experiences in the future". S-risks therefore serve as a useful reminder that some x-risks are scary because they cause bad experiences, and not just because they prevent good ones.

Within the space of x-risks, we can distinguish x-risks that are s-risks, x-risks involving human extinction, x-risks that involve immense suffering and human extinction, and x-risks that involve neither. For example:

  • Suffering risk with extinction: Misaligned AGI wipes out humans, then simulates many suffering alien civilizations.
  • Suffering risk without extinction: Misaligned AGI tiles the universe with experiences of severe suffering.
  • Extinction risk without immense suffering: Misaligned AGI wipes out humans.
  • Neither suffering nor extinction: Misaligned AGI keeps humans as "pets," limiting growth but not causing immense suffering.

A related concept is hyperexistential risk, the risk of "fates worse than death" on an astronomical scale. It is not clear whether all hyperexistential risks are s-risks per se. But arguably all s-risks are hyperexistential, since "tiling the universe with experiences of severe suffering" would likely be worse than death.

There are two EA organizations with s-risk prevention research as their primary focus: the Center on Long-Term Risk (CLR) and the Center for Reducing Suffering. Much of CLR's work is on suffering-focused AI safety and crucial considerations. The Machine Intelligence Research Institute and the Future of Humanity Institute have also investigated strategies to prevent s-risks, although to a much lesser extent.

Another approach to reducing s-risk is to "expand the moral circle" together with raising concern for suffering, so that future (post)human civilizations and AI are less likely to instrumentally cause suffering to non-human minds such as animals or digital sentience. Sentience Institute works on this value-spreading problem.

 

See also

 

External links

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

Orgasmium (also known as hedonium) is a homogeneous substance with limited consciousness, which is in a constant state of supreme bliss. An AI programmed to "maximize happiness" might simply tile the universe with orgasmium. Some who believe this consider it a good thing; others do not. Those who do not, use its undesirability to argue that not all terminal values reduce to "happiness" or some simple analogue. Hedonium is the hedonistic utilitarian's version of utilitronium.

Blog posts

See also

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

Causal Decision Theory (CDT) is a branch of decision theory which advises an agent to take the action that maximizes the causal consequences for the probability of desired outcomes [1]. Like any branch of decision theory, it prescribes taking the action that maximizes utility, i.e. the one whose utility equals or exceeds the utility of every other option. The utility of each action is measured by its expected utility: the probability-weighted sum of the utilities of its possible results. The branches differ in how actions are taken to influence these probabilities. Contrary to Evidential Decision Theory (EDT), CDT focuses on the causal relations between one's actions and their outcomes, instead of on which actions provide evidence for desired outcomes. According to CDT, a rational agent should track the available causal relations linking its actions to the desired outcome and take the action which will best enhance the chances of that outcome.

One usual example where EDT and CDT diverge is the Smoking Lesion: “Smoking is strongly correlated with lung cancer, but in the world of the Smoker's Lesion this correlation is understood to be the result of a common cause: a genetic lesion that tends to cause both smoking and cancer. Once we fix the presence or absence of the lesion, there is no additional correlation between smoking and cancer. Suppose you prefer smoking without cancer to not smoking without cancer, and prefer smoking with cancer to not smoking with cancer. Should you smoke?” CDT would recommend smoking, since there is no causal connection between smoking and cancer: both are caused by a gene, but they have no direct causal connection with each other. EDT, on the other hand, would recommend against smoking, since smoking is evidence of having the mentioned gene and thus should be avoided.

The core aspect of CDT is mathematically represented by the fact that it uses probabilities of conditionals in place of conditional probabilities [2]. The probability of a conditional is the probability of the whole conditional being true, whereas the conditional probability is the probability of the consequent given the antecedent. A conditional probability of B given A, P(B|A), simply means the Bayesian probability of the event B happening given that we know A happened; it is the quantity used by EDT. The probability of a conditional, P(A > B), refers to the probability that the conditional 'A implies B' is true; it is the probability of the counterfactual ‘If A, then B’ being the case. Since counterfactual analysis is the key tool used to speak about causality, probabilities of conditionals are said to mirror causal relations. In most cases these two probabilities track each other, and CDT and EDT give the same answers. However, some particular problems have arisen where their prescriptions for rational action diverge, such as the Smoking Lesion problem – where CDT seems to give a more reasonable prescription – and Newcomb's problem – where CDT seems unreasonable. David Lewis proved [3] that it is impossible for probabilities of conditionals to always track conditional probabilities. Hence, evidential relations aren't the same as causal relations, and CDT and EDT will always diverge in some cases.
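
As a rough illustration of how the two calculations come apart on the Smoking Lesion, here is a small Python sketch with invented probabilities and utilities:

# Invented numbers: the lesion causes cancer and smoking; smoking itself doesn't cause cancer.
p_lesion = 0.5
p_cancer_given_lesion = 0.8       # independent of whether you smoke
p_cancer_given_no_lesion = 0.1    # independent of whether you smoke
p_smoke_given_lesion = 0.9
p_smoke_given_no_lesion = 0.2

u_smoke = 10       # enjoyment of smoking
u_cancer = -100    # disutility of cancer

def edt_expected_utility(smoke: bool) -> float:
    """EDT: treat the action as Bayesian evidence about the lesion."""
    p_act_if_lesion = p_smoke_given_lesion if smoke else 1 - p_smoke_given_lesion
    p_act_if_no_lesion = p_smoke_given_no_lesion if smoke else 1 - p_smoke_given_no_lesion
    p_act = p_act_if_lesion * p_lesion + p_act_if_no_lesion * (1 - p_lesion)
    p_lesion_given_act = p_act_if_lesion * p_lesion / p_act
    p_cancer = (p_lesion_given_act * p_cancer_given_lesion
                + (1 - p_lesion_given_act) * p_cancer_given_no_lesion)
    return (u_smoke if smoke else 0) + p_cancer * u_cancer

def cdt_expected_utility(smoke: bool) -> float:
    """CDT: intervening on smoking doesn't change the chance of having the lesion."""
    p_cancer = (p_lesion * p_cancer_given_lesion
                + (1 - p_lesion) * p_cancer_given_no_lesion)
    return (u_smoke if smoke else 0) + p_cancer * u_cancer

print("EDT:", edt_expected_utility(True), "vs", edt_expected_utility(False))  # favours not smoking
print("CDT:", cdt_expected_utility(True), "vs", cdt_expected_utility(False))  # favours smoking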

References

  1. http://plato.stanford.edu/entries/decision-causal/
  2. Lewis, David (1981). "Causal Decision Theory", Australasian Journal of Philosophy 59: 5–30.
  3. Lewis, David (1976). "Probabilities of Conditionals and Conditional Probabilities", The Philosophical Review 85 (3): 297–315.

See also

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

Evidential Decision Theory (EDT) is a branch of decision theory which advises an agent to take the action which, conditional on its being taken, maximizes the chances of the desired outcome. Like any branch of decision theory, it prescribes taking the action that maximizes utility, i.e. the one whose utility equals or exceeds the utility of every other option. The utility of each action is measured by its expected utility: the probability-weighted sum of the utilities of its possible results. The branches differ in how actions are taken to influence these probabilities. Causal Decision Theory (CDT) says that only through causal processes can one influence the chances of the desired outcome [1]. EDT, on the other hand, requires no causal connection; the action only has to be Bayesian evidence for the desired outcome. Some critics say it recommends auspiciousness over causal efficacy [2].

One usual example where EDT and CDT diverge is the Smoking Lesion: “Smoking is strongly correlated with lung cancer, but in the world of the Smoker's Lesion this correlation is understood to be the result of a common cause: a genetic lesion that tends to cause both smoking and cancer. Once we fix the presence or absence of the lesion, there is no additional correlation between smoking and cancer. Suppose you prefer smoking without cancer to not smoking without cancer, and prefer smoking with cancer to not smoking with cancer. Should you smoke?” CDT would recommend smoking, since there is no causal connection between smoking and cancer: both are caused by a gene, but they have no direct causal connection with each other. EDT, on the other hand, would recommend against smoking, since smoking is evidence of having the mentioned gene and thus should be avoided.

CDT uses probabilities of conditionals and counterfactual dependence to calculate the expected utility of an action – which track causal relations – whereas EDT simply uses conditional probabilities. The probability of a conditional is the probability of the whole conditional being true, whereas the conditional probability is the probability of the consequent given the antecedent. A conditional probability of B given A, P(B|A), simply means the Bayesian probability of the event B happening given that we know A happened; it is the quantity used by EDT. The probability of a conditional, P(A > B), refers to the probability that the conditional 'A implies B' is true; it is the probability of the counterfactual ‘If A, then B’ being the case. Since counterfactual analysis is the key tool used to speak about causality, probabilities of conditionals are said to mirror causal relations. In most cases these two probabilities are the same. However, David Lewis proved [3] that it is impossible for probabilities of conditionals to always track conditional probabilities. Hence, evidential relations aren't the same as causal relations, and CDT and EDT will diverge on some problems. In some cases EDT gives a better answer than CDT, such as Newcomb's problem, whereas in the Smoking Lesion problem CDT seems to give the more reasonable prescription.
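
As a rough sketch of why the two theories split on Newcomb's problem, here is a toy Python calculation; the predictor accuracy and payoffs are invented for the example:

# Newcomb's problem, toy numbers: an opaque box contains $1,000,000 iff a very
# accurate predictor foresaw that you would take only that box; a transparent
# box always contains $1,000.
p_predictor_correct = 0.99
big, small = 1_000_000, 1_000

def edt_expected_utility(one_box: bool) -> float:
    """EDT: your choice is evidence about what the predictor foresaw."""
    p_big_box_full = p_predictor_correct if one_box else 1 - p_predictor_correct
    return p_big_box_full * big + (0 if one_box else small)

def cdt_expected_utility(one_box: bool, p_big_box_full: float) -> float:
    """CDT: the boxes were filled before you chose, so the same
    p_big_box_full applies whichever action you take."""
    return p_big_box_full * big + (0 if one_box else small)

print("EDT:", edt_expected_utility(True), "vs", edt_expected_utility(False))  # one-boxing wins
for p in (0.0, 0.5, 1.0):
    print("CDT:", cdt_expected_utility(True, p), "vs", cdt_expected_utility(False, p))  # two-boxing always wins by $1,000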

References

  1. http://plato.stanford.edu/entries/decision-causal/
  2. Joyce, J.M. (1999). The Foundations of Causal Decision Theory, p. 146.
  3. Lewis, David (1976). "Probabilities of Conditionals and Conditional Probabilities", The Philosophical Review 85 (3): 297–315.

Blog posts

See also

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

See more...

These 46 long canonical answers don't have a brief description. Jump on in and add one!

Many of the people with the deepest understanding of artificial intelligence are concerned about the risks of unaligned superintelligence. In 2014, Google bought world-leading artificial intelligence startup DeepMind for $400 million; DeepMind added the condition that Google promise to set up an AI Ethics Board. DeepMind cofounder Shane Legg has said in interviews that he believes superintelligent AI will be “something approaching absolute power” and “the number one risk for this century”.

Stuart Russell, Professor of Computer Science at Berkeley, author of the standard AI textbook, and world-famous AI expert, warns of “species-ending problems” and wants his field to pivot to make superintelligence-related risks a central concern. He went so far as to write Human Compatible, a book focused on bringing attention to the dangers of artificial intelligence and the need for more work to address them.

Many other science and technology leaders agree. Late astrophysicist Stephen Hawking said that superintelligence “could spell the end of the human race.” Tech billionaire Bill Gates describes himself as “in the camp that is concerned about superintelligence…I don’t understand why some people are not concerned”. Oxford Professor Nick Bostrom, who has been studying AI risks for over 20 years, has said: “Superintelligence is a challenge for which we are not ready now and will not be ready for a long time.”

Holden Karnofsky, the CEO of Open Philanthropy, has written a carefully reasoned account of why transformative artificial intelligence means that this might be the most important century.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Until a thing has happened, it has never happened. We have been consistently improving both the optimization power and generality of our algorithms throughout the history of the field, and have little reason to expect that to suddenly stop. We’ve gone from coding systems specifically for a certain game (like chess) to algorithms like MuZero, which learn the rules of the game they’re playing and how to play at vastly superhuman skill levels purely via self-play, across a broad range of games (e.g. Go, chess, shogi and various Atari games).

Human brains are a spaghetti tower generated by evolution with zero foresight, so it would be surprising if they were the peak of physically possible intelligence. The brain doing things in complex ways is not strong evidence that we need to fully replicate those interactions if we can throw sufficient compute at the problem, as explained in Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain.

It is, however, plausible that an AGI would need a lot more compute than we will get in the near future, or that some key insights are missing which we won’t get for a while. The Open Philanthropy report on how much computational power it would take to simulate the brain is the most careful attempt at reasoning out how far we are from being able to do it, and suggests that by some estimates we already have enough computational resources, and by some estimates Moore’s law may let us reach it before too long.

It also seems that much of the human brain exists to observe and regulate our biological body, which a body-less computer wouldn't need. If that's true, then a human-level AI might be possible with considerably less compute than the human brain.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Dreyfus and Penrose have argued that human cognitive abilities can’t be emulated by a computational machine. Searle and Block argue that certain kinds of machines cannot have a mind (consciousness, intentionality, etc.). But these objections need not concern those who predict an intelligence explosion.

We can reply to Dreyfus and Penrose by noting that an intelligence explosion does not require an AI to be a classical computational system. And we can reply to Searle and Block by noting that an intelligence explosion does not depend on machines having consciousness or other properties of ‘mind’, only that it be able to solve problems better than humans can in a wide variety of unpredictable environments. As Edsger Dijkstra once said, the question of whether a machine can ‘really’ think is “no more interesting than the question of whether a submarine can swim.”

Others who are pessimistic about an intelligence explosion occurring within the next few centuries don’t have a specific objection but instead think there are hidden obstacles that will reveal themselves and slow or halt progress toward machine superintelligence.

Finally, a global catastrophe like nuclear war or a large asteroid impact could so damage human civilization that the intelligence explosion never occurs. Or, a stable and global totalitarianism could prevent the technological development required for an intelligence explosion to occur.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Nobody knows for sure when we will have AGI, or if we’ll ever get there. Open Philanthropy CEO Holden Karnofsky has analyzed a selection of recent expert surveys on the matter, as well as taking into account findings of computational neuroscience, economic history, probabilistic methods and failures of previous AI timeline estimates. This all led him to estimate that "there is more than a 10% chance we'll see transformative AI within 15 years (by 2036); a ~50% chance we'll see it within 40 years (by 2060); and a ~2/3 chance we'll see it this century (by 2100)." Karnofsky bemoans the lack of robust expert consensus on the matter and invites rebuttals to his claims in order to further the conversation. He compares AI forecasting to election forecasting (as opposed to academic political science) or market forecasting (as opposed to theoretical academics), thereby arguing that AI researchers may not be the "experts” we should trust in predicting AI timelines.

Opinions proliferate, but given experts’ (and non-experts’) poor track record at predicting progress in AI, many researchers tend to be fairly agnostic about when superintelligent AI will be invented.

UC-Berkeley AI professor Stuart Russell has given his best guess as “sometime in our children’s lifetimes”, while Ray Kurzweil (Google’s Director of Engineering) predicts human level AI by 2029 and an intelligence explosion by 2045. Eliezer Yudkowsky expects the end of the world, and Elon Musk expects AGI, before 2030.

If there’s anything like a consensus answer at this stage, it would be something like: “highly uncertain, maybe not for over a hundred years, maybe in less than fifteen, with around the middle of the century looking fairly plausible”.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: timelines (edit tags)

Humanity hasn't yet built a superintelligence, and we might not be able to without significantly more knowledge and computational resources. There could be an existential catastrophe that prevents us from ever building one. For the rest of the answer let's assume no such event stops technological progress.

With that out of the way: there is no known good theoretical reason we can't build it at some point in the future; the majority of AI research is geared towards making more capable AI systems; and a significant chunk of top-level AI research attempts to make more generally capable AI systems. There is a clear economic incentive to develop more and more intelligent machines and currently billions of dollars of funding are being deployed for advancing AI capabilities.

We consider ourselves to be generally intelligent (i.e. capable of learning and adapting ourselves to a very wide range of tasks and environments), but the human brain almost certainly isn't the most efficient way to solve problems. One hint is the existence of AI systems with superhuman capabilities at narrow tasks: not only superhuman performance (as in AlphaGo beating the Go world champion) but superhuman speed and precision (as in industrial sorting machines). There is no known discontinuity between tasks, no special and unique property of human brains that unlocks certain capabilities which cannot in principle be implemented in machines. Therefore we would expect AI to surpass human performance on all tasks as progress continues.

In addition, several research groups (DeepMind being one of the most overt about this) explicitly aim for generally capable systems. AI as a field is growing, year after year. Critical voices about AI progress usually argue against a lack of precautions around the impact of AI, or against general AI happening very soon, not against it happening at all.

A satire of arguments against the possibility of superintelligence can be found here.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


See more...

These 4 canonical answers are marked as outdated. Feel free to update them!

Canonical answers may be served to readers by Stampy, so only answers which have a reasonably high stamp score should be marked as canonical. All canonical answers are open to be collaboratively edited and updated, and they should represent a consensus response (written from the Stampy Point Of View) to a question which is within Stampy's scope.

Answers to questions from YouTube comments should not be marked as canonical, and will generally remain as they were when originally written since they have details which are specific to an idiosyncratic question. YouTube answers may be forked into wiki answers, in order to better respond to a particular question, in which case the YouTube question should have its canonical version field set to the new more widely useful question.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


A brain-computer interface (BCI) is a direct communication pathway between the brain and a computer device. BCI research is heavily funded and has already met with dozens of successes. Three successes in human BCIs are a device that restores (partial) sight to the blind, cochlear implants that restore hearing to the deaf, and a device that allows use of an artificial hand by direct thought.

Such devices restore impaired functions, but many researchers expect BCIs to also augment and improve normal human abilities. Ed Boyden is researching these opportunities as the lead of the Synthetic Neurobiology Group at MIT. Such devices might hasten the arrival of an intelligence explosion, if only by improving human intelligence so that the hard problems of AI can be solved more rapidly.

See also:

Wikipedia, Brain-computer interface

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


The intelligence explosion idea was expressed by statistician I.J. Good in 1965:

Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion’, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.

The argument is this: Every year, computers surpass human abilities in new ways. A program written in 1956 was able to prove mathematical theorems, and found a more elegant proof for one of them than Russell and Whitehead had given in Principia Mathematica. By the late 1990s, ‘expert systems’ had surpassed human skill for a wide range of tasks. In 1997, IBM’s Deep Blue computer beat the world chess champion, and in 2011, IBM’s Watson computer beat the best human players at a much more complicated game: Jeopardy!. Recently, a robot named Adam was programmed with our scientific knowledge about yeast, then posed its own hypotheses, tested them, and assessed the results.

Computers remain far short of human intelligence, but the resources that aid AI design are accumulating (including hardware, large datasets, neuroscience knowledge, and AI theory). We may one day design a machine that surpasses human skill at designing artificial intelligences. After that, this machine could improve its own intelligence faster and better than humans can, which would make it even more skilled at improving its own intelligence. This could continue in a positive feedback loop such that the machine quickly becomes vastly more intelligent than the smartest human being on Earth: an ‘intelligence explosion’ resulting in a machine superintelligence.

This is what is meant by the ‘intelligence explosion’ in this FAQ.

See also:

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Predicting the future is risky business. There are many philosophical, scientific, technological, and social uncertainties relevant to the arrival of an intelligence explosion. Because of this, experts disagree on when this event might occur. Here are some of their predictions:

  • Futurist Ray Kurzweil predicts that machines will reach human-level intelligence by 2030 and that we will reach “a profound and disruptive transformation in human capability” by 2045.
  • Intel’s chief technology officer, Justin Rattner, expects “a point when human and artificial intelligence merges to create something bigger than itself” by 2048.
  • AI researcher Eliezer Yudkowsky expects the intelligence explosion by 2060.
  • Philosopher David Chalmers has over 1/2 credence in the intelligence explosion occurring by 2100.
  • Quantum computing expert Michael Nielsen estimates that the probability of the intelligence explosion occurring by 2100 is between 0.2% and about 70%.
  • In 2009, at the AGI-09 conference, experts were asked when AI might reach superintelligence with massive new funding. The median estimates were that machine superintelligence could be achieved by 2045 (with 50% confidence) or by 2100 (with 90% confidence). Of course, attendees to this conference were self-selected to think that near-term artificial general intelligence is plausible.
  • iRobot CEO Rodney Brooks and cognitive scientist Douglas Hofstadter allow that the intelligence explosion may occur in the future, but probably not in the 21st century.
  • Roboticist Hans Moravec predicts that AI will surpass human intelligence “well before 2050.”
  • In a 2005 survey of 26 contributors to a series of reports on emerging technologies, the median estimate for machines reaching human-level intelligence was 2085.
  • Participants in a 2011 intelligence conference at Oxford gave a median estimate of 2050 for when there will be a 50% chance of human-level machine intelligence, and a median estimate of 2150 for when there will be a 90% chance of human-level machine intelligence.
  • On the other hand, 41% of the participants in the AI@50 conference (in 2006) stated that machine intelligence would never reach the human level.

See also:

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


These 5 canonical answers are marked as needs work. Jump on in and improve them!

Current narrow systems are much more domain-specific than AGI. We don’t know what the first AGI will look like. Some people think the GPT-3 architecture, scaled up a lot, may get us there (GPT-3 is a giant prediction model which, when trained on a vast amount of text, seems to learn how to learn and do all sorts of crazy-impressive things; a related model can generate pictures from text), while others don’t think scaling this kind of model will get us all the way.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


It’s pretty dependent on what skills you have and what resources you have access to. The biggest commitment is to pursue a career in AI Safety research. Another substantial option is to pursue a career in AI policy, which you might think is even more important than doing technical research.

Smaller options include donating money to relevant organizations, talking about AI Safety as a plausible career path to other people or considering the problem in your spare time.

It’s possible that your particular set of skills/resources is not suited to this problem. Luckily, there are many other problems of similar importance to which they might be better suited.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Canonical answers may be served to readers by Stampy, so only answers which have a reasonably high stamp score should be marked as canonical. All canonical answers are open to be collaboratively edited and updated, and they should represent a consensus response (written from the Stampy Point Of View) to a question which is within Stampy's scope.

Answers to questions from YouTube comments should not be marked as canonical, and will generally remain as they were when originally written since they have details which are specific to an idiosyncratic question. YouTube answers may be forked into wiki answers, in order to better respond to a particular question, in which case the YouTube question should have its canonical version field set to the new more widely useful question.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Avoid directly responding to the question in the answer, repeat the relevant part of the question instead. For example, if the question is "Can we do X", answer "We might be able to do X, if we can do Y", not "Yes, if we can manage Y". This way, the answer will also work for the questions "Why can't we do X" and "What would happen if we tried to do X".

Linking to external sites is strongly encouraged; one of the most valuable things Stampy can do is help people find other parts of the alignment information ecosystem.

Consider enclosing newly introduced terms, likely to be unfamiliar to many readers, in speech marks. If unsure, Google the term (in speech marks!) and see if it shows up anywhere other than LessWrong, the Alignment Forum, etc. Be judicious, as it's easy to use too many, but used carefully they can psychologically cushion newbies from a lot of unfamiliar terminology - in this context they're saying something like "we get that we're hitting you with a lot of new vocab, and you might not know what this term means yet".

When selecting related questions, there shouldn't be more than four unless there's a really good reason for that (some questions are asking for it, like the "Why can't we just..." question). It's also recommended to include at least one more "enticing" question to draw users in (relating to the more sensational, sci-fi, philosophical/ethical side of things) alongside more bland/neutral questions.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


We don’t yet know which AI architectures are safe; learning more about this is one of the goals of FLI's grants program. AI researchers are generally very responsible people who want their work to better humanity. If there are certain AI designs that turn out to be unsafe, then AI researchers will want to know this so they can develop alternative AI systems.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!