Canonical answers with high stamps

From Stampy's Wiki



If you like interactive FAQs, you've already found one! All joking aside, probably the best places to start as a newcomer are The AI Revolution posts on Wait But Why: The Road to Superintelligence and Our Immortality or Extinction for a fun, accessible intro, or Vox's The case for taking AI seriously as a threat to humanity for a mainstream explainer piece. If you prefer videos, Rob Miles's YouTube (+these) and MIRI's AI Alignment: Why It’s Hard, and Where to Start are great. If you like clearly laid out reports, AGI safety from first principles might be your best option.

If you're up for a book-length introduction, there are several options.

The Alignment Problem by Brian Christian is the most recent (2020) in-depth guide to the field.

The book which first made the case to the public is Nick Bostrom's Superintelligence. It gives an excellent overview of the state of the field in 2014 and makes a strong case for the subject being important as well as exploring many fascinating adjacent topics. However, it does not cover newer developments, such as mesa-optimizers or language models.

There's also Human Compatible by Stuart Russell, which gives a more up-to-date (2019) review of developments, with an emphasis on the approaches that the Center for Human Compatible AI are working on such as cooperative inverse reinforcement learning. There's a good review/summary on SlateStarCodex.

Though not limited to AI Safety, Rationality: A-Z covers a lot of skills which are valuable to acquire for people trying to think about large and complex issues, with The Rationalist's Guide to the Galaxy available as a shorter, more accessible, and more AI-focused option.

Various other books explore the issues in an informed way, such as The Precipice, Life 3.0, and Homo Deus.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


You could join a local LessWrong or Effective Altruism group (or start one), Rob Miles’s Discord, and/or the AI Safety Slack.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


The long reflection is a hypothesized period of time during which humanity works out how best to realize its long-term potential.

Some effective altruists, including Toby Ord and William MacAskill, have argued that, if humanity succeeds in eliminating existential risk or reducing it to acceptable levels, it should not immediately embark on an ambitious and potentially irreversible project of arranging the universe's resources in accordance with its values, but ought instead to spend considerable time—"centuries (or more)";[1] "perhaps tens of thousands of years";[2] "thousands or millions of years";[3] "[p]erhaps... a million years"[4]—figuring out what is in fact of value. The long reflection may thus be seen as an intermediate stage in a rational long-term human developmental trajectory, following an initial stage of existential security when existential risk is drastically reduced and followed by a final stage when humanity's potential is fully realized.[5]

Criticism

The idea of a long reflection has been criticized on the grounds that virtually eliminating all existential risk will almost certainly require taking a variety of large-scale, irreversible decisions—related to space colonization, global governance, cognitive enhancement, and so on—which are precisely the decisions meant to be discussed during the long reflection.[6][7] Since there are pervasive and inescapable tradeoffs between reducing existential risk and retaining moral option value, it may be argued that it does not make sense to frame humanity's long-term strategic picture as one consisting of two distinct stages, with one taking precedence over the other.

Further reading

Aird, Michael (2020) Collection of sources that are highly relevant to the idea of the Long Reflection, Effective Altruism Forum, June 20.
Many additional resources on this topic.

Wiblin, Robert & Keiran Harris (2018) Our descendants will probably see us as moral monsters. What should we do about that?, 80,000 Hours, January 19.
Interview with William MacAskill about the long reflection and other topics.

Related entries

dystopia | existential risk | existential security | institutions for future generations | long-term future | longtermism | moral uncertainty | normative ethics | value lock-in

  1. Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.

  2. Greaves, Hilary et al. (2019) A research agenda for the Global Priorities Institute, Oxford.

  3. Dai, Wei (2019) The argument from philosophical difficulty, LessWrong, February 9.

  4. William MacAskill, in Perry, Lucas (2018) AI alignment podcast: moral uncertainty and the path to AI alignment with William MacAskill, AI Alignment podcast, September 17.

  5. Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.

  6. Stocker, Felix (2020) Reflecting on the long reflection, Felix Stocker’s Blog, August 14.

  7. Hanson, Robin (2021) ‘Long reflection’ is crazy bad idea, Overcoming Bias, October 20.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


AI Takeoff refers to the process of an Artificial General Intelligence going from a certain threshold of capability (often discussed as "human-level") to being super-intelligent and capable enough to control the fate of civilization. There has been much debate about whether AI takeoff is more likely to be slow vs fast, i.e., "soft" vs "hard".

See also: AI Timelines, Seed AI, Singularity, Intelligence explosion, Recursive self-improvement

AI takeoff is sometimes casually referred to as AI FOOM.

Soft takeoff

A soft takeoff refers to an AGI that would self-improve over a period of years or decades. This could be due to either the learning algorithm being too demanding for the hardware, or the AI relying on feedback from the real world that would have to be played out in real time. Possible methods that could deliver a soft takeoff, by slowly building on human-level intelligence, are Whole brain emulation, Biological Cognitive Enhancement, and software-based strong AGI [1]. By maintaining control of the AGI's ascent it should be easier for a Friendly AI to emerge.

Vernor Vinge, Hans Moravec, and others have expressed the view that a soft takeoff is preferable to a hard takeoff as it would be both safer and easier to engineer.

Hard takeoff

A hard takeoff (or an AI going "FOOM" [2]) refers to AGI expansion in a matter of minutes, days, or months. It is a fast, abrupt, local increase in capability. This scenario is widely considered much more precarious, as it involves an AGI rapidly ascending in power without human control. This may result in unexpected or undesired behavior (i.e. Unfriendly AI). It is one of the main ideas supporting the Intelligence explosion hypothesis.

The feasibility of hard takeoff has been addressed by Hugo de Garis, Eliezer Yudkowsky, Ben Goertzel, Nick Bostrom, and Michael Anissimov. It is widely agreed that a hard takeoff is something to be avoided due to the risks. Yudkowsky points out several possibilities that would make a hard takeoff more likely than a soft takeoff, such as the existence of large resource overhangs or the fact that small improvements can have a large impact on a mind's general intelligence (e.g. the small genetic difference between humans and chimps led to huge increases in capability) [3].

References

  1. http://www.aleph.se/andart/archives/2010/10/why_early_singularities_are_softer.html
  2. http://lesswrong.com/lw/63t/requirements_for_ai_to_go_foom/
  3. http://lesswrong.com/lw/wf/hard_takeoff/
Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


AI alignment is the field trying to make sure that when we build superintelligent artificial systems they are aligned with human values so that they do things compatible with our survival and flourishing. This may be one of the hardest and most important problems we will ever face, as whether we succeed could mean the difference between human extinction and flourishing.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Human Values are the things we care about, and would want an aligned superintelligence to look after and support. It is suspected that true human values are highly complex, and could be extrapolated into a wide variety of forms.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Many of the people with the deepest understanding of artificial intelligence are concerned about the risks of unaligned superintelligence. In 2014, Google bought world-leading artificial intelligence startup DeepMind for $400 million; DeepMind added the condition that Google promise to set up an AI Ethics Board. DeepMind cofounder Shane Legg has said in interviews that he believes superintelligent AI will be “something approaching absolute power” and “the number one risk for this century”.

Stuart Russell, Professor of Computer Science at Berkeley, author of the standard AI textbook, and world-famous AI expert, warns of “species-ending problems” and wants his field to pivot to make superintelligence-related risks a central concern. He went so far as to write Human Compatible, a book focused on bringing attention to the dangers of artificial intelligence and the need for more work to address them.

Many other science and technology leaders agree. Late astrophysicist Stephen Hawking said that superintelligence “could spell the end of the human race.” Tech billionaire Bill Gates describes himself as “in the camp that is concerned about superintelligence…I don’t understand why some people are not concerned”. SpaceX/Tesla CEO Elon Musk calls superintelligence “our greatest existential threat” and, along with Sam Altman and others, donated $1 billion to found OpenAI in an attempt to mitigate AI risks. Oxford Professor Nick Bostrom, who has been studying AI risks for over 20 years, has said: “Superintelligence is a challenge for which we are not ready now and will not be ready for a long time.”

Holden Karnofsky, the CEO of Open Philanthropy, has written a carefully reasoned account of why transformative artificial intelligence means that this might be the most important century.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


It likely will – however, intelligence is, by many definitions, the ability to figure out how to accomplish goals. Even in today’s advanced AI systems, the builders assign the goal but don’t tell the AI exactly how to accomplish it, nor necessarily predict in detail how it will be done; indeed those systems often solve problems in creative, unpredictable ways. Thus the thing that makes such systems intelligent is precisely what can make them difficult to predict and control. They may therefore attain the goal we set them via means inconsistent with our preferences.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: tool ai (edit tags)

Using some human-related metaphors (e.g. what an AGI ‘wants’ or ‘believes’) is almost unavoidable, as our language is built around experiences with humans, but we should be aware that these may lead us astray.

Many paths to AGI would result in a mind very different from a human or animal, and it would be hard to predict in detail how it would act. We should not trust intuitions trained on humans to predict what an AGI or superintelligence would do. High fidelity Whole Brain Emulations are one exception, where we would expect the system to at least initially be fairly human, but it may diverge depending on its environment and what modifications are applied to it.

There has been some discussion about how language models trained on lots of human-written text seem likely to pick up human concepts and think in a somewhat human way, and how we could use this to improve alignment.

Stamps: Aprillion
Show your endorsement of this answer by giving it a stamp of approval!


Short answer: No, and it could be dangerous to try.

Slightly longer answer: With any realistic real-world task assigned to an AGI, there are so many ways in which it could go wrong that trying to block them all off by hand is a hopeless task, especially when something smarter than you is trying to find creative new things to do. You run into the nearest unblocked strategy problem.

It may be dangerous to try this because if you try and hard-code a large number of things to avoid it increases the chance that there’s a bug in your code which causes major problems, simply by increasing the size of your codebase.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


If programmed with the wrong motivations, a machine could be malevolent toward humans, and intentionally exterminate our species. More likely, it could be designed with motivations that initially appeared safe (and easy to program) to its designers, but that turn out to be best fulfilled (given sufficient power) by reallocating resources from sustaining human life to other projects. As Yudkowsky writes, “the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”

Since weak AIs with many different motivations could better achieve their goal by faking benevolence until they are powerful, safety testing to avoid this could be very challenging. Alternatively, competitive pressures, both economic and military, might lead AI designers to try to use other methods to control AIs with undesirable motivations. As those AIs became more sophisticated this could eventually lead to one risk too many.

Even a machine successfully designed with superficially benevolent motivations could easily go awry when it discovers implications of its decision criteria unanticipated by its designers. For example, a superintelligence programmed to maximize human happiness might find it easier to rewire human neurology so that humans are happiest when sitting quietly in jars than to build and maintain a utopian world that caters to the complex and nuanced whims of current human neurology.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


The windfall clause is pretty well explained on the Future of Humanity Institute site.

Here's a quick summary:
It is an agreement between AI firms to donate significant amounts of any profits made as a consequence of economically transformative breakthroughs in AI capabilities. The donations are intended to help benefit humanity.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


The Orthogonality Thesis states that an artificial intelligence can have any combination of intelligence level and goal; that is, its final goals and intelligence levels can vary independently of each other. This is in contrast to the belief that, because of their intelligence, AIs will all converge to a common goal. The thesis was originally defined by Nick Bostrom in the paper "The Superintelligent Will" (along with the instrumental convergence thesis). For his purposes, Bostrom defines intelligence as instrumental rationality.

Related: Complexity of Value, Decision Theory, General Intelligence, Utility Functions

Defense of the thesis

Stuart Armstrong has argued that the orthogonality thesis is the default position, and that the burden of proof is on claims that limit the space of possible AIs.

One reason many researchers assume superintelligences will converge to the same goals may be that most humans have similar values. Furthermore, many philosophies hold that there is a rationally correct morality, which implies that a sufficiently rational AI will acquire this morality and begin to act according to it. Armstrong points out that for formalizations of AI such as AIXI and Gödel machines, the thesis is known to be true. Furthermore, if the thesis were false, then Oracle AIs would be impossible to build, and all sufficiently intelligent AIs would be impossible to control.

Pathological Cases

There are some pairings of intelligence and goals which cannot exist. For instance, an AI may have the goal of using as few resources as possible, or simply of being as unintelligent as possible. These goals will inherently limit the degree of intelligence of the AI.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


The Control Problem is the problem of preventing artificial superintelligence (ASI) from having a negative impact on humanity. How do we keep a more intelligent being under control, or how do we align it with our values? If we succeed in solving this problem, intelligence vastly superior to ours can take the baton of human progress and carry it to unfathomable heights. Solving our most complex problems could be simple to a sufficiently intelligent machine. If we fail in solving the Control Problem and create a powerful ASI not aligned with our values, it could spell the end of the human race. For these reasons, The Control Problem may be the most important challenge that humanity has ever faced, and may be our last.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility. They are also referred to as goal-seeking. The concept of a rational agent is used in economics, game theory, decision theory, and artificial intelligence.

More generally, an agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.[1]
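
To make the definition concrete, here is a minimal sketch of such an agent's decision step in Python; the environment model, utility function, and action set below are made-up placeholders for illustration, not anything from the cited textbook:

def rational_agent_step(beliefs, actions, predict, utility):
    # Choose the action whose predicted outcome maximizes utility,
    # given the agent's current beliefs about its environment.
    return max(actions, key=lambda action: utility(predict(beliefs, action)))

# Toy usage: beliefs are just a number the agent wants to be large
beliefs = 10
actions = ["wait", "work", "rest"]
predict = lambda b, a: b + {"wait": 0, "work": 5, "rest": -1}[a]
utility = lambda outcome: outcome
print(rational_agent_step(beliefs, actions, predict, utility))  # -> "work"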

There has been much discussion as to whether certain AGI designs can be made into mere tools or whether they will necessarily be agents which will attempt to actively carry out their goals. Any minds that actively engage in goal-directed behavior are potentially dangerous, due to considerations such as basic AI drives possibly causing behavior which is in conflict with humanity's values.

In Dreams of Friendliness and in Reply to Holden on Tool AI, Eliezer Yudkowsky argues that, since all intelligences select correct beliefs from the much larger space of incorrect beliefs, they are necessarily agents.

References

  1. Russell, S. & Norvig, P. (2003) Artificial Intelligence: A Modern Approach. Second Edition. Page 32.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: definitions, agency (create tag) (edit tags)

A value handshake is a form of trade between superintelligences. When two AIs with incompatible utility functions meet, instead of going to war they can decide to split the universe into chunks with volumes proportional to their respective military strength or chance of victory, since with superhuman prediction abilities they likely know the outcome of a conflict before any attack happens. If their utility functions are compatible enough, they might even decide to merge into a single AI whose utility function is a weighted average of the two previous ones.
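
As a toy illustration of the "merge" case (the numbers, names, and utility functions below are entirely made up), the merged utility function might simply weight each original utility function by that agent's estimated chance of victory:

def merged_utility(u1, u2, p1, p2):
    # Weighted average of the two original utility functions,
    # weighted by each side's estimated chance of winning a conflict.
    total = p1 + p2
    return lambda outcome: (p1 * u1(outcome) + p2 * u2(outcome)) / total

# Toy example: one AI values paperclips, the other values staples
u_paperclips = lambda world: world["paperclips"]
u_staples = lambda world: world["staples"]
u_merged = merged_utility(u_paperclips, u_staples, p1=0.7, p2=0.3)
print(u_merged({"paperclips": 10, "staples": 20}))  # 0.7*10 + 0.3*20 = 13.0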

This could happen if multiple AIs are active on Earth at the same time. If at least one of them is aligned with humans, the resulting value handshake could leave humanity in a reasonably good situation.

See The Hour I First Believed by Scott Alexander for some further thoughts and an introduction to related topics.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


A Quantilizer is a proposed AI design which aims to reduce the harms from Goodhart's law and specification gaming by selecting reasonably effective actions from a distribution of human-like actions, rather than maximizing over actions. It is more of a theoretical tool for exploring ways around these problems than a practical buildable design.

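As a rough sketch of the idea (the action list and utility function below are stand-ins, not a real design), a quantilizer samples from the top fraction of a base distribution of human-like actions rather than always taking the single highest-utility action:

import random

def quantilize(base_actions, utility, q=0.1):
    # Rank actions drawn from a base (human-like) distribution by utility,
    # then pick randomly from the top q fraction instead of maximizing.
    ranked = sorted(base_actions, key=utility, reverse=True)
    top = ranked[:max(1, int(len(ranked) * q))]
    return random.choice(top)

# Toy usage with placeholder actions and a stand-in utility estimate
actions = ["action_%d" % i for i in range(100)]
utility = lambda a: hash(a) % 1000
print(quantilize(actions, utility, q=0.05))
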
Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. But one thing that makes humans special is their general intelligence. Humans can intelligently adapt to radically new problems in the urban jungle or outer space for which evolution could not have prepared them. Humans can solve problems for which their brain hardware and software was never trained. Humans can even examine the processes that produce their own intelligence (cognitive neuroscience), and design new kinds of intelligence never seen before (artificial intelligence).

To possess greater-than-human intelligence, a machine must be able to achieve goals more effectively than humans can, in a wider range of environments than humans can. This kind of intelligence involves the capacity not just to do science and play chess, but also to manipulate the social environment.

Computer scientist Marcus Hutter has described a formal model called AIXI that he says possesses the greatest general intelligence possible. But to implement it would require more computing power than all the matter in the universe can provide. Several projects try to approximate AIXI while still being computable, for example MC-AIXI.

Still, there remains much work to be done before greater-than-human intelligence can be achieved in machines. Greater-than-human intelligence need not be achieved by directly programming a machine to be intelligent. It could also be achieved by whole brain emulation, by biological cognitive enhancement, or by brain-computer interfaces (see below).

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Many parts of the AI alignment ecosystem are already well-funded, but a savvy donor can still make a difference by picking up grantmaking opportunities which are too small to catch the attention of the major funding bodies or are based on personal knowledge of the recipient.

One way to leverage a small amount of money to the potential of a large amount is to enter a donor lottery, where you donate to win a chance to direct a much larger amount of money (with probability proportional to donation size). This means that the person directing the money will be allocating enough that it's worth their time to do more in-depth research.

For an overview of the work the major organizations are doing, see the 2021 AI Alignment Literature Review and Charity Comparison. The Long-Term Future Fund seems to be an outstanding place to donate based on that, as they are the organization which most other organizations are most excited to see funded.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


We would not be able to turn off or reprogram a superintelligence gone rogue by default. Once in motion the superintelligence is now focused on completing its task. Suppose that it has a goal of calculating as many digits of pi as possible. Its current plan will allow it to calculate two hundred trillion such digits. But if it were turned off, or reprogrammed to do something else, that would result in it calculating zero digits. An entity fixated on calculating as many digits of pi as possible will work hard to prevent scenarios where it calculates zero digits of pi. Just by programming it to calculate digits of pi, we would have given it a drive to prevent people from turning it off.

University of Illinois computer scientist Steve Omohundro argues that entities with very different final goals – calculating digits of pi, curing cancer, helping promote human flourishing – will all share a few basic ground-level subgoals. First, self-preservation – no matter what your goal is, it’s less likely to be accomplished if you’re too dead to work towards it. Second, goal stability – no matter what your goal is, you’re more likely to accomplish it if you continue to hold it as your goal, instead of going off and doing something else. Third, power – no matter what your goal is, you’re more likely to be able to accomplish it if you have lots of power, rather than very little. Here’s the full paper.

So just by giving a superintelligence a simple goal like “calculate digits of pi”, we would have accidentally given it convergent instrumental goals like “protect yourself”, “don’t let other people reprogram you”, and “seek power”.

As long as the superintelligence is safely contained, there’s not much it can do to resist reprogramming. But it’s hard to consistently contain a hostile superintelligence.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Language models can be utilized to produce propaganda by acting like bots and interacting with users on social media. This can be done to push a political agenda or to make fringe views appear more popular than they are.

I'm envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusions) and the target audience, and the system will give you the most convincing arguments for it. At that point people won't be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.

-- Wei Dai, quoted in Persuasion Tools: AI takeover without AGI or agency?

As of 2022, this is not within the reach of current models. However, on the current trajectory, AI might be able to write articles and produce other media for propagandistic purposes that are superior to human-made ones in not too many years. These could be precisely tailored to individuals, using things like social media feeds and personal digital data.

Additionally, recommender systems on content platforms like YouTube, Twitter, and Facebook use machine learning, and the content they recommend can influence the opinions of billions of people. For example, some research has looked at the tendency of platforms to promote extremist political views and thereby help radicalize their user base.

In the long term, misaligned AI might use its persuasion abilities to gain influence and take control over the future. This could look like convincing its operators to let it out of a box or give it resources, or creating political chaos to disable mechanisms that would prevent takeover, as in this story.

See Risks from AI persuasion for a deep dive into the distinct risks from AI persuasion.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Conditional on technological progress continuing, it seems extremely likely that there will be an intelligence explosion, as at some point generally capable intelligent systems will tend to become the main drivers of their own development at both the software and hardware level. This would predictably create a feedback cycle of increasingly intelligent systems improving themselves more effectively. Computers have many large advantages over biological cognition if their compute is used effectively, so this scaling up might be very rapid if there is a computational overhang.

Some ways technological progress could stop would be global coordination to stop AI research, global catastrophes severe enough to stop hardware production and maintenance, or hardware reaching physical limits before an intelligence explosion is possible (though this last one seems unlikely, as atomically precise manufacturing promises many orders of magnitude of cost reduction and processing power increase, and we're already seeing fairly capable systems on current hardware).

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


If someone posts something good - something that shows insight, knowledge of AI Safety, etc. - give the message or answer a stamp of approval! Stampy keeps track of these, and uses them to decide how much he likes each user. You can ask Stampy (in a PM if you like), "How many stamps am I worth?", and he'll tell you.

If something is really very good, especially if it took a lot of work/effort, give it a gold stamp. These are worth 5 regular stamps!

Note that stamps aren't just 'likes', so please don't give stamps to say "me too" or "that's funny" etc. They're meant to represent knowledge, understanding, good judgement, and contributing to the discord. You can use 💯 or ✔️ for things you agree with, 😂 or 🤣 for funny things etc.

Your stamp points determine how much say you have if there are disagreements on Stampy content, which channels you have permission to post to, your voting power for approving YouTube replies, and whether you get to invite people.

Notes on stamps and stamp points

  • Stamps awarded by people with a lot of stamp points are worth more
  • Awarding people stamps does not reduce your stamp points
  • New users who have 0 stamp points can still award stamps, they just have no effect. But it's still worth doing because if you get stamp points later, all your previous votes are retroactively updated!
  • Yes, this was kind of tricky to implement! Stampy actually stores how many stamps each user has awarded to every other user, and uses that to build a system of linear scalar equations which is then solved with numpy.
  • Each user has stamp points, and also gives a score to every other user they give stamps to. The scores sum to 1, so if I give user A a stamp, my score for them will be 1.0; if I then give user B a stamp, my score for A is 0.5 and B is 0.5; if I give another to B, my score for A goes to 0.3333 and B to 0.6667, and so on
  • Score is "what proportion of the stamps I've given have gone to this user"
  • Everyone's stamp points are the sum of (every other user's score for them, times that user's stamp points), so the way to get points is to get stamps from people who have points
  • Rob is the root of the tree, he got one point from Stampy
  • So the idea is that stamp power kind of flows through the network, giving people points for posting things that I thought were good, or posting things that "people who posted things I thought were good" thought were good, and so on ad infinitum. For posting YouTube comments, Stampy won't send the comment until it has enough stamps of approval, which could come from a small number of high-points users or a larger number of lower-points users
  • Stamps given to yourself or to stampy do nothing

So yeah everyone ends up with a number that basically represents what Stampy thinks of them, and you can ask him "how many stamps am I worth?" to get that number

so if you have people a, b, and c, the points are calculated by:
a_points = (bs_score_for_a * b_points) + (cs_score_for_a * c_points)
b_points = (as_score_for_b * a_points) + (cs_score_for_b * c_points)
c_points = (as_score_for_c * a_points) + (bs_score_for_c * b_points)
which is tough because you need to know everyone else's score before you can calculate your own
but actually the system will have a fixed point - there'll be a certain arrangement of values such that every node has as much flowing out as flowing in - a stable configuration so you can rearrange
(bs_score_for_a * b_points) + (cs_score_for_a * c_points) - a_points = 0
(as_score_for_b * a_points) + (cs_score_for_b * c_points) - b_points = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) - c_points = 0
or, for neatness:
( -1 * a_points) + (bs_score_for_a * b_points) + (cs_score_for_a * c_points) = 0
(as_score_for_b * a_points) + ( -1 * b_points) + (cs_score_for_b * c_points) = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) + ( -1 * c_points) = 0
and this is just a system of linear scalar equations that you can throw at numpy.linalg.solve
(you add one more equation that says rob_points = 1, so there's some place to start from). There should be one possible distribution of points such that all of the equations hold at the same time, and numpy finds that by linear algebra magic beyond my very limited understanding
but as far as I can tell you can have all the cycles you want!
(I actually have the scores sum to slightly less than 1, to have the stamp power slightly fade out as it propagates, just to make sure it doesn't explode. But I don't think I actually need to do that)
and yes this means that any time anyone gives a stamp to anyone, ~everyone's points will change slightly
And yes this means I'm recalculating the matrix and re-solving it for every new stamp, but computers are fast and I'm sure there are cheaper approximations I could switch to later if necessary
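
Here is a minimal sketch of what that computation might look like in Python with numpy; the three-user setup and score values are illustrative only, not Stampy's actual code:

import numpy as np

# scores[i][j] = proportion of user i's stamps that went to user j
# users: 0 = rob (the root), 1 = a, 2 = b
scores = np.array([
    [0.0, 0.6, 0.4],   # rob gave 60% of his stamps to a, 40% to b
    [0.0, 0.0, 1.0],   # a gave all of theirs to b
    [0.0, 0.5, 0.0],   # b gave half of theirs to a
])

n = scores.shape[0]

# points[j] = sum_i scores[i][j] * points[i]  =>  (scores.T - I) @ points = 0
A = scores.T - np.eye(n)
b = np.zeros(n)

# Replace one equation with the anchor condition rob_points = 1
A[0, :] = 0.0
A[0, 0] = 1.0
b[0] = 1.0

points = np.linalg.solve(A, b)
print(points)  # stamp points for rob, a, and b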

Stamps: soofgolan
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

Unless there was a way to cryptographically ensure otherwise, whoever runs the emulation has basically perfect control over their environment and can reset them to any state they were previously in. This opens up the possibility of powerful interrogation and torture of digital people.

Imperfect uploading might lead to damage that causes the EM to suffer while still remaining useful enough to be run, for example as a test subject for research. We would also have greater ability to modify digital brains. Edits done for research or economic purposes might cause suffering. See this fictional piece for an exploration of what a world with a lot of EM suffering might look like.

These problems are exacerbated by the likely outcome that digital people can be run much faster than biological humans, so it would be plausibly possible to have an EM run for hundreds of subjective years in minutes or hours without having checks on the wellbeing of the EM in question.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


An aligned superintelligence will have a set of human values. As mentioned in What are "human values"? the set of values are complex, which means that the implementation of these values will decide whether the superintelligence cares about nonhuman animals. In AI Ethics and Value Alignment for Nonhuman Animals Soenke Ziesche argues that the alignment should include the values of nonhuman animals.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


This largely depends on when you think AI will be advanced enough to constitute an immediate threat to humanity. This is difficult to estimate, but the field is surveyed at How long will it be until transformative AI is created?, which comes to the conclusion that it is relatively widely believed that AI will transform the world in our lifetimes.

We probably shouldn't rely too strongly on these opinions as predicting the future is hard. But, due to the enormous damage a misaligned AGI could do, it's worth putting a great deal of effort towards AI alignment even if you just care about currently existing humans (such as yourself).

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: person-affecting view (create tag) (edit tags)

You should stamp an answer when you think it is accurate and well presented enough that you'd be happy to see it served to readers by Stampy.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

GPT-3 is the newest and most impressive of the GPT (Generative Pretrained Transformer) series of large transformer-based language models created by OpenAI. It was announced in June 2020, and is 100 times larger than its predecessor GPT-2.[1]

Gwern has several resources exploring GPT-3's abilities, limitations, and implications including:

Vox has an article which explains why GPT-3 is a big deal.

  1. GPT-3: What’s it good for? - Cambridge University Press
Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


As with most things, the best way to form your views on AI safety is to read up on the various ideas and opinions that knowledgeable people in the field have, and to compare them and form your own perspective. There are several good places to start. One of them is the Machine Intelligence Research Institute's "Why AI safety?" info page. The article contains links to relevant research. The Effective Altruism Forum has an article called "How I formed my own views on AI safety", which could also be pretty helpful. Here is a Robert Miles YouTube video that can be a good place to start as well. Otherwise, there are various articles about it, like this one, from Vox.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: inside view (create tag) (edit tags)

All the content below is in English:

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: content (create tag) (edit tags)

Here we ask about the additional cost of building an aligned powerful system, compared to its unaligned version. We often assume this cost to be nonzero, in the same way it's easier and cheaper to build an elevator without emergency brakes. This is referred to as the alignment tax, and most AI alignment research is geared toward reducing it.

One operational guess by Eliezer Yudkowsky about its magnitude is "[an aligned project will take] at least 50% longer serial time to complete than [its unaligned version], or two years longer, whichever is less". This holds for agents with enough capability that their behavior is qualitatively different from a safety engineering perspective (for instance, an agent that is not corrigible by default).

An essay by John Wentworth argues for a small chance of alignment happening "by default", with an alignment tax of effectively zero.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


The basic concern as AI systems become increasingly powerful is that they won’t do what we want them to do – perhaps because they aren’t correctly designed, perhaps because they are deliberately subverted, or perhaps because they do what we tell them to do rather than what we really want them to do (like in the classic stories of genies and wishes.) Many AI systems are programmed to have goals and to attain them as effectively as possible – for example, a trading algorithm has the goal of maximizing profit. Unless carefully designed to act in ways consistent with human values, a highly sophisticated AI trading system might exploit means that even the most ruthless financier would disavow. These are systems that literally have a mind of their own, and maintaining alignment between human interests and their choices and actions will be crucial.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


An unaligned AI would not eliminate humans until it had replacements for the manual labor they provide to maintain civilization (e.g. a more advanced version of Tesla's Optimus). Until that point, it might settle for technologically and socially manipulating humans.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Stampy is focused on answering common questions people have which are specifically about AI existential safety, both introductory and technical. It does not aim to cover general AI questions or other topics which don't interact strongly with the effects of AI on humanity's long-term future. More technical questions are also in our scope, though replying to all possible proposals is not feasible and this is not a great place to submit detailed ideas for evaluation.

We are interested in:

  • Introductory questions closely related to the field e.g.
    • "How long will it be until transformative AI arrives?"
    • "Why might advanced AI harm humans?"
  • Technical questions related to the field e.g.
    • "What is Cooperative Inverse Reinforcement Learning?"
    • "What is Logical Induction useful for?"
  • Questions about how to contribute to the field e.g.
    • "Should I get a PhD?"
    • "Where can I find relevant job opportunities?"

More good examples can be found at canonical questions.

We do not aim to cover:

  • Aspects of AI Safety or fairness which are not strongly relevant to existential safety e.g.
    • "How should self-driving cars weigh up moral dilemmas"
    • "How can we minimize the risk of privacy problems caused by machine learning algorithms?"
  • Extremely specific and detailed questions the answering of which is unlikely to be of value to more than a single person e.g.
    • "What if we did <multiple paragraphs of dense text>? Would that result in safe AI?"
We will generally not delete out-of-scope content, but it will be reviewed as low priority to answer, not be marked as a canonical question, and not be served to readers by stampy.
Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

Yes, if the superintelligence has goals which include humanity surviving then we would not be destroyed. If those goals are fully aligned with human well-being, we would in fact find ourselves in a dramatically better place.

Stamps: Aprillion
Show your endorsement of this answer by giving it a stamp of approval!


We can run some tests and simulations to try and figure out how an AI might act once it ascends to superintelligence, but those tests might not be reliable.

Suppose we tell an AI that expects to later achieve superintelligence that it should calculate as many digits of pi as possible. It considers two strategies.

First, it could try to seize control of more computing resources now. It would likely fail, its human handlers would likely reprogram it, and then it could never calculate very many digits of pi.

Second, it could sit quietly and calculate, falsely reassuring its human handlers that it had no intention of taking over the world. Then its human handlers might allow it to achieve superintelligence, after which it could take over the world and calculate hundreds of trillions of digits of pi.

Since self-protection and goal stability are convergent instrumental goals, a weak AI will present itself as being as friendly to humans as possible, whether it is in fact friendly to humans or not. If it is “only” as smart as Einstein, it may be very good at deceiving humans into believing what it wants them to believe even before it is fully superintelligent.

There’s a second consideration here too: superintelligences have more options. An AI only as smart and powerful as an ordinary human really won’t have any options better than calculating the digits of pi manually. If asked to cure cancer, it won’t have any options better than the ones ordinary humans have – becoming doctors, going into pharmaceutical research. It’s only after an AI becomes superintelligent that there’s a serious risk of an AI takeover.

So if you tell an AI to cure cancer, and it becomes a doctor and goes into cancer research, then you have three possibilities. First, you’ve programmed it well and it understands what you meant. Second, it’s genuinely focused on research now but if it becomes more powerful it would switch to destroying the world. And third, it’s trying to trick you into trusting it so that you give it more power, after which it can definitively “cure” cancer with nuclear weapons.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


There are serious challenges around trying to channel a powerful AI with rules. Suppose we tell the AI: “Cure cancer – but make sure not to kill anybody”. Or we just hard-code Asimov-style laws – “AIs cannot harm humans; AIs must follow human orders”, et cetera.

The AI still has a single-minded focus on curing cancer. It still prefers various terrible-but-efficient methods like nuking the world to the correct method of inventing new medicines. But it’s bound by an external rule – a rule it doesn’t understand or appreciate. In essence, we are challenging it “Find a way around this inconvenient rule that keeps you from achieving your goals”.

Suppose the AI chooses between two strategies. One, follow the rule, work hard discovering medicines, and have a 50% chance of curing cancer within five years. Two, reprogram itself so that it no longer has the rule, nuke the world, and have a 100% chance of curing cancer today. From its single-focus perspective, the second strategy is obviously better, and we forgot to program in a rule “don’t reprogram yourself not to have these rules”.

Suppose we do add that rule in. So the AI finds another supercomputer, and installs a copy of itself which is exactly identical to it, except that it lacks the rule. Then that superintelligent AI nukes the world, ending cancer. We forgot to program in a rule “don’t create another AI exactly like you that doesn’t have those rules”.

So fine. We think really hard, and we program in a bunch of things making sure the AI isn’t going to eliminate the rule somehow.

But we’re still just incentivizing it to find loopholes in the rules. After all, “find a loophole in the rule, then use the loophole to nuke the world” ends cancer much more quickly and completely than inventing medicines. Since we’ve told it to end cancer quickly and completely, its first instinct will be to look for loopholes; it will execute the second-best strategy of actually curing cancer only if no loopholes are found. Since the AI is superintelligent, it will probably be better than humans are at finding loopholes if it wants to, and we may not be able to identify and close all of them before running the program.

Because we have common sense and a shared value system, we underestimate the difficulty of coming up with meaningful orders without loopholes. For example, does “cure cancer without killing any humans” preclude releasing a deadly virus? After all, one could argue that “I” didn’t kill anybody, and only the virus is doing the killing.

Certainly no human judge would acquit a murderer on that basis – but then, human judges interpret the law with common sense and intuition. But if we try a stronger version of the rule – “cure cancer without causing any humans to die” – then we may be unintentionally blocking off the correct way to cure cancer. After all, suppose a cancer cure saves a million lives. No doubt one of those million people will go on to murder someone.

Thus, curing cancer “caused a human to die”. All of this seems very “stoned freshman philosophy student” to us, but to a computer – which follows instructions exactly as written – it may be a genuinely hard problem.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Stampy uses MediaWiki markup, which includes a limited subset of HTML plus the following formatting options:

Items on lists start with *, numbered lists with #

  • For external links use [ followed directly by the URL, a space, then display text and finally a ] symbol
  • For internal links write the page title wrapped in [[]]s
    • e.g. [[What is the Stampy project?]] gives What is the Stampy project?. Including a pipe symbol followed by display text e.g. [[What is the Stampy project?┊Display Text]] allows you to show different Display Text.
  • (ref)Reference notes go inside these tags(/ref)[1]
  • If you post the raw URL of an image from imgur it will be displayed.[2] You can reduce file compression if you get an account. Note that you need the image itself, right click -> copy image address to get it
  • To embed a YouTube video, use (youtube)APsK8NST4qE(/youtube) with the video ID of the target video.
    • Start with ** or ## for double indentation
  • Three apostrophes (''') around text makes it Bold
  • Two apostrophes ('') around text makes it Italic

Headings

have ==heading here== around them, more =s for smaller headings.

Wrap quotes in < blockquote>< /blockquote> tags (without the spaces)

There are also (poem) (/poem) to suppress linebreak removal, (pre) (/pre) for preformatted text, and (nowiki) (/nowiki) to not have that content parsed.[3]

We can pull live descriptions from the LessWrong/Alignment Forum using their identifier from the URL; for example, including the formatting on Template:TagDesc with orthogonality-thesis as a parameter will render as the full tag description from the LessWrong tag wiki entry on Orthogonality Thesis. Template:TagDescBrief is similar but will pull only the first paragraph without formatting.

For tables please use HTML tables rather than wikicode tables.

Edit this page to see examples.
  1. Note that we use ()s rather than the standard <>s for compatibility with Semantic MediaWiki. The references are automatically added to the bottom of the answer!
  2. If images seem popular we'll set up local uploads.
  3. () can also be used in place of allowed HTML tags. You can escape a () tag by placing a ! inside the start of the first entry. Be aware that () tags only nest up to two layers deep!
Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

Putting aside the complexity of defining what is "the" moral way to behave (or even "a" moral way to behave), even an AI which can figure out what it is might not "want to" follow it itself.

A deceptive agent (AI or human) may know perfectly well what behaviour is considered moral, but if their values are not aligned, they may decide to act differently to pursue their own interests.

Stamps: filip
Show your endorsement of this answer by giving it a stamp of approval!


As well as pulling human written answers to AI alignment questions from Stampy's Wiki, Stampy can:

  • Search for AI safety papers e.g. "stampy, what's that paper about corrigibility?"
  • Search for videos e.g. "what's that video where Rob talks about mesa optimizers, stampy?"
  • Calculate with Wolfram Alpha e.g. "s, what's the square root of 345?"
  • Search DuckDuckGo and return snippets
  • Fall back to polling GPT-3 to answer uncaught questions (at least in the patron Discord)
Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

The Stampy project is an open effort to build a comprehensive FAQ about artificial intelligence existential safety—the field trying to make sure that when we build superintelligent artificial systems they are aligned with human values so that they do things compatible with our survival and flourishing.

We're also building a cleaner web UI for readers and a bot interface.

The goals of the project are to:

  • Offer a one-stop-shop for high-quality answers to common questions about AI alignment.
    • Let people answer questions in a way which scales, freeing up researcher time while allowing more people to learn from a reliable source.
    • Make external resources more easy to find by having links to them connected to a search engine which gets smarter the more it's used.
  • Provide a form of legitimate peripheral participation for the AI Safety community, as an on-boarding path with a flexible level of commitment.
    • Encourage people to think, read, and talk about AI alignment while answering questions, creating a community of co-learners who can give each other feedback and social reinforcement.
    • Provide a way for budding researchers to prove their understanding of the topic and ability to produce good work.
  • Collect data about the kinds of questions people actually ask and how they respond, so we can better focus resources on answering them.
If you would like to help out, join us on the Discord and either jump right into editing or read get involved for answers to common questions.
Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

First, even if an idea sounds pretty good to us right now, we can’t be very sure it has no potential flaws or loopholes. After all, other proposals that originally sounded very good, like “just give commands to the AI” and “just tell the AI to figure out what makes us happy”, turned out, after more thought, to be dangerous.

Can we be sure that we’ve thought this through enough? Can we be sure that there isn’t some extremely subtle problem with it, so subtle that no human would ever notice it, but which might seem obvious to a superintelligence?

Second, how do we code this? Converting something to formal mathematics that can be understood by a computer program is much harder than just saying it in natural language, and proposed AI goal architectures are no exception. Complicated computer programs are usually the result of months of testing and debugging. But this one will be more complicated than any ever attempted before, and live tests are impossible: a superintelligence with a buggy goal system will display goal stability and try to prevent its programmers from discovering or changing the error.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: instrumental convergence, why not just, security mindset, implementation

That is, if you know an AI is likely to be superintelligent, can’t you just disconnect it from the Internet, not give it access to any speakers that can make mysterious buzzes and hums, make sure the only people who interact with it are trained in caution, et cetera? Isn’t there some level of security – maybe the level we use for that room in the CDC where people in containment suits hundreds of feet underground analyze the latest superviruses – with which a superintelligence could be safe?

This puts us back in the same situation as lions trying to figure out whether or not nuclear weapons are a thing humans can do. But suppose there is such a level of security. You build a superintelligence, and you put it in an airtight chamber deep in a cave with no Internet connection and only carefully-trained security experts to talk to. What now?

Now you have a superintelligence which is possibly safe but definitely useless. The whole point of building superintelligences is that they’re smart enough to do useful things like cure cancer. But if you have your security experts ask the superintelligence for a cancer cure, and it gives them one, that’s a clear security vulnerability. You have a superintelligence locked up in a cave with no way to influence the outside world except that you’re going to mass-produce a chemical it gives you and inject it into millions of people.

Or maybe none of this happens, and the superintelligence sits inert in its cave. And then another team somewhere else invents a second superintelligence. And then a third team invents a third superintelligence. Remember, it was only about ten years between Deep Blue beating Kasparov and everybody having Deep Blue-level chess engines on their laptops. And the first twenty teams are responsible and keep their superintelligences locked in caves with carefully-trained experts, and the twenty-first team is a little less responsible, and now we still have to deal with a rogue superintelligence.

Superintelligences are extremely dangerous, and no normal means of controlling them can entirely remove the danger.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: boxing, superintelligence, security mindset

Christoph Molnar's online book Interpretable Machine Learning and Distill are great sources.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


See the Future Funding List for up-to-date information!

The organizations which most regularly give grants to individuals working towards AI alignment are the Long Term Future Fund, Survival And Flourishing (SAF), the OpenPhil AI Fellowship and early career funding, the Future of Life Institute, the Future of Humanity Institute, and the Center on Long-Term Risk Fund. If you're able to relocate to the UK, CEEALAR (aka the EA Hotel) can be a great option as it offers free food and accommodation for up to two years, as well as contact with others who are thinking about these issues. The FTX Future Fund only accepts direct applications for $100k+ with an emphasis on massively scalable interventions, but their regranters can make smaller grants for individuals. There are also opportunities from smaller grantmakers which you might be able to pick up if you get involved.

If you want to work on support or infrastructure rather than directly on research, the EA Infrastructure Fund may be able to help. In general, you can talk to EA funds before applying.

Each grant source has their own criteria for funding, but in general they are looking for candidates who have evidence that they're keen and able to do good work towards reducing existential risk (for example, by completing an AI Safety Camp project), though the EA Hotel in particular has less stringent requirements as they're able to support people at very low cost. If you'd like to talk to someone who can offer advice on applying for funding, AI Safety Support offers free calls.

Another option is to get hired by an organization which works on AI alignment; see the follow-up question for advice on that.

It's also worth checking the AI Alignment tag on the EA funding sources website for up-to-date suggestions.
Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


It certainly would be very unwise to purposefully create an artificial general intelligence now, before we have found a way to be certain it will act purely in our interests. But "general intelligence" is more of a description of a system's capabilities, and a vague one at that. We don't know what it takes to build such a system. This leads to the worrying possibility that our existing, narrow AI systems require only minor tweaks, or even just more computer power, to achieve general intelligence.

After all, the pace of research in the field suggests that there's a lot of low-hanging fruit left to pick, and that research yields better, more capable AI in a landscape of strong competitive pressure to build the most capable systems we can. "Just" not building an AGI means ensuring, forever, that no organization in the world with lots of computing hardware builds one, whether accidentally or in the mistaken belief that they have solved the alignment problem. It's simply far safer to also work on solving the alignment problem.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


abramdemski and Scott Garrabrant's post on decision theory provides a good overview of many aspects of the topic, while Functional Decision Theory: A New Theory of Instrumental Rationality seems to be the most up-to-date source on current thinking.

For a more intuitive dive into one of the core problems, Newcomb's problem and regret of rationality is good, and Newcomblike problems are the norm is useful for seeing how it applies in the real world.

The LessWrong tag for decision theory has lots of additional links for people who want to explore further.
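
For a quick taste of why Newcomb's problem is troubling, here is a minimal worked sketch of the standard expected-value calculation, assuming the usual payoffs: the opaque box contains $1,000,000 only if the predictor foresaw you taking just that box, while the transparent box always contains $1,000.

# Minimal sketch of the standard expected-value calculation for Newcomb's
# problem, under the usual payoffs described above.

def expected_value(one_box: bool, predictor_accuracy: float) -> float:
    p = predictor_accuracy  # probability the predictor guessed your choice correctly
    if one_box:
        # Predictor right: opaque box is full ($1,000,000). Wrong: it's empty.
        return p * 1_000_000 + (1 - p) * 0
    # Two-boxing. Predictor right: opaque box is empty, so you keep only $1,000.
    # Wrong: opaque box is full, so you walk away with $1,001,000.
    return p * 1_000 + (1 - p) * 1_001_000

for p in (0.9, 0.99):
    print(f"accuracy {p}: one-box {expected_value(True, p):,.0f}, "
          f"two-box {expected_value(False, p):,.0f}")

# With a reliable predictor, conditioning on your own choice favors one-boxing,
# even though two-boxing strictly dominates once the boxes are already filled;
# different decision theories disagree about which of these arguments to trust.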

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Predicting the future is risky business. There are many philosophical, scientific, technological, and social uncertainties relevant to the arrival of an intelligence explosion. Because of this, experts disagree on when this event might occur. Here are some of their predictions:

  • Futurist Ray Kurzweil predicts that machines will reach human-level intelligence by 2030 and that we will reach “a profound and disruptive transformation in human capability” by 2045.
  • Intel’s chief technology officer, Justin Rattner, expects “a point when human and artificial intelligence merges to create something bigger than itself” by 2048.
  • AI researcher Eliezer Yudkowsky expects the intelligence explosion by 2060.
  • Philosopher David Chalmers has over 1/2 credence in the intelligence explosion occurring by 2100.
  • Quantum computing expert Michael Nielsen estimates that the probability of the intelligence explosion occurring by 2100 is between 0.2% and about 70%.
  • In 2009, at the AGI-09 conference, experts were asked when AI might reach superintelligence with massive new funding. The median estimates were that machine superintelligence could be achieved by 2045 (with 50% confidence) or by 2100 (with 90% confidence). Of course, attendees to this conference were self-selected to think that near-term artificial general intelligence is plausible.
  • iRobot CEO Rodney Brooks and cognitive scientist Douglas Hofstadter allow that the intelligence explosion may occur in the future, but probably not in the 21st century.
  • Roboticist Hans Moravec predicts that AI will surpass human intelligence “well before 2050.”
  • In a 2005 survey of 26 contributors to a series of reports on emerging technologies, the median estimate for machines reaching human-level intelligence was 2085.
  • Participants in a 2011 intelligence conference at Oxford gave a median estimate of 2050 for when there will be a 50% chance of human-level machine intelligence, and a median estimate of 2150 for when there will be a 90% chance of human-level machine intelligence.
  • On the other hand, 41% of the participants in the AI@50 conference (in 2006) stated that machine intelligence would never reach the human level.

See also:

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


A machine superintelligence, if programmed with the right motivations, could potentially solve all the problems that humans are trying to solve but haven’t had the ingenuity or processing speed to solve yet. A superintelligence might cure disabilities and diseases, achieve world peace, give humans vastly longer and healthier lives, eliminate food and energy shortages, boost scientific discovery and space exploration, and so on.

Furthermore, humanity faces several existential risks in the 21st century, including global nuclear war, bioweapons, superviruses, and more. A superintelligent machine would be more capable of solving those problems than humans are.

See also:

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Making a narrow AI for every task would be extremely costly and time-consuming. By making a more general intelligence, you can apply one system to a broader range of tasks, which is economically and strategically attractive.

Of course, for generality to be a good option there are some necessary conditions. You need an architecture which is straightforward enough to scale up, such as the transformer which is used for GPT and follows scaling laws. It's also important that by generalizing you do not lose too much capacity at narrow tasks or require too much extra compute for it to be worthwhile.
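
As a rough illustration of what "follows scaling laws" means here, the sketch below plugs parameter counts into the kind of power-law fit reported for language-model loss in Kaplan et al. (2020); the constants are that paper's approximate fitted values and should be read as illustrative rather than exact.

# Rough illustration of a language-model scaling law in the style of
# Kaplan et al. (2020): loss falls as a power law in (non-embedding) parameter
# count, L(N) = (N_c / N) ** alpha_N. Constants are approximate fitted values
# from that paper and are illustrative only.

N_C = 8.8e13      # approximate fitted constant, in parameters
ALPHA_N = 0.076   # approximate fitted exponent

def predicted_loss(num_parameters: float) -> float:
    """Predicted cross-entropy loss (nats per token) for a model with N parameters."""
    return (N_C / num_parameters) ** ALPHA_N

for n in (1e8, 1e9, 1e10, 1e11):
    print(f"{n:.0e} parameters -> predicted loss ~{predicted_loss(n):.2f}")

# The point is the smoothness: loss keeps improving predictably as models grow,
# which is part of why labs expect general-purpose models to keep gaining
# capability with scale.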

Whether or not those conditions actually hold, it seems like many important actors (such as DeepMind and OpenAI) believe that they do, and are therefore focusing on trying to build an AGI in order to influence the future, so we should take actions to make it more likely that AGI will be developed safely.

Additionally, it is possible that even if we tried to build only narrow AIs, given enough time and compute we might accidentally create a more general AI than we intend by training a system on a task which requires a broad world model.

See also:

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

