Review answers

From Stampy's Wiki

If you think an answer is good (i.e. accurate, helpful, and replies well to the question) then please give it a stamp of approval. If you're a more experienced editor and have a good feel for when there is consensus around an answer being accepted, feel free to mark answers as canonical, so that Stampy will start serving them to users.

Individual pages

These pages track 252 answers which need review (with some being counted multiple times).

Review answers

These 50 non-canonical answers are answering canonical questions.

Metaphilosophy is the philosophy of philosophy.

Bostrom has described the AI Safety problem as "philosophy with a deadline". Metaphilosophy could be used to steer an AGI towards e.g. our coherent extrapolated volition.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: definitions, metaphilosophy (create tag) (edit tags)

There is currently no clear win condition that most/all researchers agree on. Many researchers have their own paradigm and view the problem from a different angle.

Here are some of the sub-fields of AI Safety research. We need to solve the challenges in many of these fields to win.

See also Concrete Problems in AI Safety.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


In order of smallest commitment to largest:

  1. Link your friends to Stampy or Rob's videos
  2. Join or start a local AI Safety group at a university
  3. Get good at giving an elevator pitch
  4. Become a competent advocate by being convincing and having comprehensive knowledge to answer follow-up questions
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: contributing, community (create tag) (edit tags)

Ajeya Cotra attempted to calculate this number in her report Bio Anchors.

[...]the total amount of computation done over the course of evolution from the first animals with neurons to humans was (~1e16 seconds) * (~1e25 FLOP/s) = ~1e41 FLOP

Nuño Sempere argues that this calculation of the computation done by neurons is insufficient as the environment would also need to be simulated, leading to a possibly much larger number.

Cotra posits that this amount of computation should be taken as an upper bound on the amount of computation needed to develop AGI. The actual amount of computation needed is probably many orders of magnitude lower.
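For concreteness, the quoted estimate is just the product of two rough anchors; a minimal sketch reproducing the arithmetic (the numbers are the order-of-magnitude figures from the quote above, nothing more):

```python
# Rough reproduction of the "evolution anchor" arithmetic quoted above.
seconds_of_evolution = 1e16    # ~time since the first animals with neurons
neural_flop_per_second = 1e25  # ~total computation done by all neurons per second

total_flop = seconds_of_evolution * neural_flop_per_second
print(f"Total evolutionary computation: ~{total_flop:.0e} FLOP")  # ~1e+41 FLOP
```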

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: evolution (create tag) (edit tags)

Whilst this is possible, AI technologies seem to be progressing much faster than cognitive enhancement.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: biological cognitive enhancement (create tag) (edit tags)

Notes:

Link https://forum.effectivealtruism.org/posts/GvHPnzGJQJ7iAiJNr/on-presenting-the-case-for-

https://www.lesswrong.com/posts/8c8AZq5hgifmnHKSN/agi-safety-faq-all-dumb-questions-allowed-thread?commentId=2JiMsmu32EvzKv4yP

The future of the world will be dominated by these systems. We control the world because we're the most capable and coordinated entities on the planet.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: communication (create tag) (edit tags)

Yes you can! You can check out AI Safety Support's resources for examples on the format this could take.

If you get good reviews and actually help these researchers, you might eventually get funded by external organisations.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: contributing, productivity (create tag) (edit tags)

A good list can be found on the Alignment Forum's tag list.

Stamps: 298883259008417793
Show your endorsement of this answer by giving it a stamp of approval!


There are two parts to that answer.

Firstly: By working on the right things. Every generation since the dawn of humanity had its Einstein-level geniuses. And yet, most of them were forgotten by history because they just didn't run into an important problem to solve.

Secondly: There are a number of useful resources for getting more productive on the internet. Some leads you might find useful:

  • 80,000 Hours published an article with an extensive list of evidence-backed strategies for becoming better at any job. Start at the top, and work your way down until you find something that makes sense for you to implement.
  • For general problem-solving, the toolbox taught by CFAR (Center for Applied Rationality) has proven useful to many members of the alignment community. There are two sequences on LessWrong written as self-study guides for the CFAR tools: Hammertime, Training Regime.
  • Keep in mind that no productivity advice whatsoever works for everyone. Something might be useful for 50% of the population, or even 99%, and still leave you worse off if you try to implement it. Experiment, iterate, and above all: Trust your own judgment.
Stamps: 298883259008417793
Show your endorsement of this answer by giving it a stamp of approval!

Tags: contributing, productivity (create tag) (edit tags)

There are three major approaches to normative ethics (and some approaches to unify two or all of them): Virtue ethics, deontological ethics, and consequentialist ethics.

Virtue ethicists believe that at the core, leading an ethical life means cultivating virtues. In other words: What counts is less what one does moment-to-moment than that one makes an effort to become the kind of person who habitually acts appropriately in all kinds of different situations. A prominent example of virtue ethics is Stoicism.

Deontological ethicists believe that an ethical life is all about following certain behavioral rules, regardless of the consequences. Prominent examples include the ten commandments in Christianity, Kant's "categorical imperative" in philosophy, or Asimov's Three Laws of Robotics in science fiction.

Consequentialist ethicists believe that neither one's character nor the rules one lives by are what make actions good or bad. Instead, consequentialists believe that only the consequences of an action count, both direct and indirect ones. A prominent example of consequentialist ethics is utilitarianism: The notion that those actions are the most moral that lead to the greatest good for the greatest number of individuals.

The short answer to the question which one of these might be the easiest to encode into an AI is: "We don't know." However, machine learning agents optimize for consequences, not virtues or hard-coded rules. As all the likely roads towards AGI involve machine learning, consequentialism may be the ethical theory to stick closest to.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: ethics (create tag) (edit tags)

The key concern with regard to AGI is that if it surpasses human-level intelligence, it would likely become uncontrollable and we would essentially hand our dominant position on the planet over to it. Whether the first human-level AI is deployed by terrorists, a government, or a major research organization does not make any difference to that fact. While the latter two might have more interest in deploying aligned AGI than terrorists, they won't be able to do that unless we solve the alignment problem.

As far as narrow AI is concerned: The danger of misuse by bad actors is indeed a problem. As the capabilities of narrow AI systems grow while we get closer to AGI, this problem will only grow more and more severe over the next years and decades.

However, leading experts expect that we are more than 50% likely to reach human-level AI by the end of this century. On the forecasting platform Metaculus, the current (September 2022) median forecast is as early as 2043.

Accordingly, we have no time to lose for solving the alignment problem, with or without the danger of terrorists using narrow AI systems.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Perhaps an AI which is aligned with the values of humanity as a whole may decide that it's worth "uploading" someone contrary to their will if it better serves humanity to have us all uploaded than not to, but it seems likely that most people do not wish to be "uploaded" (see Nozick's experience machine), and so an aligned superintelligence would not do this.

A misaligned superintelligence would have no qualms about destroying us entirely in order to use the matter of our bodies, and it most likely would not bother to "upload" us since it is misaligned.

Stamps: QueenDaisy
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

It's hard to know what politicians are concerned with behind closed doors, as they will only publicly talk about things which serve their political interests, and talking about AIs destroying the world is a good way to sound crazy and to lose a lot of votes, so they won't do that.

I searched for "politicians concerned about ai risk uk" and got a message saying "Not many great matches came back for your search politicians concerned about ai risk uk", which suggests that if politicians are worried about this, they don't tend to talk about it.

Stamps: QueenDaisy
Show your endorsement of this answer by giving it a stamp of approval!

Tags: government (create tag) (edit tags)

It's very hard to know the answer to this in advance because a superintelligent AI is capable of doing anything that it understands how to do. That sounds obvious, but it has a hidden nuance there - it knows how to do things that we don't know how to do, and so we cannot predict in advance what it will do because it will do some things that we didn't even know were possible to do. Almost by definition, that makes it impossible to reliably predict.

We can say that it will not be able to do things which are actually not possible to do, and it seems likely that certain rules are always followed (energy must be conserved, global entropy must always increase etc.) but the problem is that we have been wrong about the laws of physics before (see Newton's laws vs relativity and quantum mechanics) and so we're only *mostly* certain that these things are true. If there's any discrepancy between how we think physics works and how physics actually works (and most likely, such a discrepancy does exist somewhere), we can expect a superintelligent AI to be able to exploit that discrepancy in order to serve its goals in lots of weird ways we couldn't possibly have predicted.

We may be able to say with 99% certainty that a superintelligent AI will not be able to violate conservation of energy, but we're not 100% sure, and we can't be 100% sure of anything here.

Stamps: QueenDaisy
Show your endorsement of this answer by giving it a stamp of approval!

Tags: physics (create tag) (edit tags)

Alignment failure, at its core, is any time an AI's output deviates from what we intended. We have already witnessed alignment failure in simple AIs. Mostly, these amount to correlation being equated with causation. A good example was an AI built by YouTube to recognize animals being forced to fight for sport. The videos given to the AI were always set in some kind of arena, so the AI drew the simplest conclusion and matched videos where there were similar arenas—such as with robot combat tournaments.

Link: https://web.archive.org/web/20210807193707/https://www.theverge.com/tldr/2019/8/20/20825858/youtube-bans-fighting-robot-videos-animal-cruelty-roughly-10-years-too-soon-ai-google

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

A lot of narrow AI alignment advances will improve capabilities too, making it easier for human users to work with AI tools. Those are going to be adopted almost instantly; for example, interpretability might be considered a desirable property by all AI researchers.

However, with our current research methods for search in highly dimensional spaces, it seems exponentially more likely to find a capable AGI than to find a capable and aligned AGI. So even if capabilities research adopts all new advances in AI alignment as soon as they come along, it is likely capability research will happen faster than alignment research. We need to find ways to incentivize the corporations which would profit from more capability research to also focus on alignment.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

By its very nature, a superintelligence could perform tasks we can't even imagine. If properly aligned, we could ask it to cure cancer, lower the crime rate in the world to zero, or end world hunger. A superintelligence would be able to innovate beyond our current understanding. It could even be given a task to do something as general as keeping humans safe. That superintelligence would try and find threats to us and handle them appropriately. As a last resort, if the superintelligence feels outmatched, it could develop an even more powerful superintelligence to help.

All these things would be possible and more. However, we would likely want to ensure the superintelligence is interpretable; otherwise, we could never truly trust the system is aligned with our interests.

Stamps: Aprillion, tayler6000
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

Being able to prove alignment of a potential AGI to any objective expressed in human language seems to be an important stepping stone towards AGI alignment, even for an objective as controversial as hedonium maximization, which most people find undesirable.

Since hedonium maximization might be easier to model mathematically than more complex (and more desirable) objectives, it might be easier to optimize for. And development of optimization techniques with provable alignment might generalize to help us optimize for desirable objectives too.

Stamps: Damaged, Aprillion
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

Emulated minds have the same behavior as conventional minds. This being so, they can do anything the human mind they are emulating can. Provided the mind that is being emulated is capable of learning how to do alignment research, the emulation of them would be, too. It should be noted that we do not currently have the technology to emulate human minds.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: whole brain emulation, research assistants (create tag) (edit tags)


Let's call the AI in the box the prisoner and the AI outside the guard. You could imagine such AIs boxed like nesting dolls, but there would always be a highest level where the guard AI is boxed by humans. This being the case, it is not clear what such a design buys you, as all the issues of containing an AI will still apply at this top level.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


A pivotal act is an unspecified action that drastically improves the strategic situation such that successful alignment becomes significantly more probable.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

So, let’s first ignore superintelligence and consider the case of AIs that are merely perfect replacements for all human labour.

Many economists dismiss the claim that automation can cause general unemployment, and will often mention the "lump of labour" fallacy: the mistaken idea that there is a fixed amount of work in the world, which automation will slowly winnow away. They note that though automation has caused unemployment in particular sectors, increased efficiency has freed up resources which can be used to employ more humans in those areas where machines cannot yet replace their labour. And historically, this has more than made up for the jobs that were eliminated.

However, AI is different from other sorts of automation in that it is of general applicability. If we consider these AIs perfect replacements for human labour, then standard labour models predict human wages will decline until they are competitive with the cost of running an AI. If this cost is below subsistence, then this would cause unemployment. However, in scenarios where some humans still have capital, they may prefer human workers for signalling or other reasons even if AIs are better and cheaper.
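As a toy illustration of that wage argument, here is a minimal sketch of the substitution logic; the productivity, AI-cost, and subsistence figures are invented for the example, not taken from any study:

```python
# Toy model: if AIs are perfect substitutes for human labour, employers pay
# no more than the cost of running an AI, so that cost caps the human wage.

def equilibrium_wage(human_productivity: float, ai_cost: float) -> float:
    """Wage an employer would offer a human when an AI substitute is available."""
    return min(human_productivity, ai_cost)

subsistence = 15.0  # hypothetical cost of living per hour
for ai_cost in [50.0, 20.0, 5.0]:
    wage = equilibrium_wage(human_productivity=40.0, ai_cost=ai_cost)
    status = "employable" if wage >= subsistence else "below subsistence"
    print(f"AI cost {ai_cost:>4.0f}/h -> human wage capped at {wage:.0f}/h ({status})")
```

The point is only that once a perfect substitute exists, its running cost caps the wage regardless of the human's productivity.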

Now that we've talked about perfect replacements for human labour, we can talk about superintelligence. A superintelligence would quickly acquire material power, likely superior to that held by any human or collection of humans. At that point, thinking in terms of employment is likely beside the point. The post-superintelligence world will reflect the preferences of the AI/AIs. If it prefers humans exist, we will. If it prefers we have jobs, we will.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: automation, technological unemployment (create tag) (edit tags)

Perhaps. There is a chance that directly lobbying politicians could help, but there's also a chance that such actions end up being net-negative. It would be great if we could slow down AI, but doing so might simply mean that a nation less concerned about safety produces AI first. We could ask them to pass regulations or standards related to AGI, but passing ineffective regulation might interfere with passing more effective regulation later down the track, as people may consider the issue dealt with. Or the requirements of complying with bureaucracy might prove to be a distraction from safe AI.

If you are concerned about this issue, you should probably try learning as much about this issue as possible and also spend a lot of time brainstorming downside risks and seeing what risks other people have identified.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: persuasion, politics (create tag) (edit tags)

Working out milestone tasks that we expect to be achieved before we reach AGI can be difficult. Some tasks, like "continuous learning" intuitively seem like they will need to be solved before someone builds AGI. Continuous learning is learning bit by bit, as you get more data. Current ML systems usually don't do this, instead learning everything at once from a big dataset. Because humans can do continuous learning, it seems like it might be required for AGI. However, you have to be careful with reasoning like this, because it is possible the first generally capable artificial intelligence will work quite differently to a human. It's possible the first AGI will be designed to avoid needing "continuous learning", maybe by being designed to do a big retraining process every day. This might still allow it to be as capable as humans at almost every task, but without solving the "continuous learning" problem.

Because of arguments like the above, it's not always clear whether a given task is "required" for AGI.
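To make the distinction above concrete, here is a trivial sketch of the two workflows using a toy "model" that merely estimates the mean of a data stream (illustrative only, not any particular ML library's API):

```python
# Schematic contrast between batch retraining and continuous (online) learning,
# using a trivial "model" that just estimates the mean of a data stream.

stream = [2.0, 4.0, 6.0, 8.0, 10.0]

# Batch style: periodically retrain from scratch on everything seen so far.
def batch_estimate(data):
    return sum(data) / len(data)

# Continuous style: fold each new observation into the existing estimate.
estimate, n = 0.0, 0
for x in stream:
    n += 1
    estimate += (x - estimate) / n  # incremental mean update

print("batch retraining:   ", batch_estimate(stream))  # 6.0
print("continuous learning:", estimate)                # 6.0, reached incrementally
```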

Some potential big milestone tasks might be:

  • ARC challenge (tests the ability to generate the "simplest explanation" for patterns)
  • Human level sample efficiency at various tasks (EfficientZero already does Atari games)


This Metaculus question has four very specific milestones that it considers to be requirements for "weak AGI".

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

There are multiple programmes you can apply to if you want to try becoming a researcher. If accepted to these programs, you will get funding and mentorship. Some examples of these programs are: SERI summer research fellowship, CERI summer research fellowship, SERI ML Alignment Theory Program, and more. A lot of these programs run during specific times of the year (specifically during the summer).

Other things you can do: join the next iteration of the AGI Safety Fundamentals programme (https://www.eacambridge.org/technical-alignment-curriculum); if you're thinking of a career as a researcher working on AI safety questions, get 1-1 career advice from 80,000 Hours (https://80000hours.org/speak-with-us); or apply to attend an EAGx or EAG conference (https://www.eaglobal.org/events/), where you can meet researchers working on these questions in person and ask them directly for advice.

Some of these resources might be helpful: https://www.aisafetysupport.org/resources/lots-of-links

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: contributing, mentorship (create tag) (edit tags)

It's not completely clear exactly what 'merging' with AI would imply, but it doesn't seem like a way to get around the alignment problem. If the AI system is aligned, and wants to do what humans want, then having direct access to human brains could provide a lot of information about human values and goals very quickly and efficiently, and thus be helpful for better alignment. That said, a smart AI system could get almost all of this information without a brain-computer interface, through conversation, observation, etc., though much more slowly. On the other hand, if the system is not aligned, and doesn't fundamentally want humans to get what we want, then extra information about how human minds work doesn't help and only makes the problem worse. Allowing a misaligned AGI direct access to your brain hardware is a bad idea for obvious reasons.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Preventing an AI from escaping by using a more powerful AI gets points for creative thinking, but unfortunately we would need to have already aligned the first AI. Even if the second AI's only terminal goal were to prevent the first AI from escaping, it would also have an instrumental goal of converting the rest of the universe into computer chips so that it would have more processing power to figure out how to best contain the first AGI.

It might be possible to try to bind a stronger AI with a weaker AI, but this is unlikely to work, as the stronger AI would have an advantage due to being stronger. Further, there is a chance that the two AIs end up working out a deal where the first AI decides to stay in the box and the second AI does whatever the first AI would have done if it were able to escape.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


One of the main questions about simulation theory is why a society would invest a large quantity of resources in creating one. One possible answer is as an environment to train or test AI, or to run it safely isolated from an outside reality.

It's a fun question, but probably not one worth thinking about too much: it's the kind of question we cannot get information about from observations and experiments.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: simulation hypothesis (create tag) (edit tags)

I think an AI inner aligned to optimize a utility function of maximizing happiness minus suffering is likely to do something like this.

"Inner aligned" means the AI is trying to do the thing we trained it to do, whether or not this is what we actually want.

"Aligned to what" is the outer alignment problem, which is where the failure in this example lies. There is a lot of debate about which utility functions are safe or desirable to maximize, and whether human values can even be described by a utility function.

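As a toy illustration of where that failure sits, the sketch below has an optimizer that does exactly what it was told; the problem is the objective it was handed. The candidate actions and their scores are invented for the example:

```python
# Toy outer-alignment failure: the optimizer faithfully maximizes the utility
# we specified (happiness minus suffering), but the specification itself is off.

candidate_actions = {
    # action: (happiness produced, suffering produced) -- made-up numbers
    "cure diseases":              (80, 10),
    "fund the arts":              (20, 1),
    "forcibly wirehead everyone": (1000, 0),  # not what we meant, but scores highest
}

def utility(happiness, suffering):
    return happiness - suffering

best = max(candidate_actions, key=lambda a: utility(*candidate_actions[a]))
print("Inner-aligned optimizer picks:", best)
```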
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Autonomous weapons, especially a nuclear arsenal, being used by an AI is a concern, but this seems downstream of the central problem of giving an unaligned AI any capabilities to impact the world.

Triggering nuclear war is only one of many ways a power seeking AI might choose to take control. This seems unlikely, as resources the AI would want to control (or the AI itself) would likely be destroyed in the process.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


This depends on what the superintelligence in question wants to happen. If AIs want humans to continue being employable, they’ll act to ensure humans remain employable by setting up roles that only biological humans can fill, artificially perpetuating the need for employing humans.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: automation, technological unemployment (create tag) (edit tags)

Alignment is very broadly concerned with how to align an AI with any given set of arbitrary values. For the purposes of research, it doesn't matter what the values are—so long as we can get the AI to sincerely hold them we have succeeded at alignment. Once the problem of alignment is solved then the "human values" a given AI holds are those given to it by its creators (as there is no reason for anyone to create an AI that works against their interest).

Optimistic views might hold that it is possible to coordinate between all AI creators to align their AIs only with a central agreed-upon definition of "human values," which could be determined by traditional human political organizations. Succeeding at this coordination would prevent (or at least, reduce) the weaponization of AIs toward competition between these values.

More pessimistic views hold that this coordination is unlikely to succeed, and that just as today different definitions of "human values" compete with one another (through e.g. political conflicts), AIs will likely be constructed by actors with different values and will compete with one another on the same grounds. The exception is that this competition might end if one group gains enough of an advantage to carry out a Pivotal Act that "locks in" their set of values as the winner.

We could imagine a good instance of this might look like a U.N.-sanctioned project constructing the first super-intelligent AI, successfully aligned with the human values roughly defined as "global peace and development". This AI might then perform countermeasures to reduce the influence of bad AIs by e.g. regulating further AI development, or seizing compute power from agencies developing bad AIs.

Bad outcomes might look similar to the above, but with AIs developed by extremists or terrorists taking over. Worse still would be a careless development group accidentally producing a misaligned AI, where we don't end up with "bad human values" (like one of the more oppressive human moralities), but with "non-human values" (like a world where only paperclips matter).

A common concern is that if a friendly AI doesn't carry this out, then an opposition AI is likely to do so. Hence, there is a relatively common view that safe AI not only must be developed, but must be deployed to prevent possibly hostile AIs from arising.

There are also arguments against the "Pivotal Act" mentality, which promote political regulation as a better path toward friendly AI than leaving the responsibility to the first firm to finish.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


LessWrong (where we pull descriptions from) is missing a description for this tag, please add one.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: definitions, research agendas, agent foundations (create tag) (edit tags)

Questions are: (i) contributed by online users or Stampy editors via the Stampy Wiki; or (ii) scraped from online content (various AI-alignment-related FAQs as well as the comments sections of certain AI Alignment YouTube videos).

The scraped content is currently a secondary concern, but this crude process of aggregation will eventually be streamlined into a reliable source of human-editable questions and answers.

Questions are reviewed by Stampy editors, who decide if: (i) they're duplicates of existing questions (the criterion being that the answer to the existing question would be fully satisfactory to the asker of the new question); (ii) they're sufficiently within the scope of the Stampy project.

We are working on using semantic search to suggest possible duplicates.

If a question is not a duplicate and is within scope, it is added to a list of canonical questions.

A rating system allows editors to assign quality levels ("Meh"/"Unreviewed"/"Approved"/"Good"/"Excellent") in order to rank the questions on Answer questions, so that the most important questions can be worked on first.

Answers to canonical questions can be contributed via the Stampy Wiki by online users or by Stampy editors directly, at which point the question is added to a list of "answered canonical questions".

Editors can attempt to improve a contributed answer, and/or can "stamp" it to indicate their approval, adding to its "stamp score".


Once the answer to a canonical question gets a sufficiently high stamp score it gets added to a list of canonical answers (to canonical questions).

These canonical question/answer pairs are then ready to be served to the user interface. In order for them to become visible there, though, they must be associated with existing canonical question/answer pairs in one of two ways: RELATED or FOLLOWUP. Any editor can improve these relationships, either based on tags or their own understanding of what a reader might want to know. Questions should aim to have 2-5 related + followups generally, although exceptions can be made.

If Question B is RELATED to Question A, it will slide in below Question A on the UI page when Question A is clicked on, provided that it is not already present on the page.

If Question B is FOLLOWUP to Question A, it will always slide in below Question A when Question A is clicked on, even if it is already present on the UI page.

A and B being RELATED questions can be thought of as a kind of conceptual adjacency. If a user is interested to know the answer to A, they'll probably be interested in the answer to B too, and vice versa. Reading these in either order should make roughly the same amount of sense to the average user.

Question B being FOLLOWUP to Question A can be thought of in terms of progressive knowledge: the answer to B will only really make sense to the average user if they have read the answer to A first. This is also used for letting Stampy ask clarifying questions to direct readers to the right part of his knowledge graph.
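As a rough sketch of how these two link types behave as data, here is a minimal illustration of the rules above (not Stampy's actual implementation):

```python
# Minimal sketch of the canonical-question graph described above: RELATED
# questions slide in only if not already shown, FOLLOWUP questions always do.

from dataclasses import dataclass, field

@dataclass
class Question:
    title: str
    related: list = field(default_factory=list)
    followups: list = field(default_factory=list)

def expand(clicked: Question, already_on_page: set) -> list:
    """Return the questions to slide in below `clicked`."""
    shown = [q for q in clicked.related if q.title not in already_on_page]
    shown += clicked.followups  # FOLLOWUP questions appear unconditionally
    return shown

a = Question("What is AI alignment?")
b = Question("Why is alignment hard?")
a.related.append(b)
print([q.title for q in expand(a, already_on_page={"What is AI alignment?"})])
```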


If you click on "Edit answer", then "[Show Advanced Options]", you'll be given the option to submit a brief version of your answer (this field will be automatically filled if the full answer exceeds 2000 characters).

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

There are debates about how discontinuous an intelligence explosion would be, with Paul Christiano expecting to see the world transformed by successively less weak AGIs over a number of years, while Eliezer Yudkowsky expects a rapid jump in capabilities once generality is achieved and the self-improvement process is able to sustain itself.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Codex / GitHub Copilot are AI systems based on GPT-3 that write and edit code. When given some input code and comments describing the intended function, they will write output that extends the prompt as accurately as possible.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


"The real concern" isn't a particularly meaningful concept here. Deep learning has proven to be a very powerful technology, with far reaching implications across a number of aspects of human existence. There are significant benefits to be found if we manage the technology properly, but that management means addressing a broad range of concerns, one of which is the alignment problem.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Whole Brain Emulation (WBE) or ‘mind uploading’ is a computer emulation of all the cells and connections in a human brain. So even if the underlying principles of general intelligence prove difficult to discover, we might still emulate an entire human brain and make it run at a million times its normal speed (computer circuits communicate much faster than neurons do). Such a WBE could do a year's worth of thinking in about 31 seconds. This would not immediately lead to smarter-than-human intelligence, but it would lead to faster-than-human intelligence. A WBE could be backed up (leading to a kind of immortality), and it could be copied so that hundreds or millions of WBEs could work on separate problems in parallel. If WBEs are created, they may therefore be able to solve scientific problems far more rapidly than ordinary humans, accelerating further technological progress.
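For concreteness, the speedup arithmetic behind that claim, assuming the million-fold figure above:

```python
# Back-of-the-envelope arithmetic for the speedup claim above.

speedup = 1e6                          # assumed emulation speed relative to a biological brain
seconds_per_year = 365.25 * 24 * 3600  # ~3.16e7 seconds

# Wall-clock time for the WBE to do one subjective year of thinking:
print(f"{seconds_per_year / speedup:.1f} physical seconds per subjective year")  # ~31.6
```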

See also:

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


As long as AI doesn't exceed human capabilities, we could do that.

But there is no reason why AI capabilities would stop at the human level. Systems more intelligent than us could think of several ways to outsmart us, so our best bet is to have them as closely aligned with our values as possible.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


The problem is that the actions can be harmful in a very non-obvious, indirect way. It's not at all obvious which actions should be stopped.

For example, when the system comes up with a very clever way to acquire resources, the safety of this action depends on what it intends to use those resources for.

Such supervision may buy us some safety, if we find a way to make the system's intentions very transparent.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Verified accounts are given to people who have clearly demonstrated understanding of AI Safety outside of this project, such as by being employed and vouched for by a major AI Safety organization or by producing high-impact research. Verified accounts may freely mark answers as canonical or not, regardless of how many Stamps the person has, to determine whether those answers are used by Stampy.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

This depends on how we program it. It definitely can be autonomous; even now, we have autonomous vehicles, flight control systems, and many more.

Even though it's possible to build such systems, it may be better if they actively ask humans for supervision, for example in cases where they are uncertain what to do.

Stamps: plex
Show your endorsement of this answer by giving it a stamp of approval!


Nobody knows for sure when we will have ASI or if it is even possible. Predictions on AI timelines are notoriously variable, but recent surveys about the arrival of human-level AGI have median dates between 2040 and 2050 although the median for (optimistic) AGI researchers and futurists is in the early 2030s (source).

What will happen if/when we are able to build human-level AGI is a point of major contention among experts. One survey asked (mostly) experts to estimate the likelihood that it would take less than 2 or 30 years for a human-level AI to improve to greatly surpass all humans in most professions. Median answers were 10% for "within 2 years" and 75% for "within 30 years". We know little about the limits of intelligence and whether increasing it will follow the law of accelerating or diminishing returns.

Of particular interest to the control problem is the fast or hard takeoff scenario. It has been argued that the increase from a relatively harmless level of intelligence to a dangerous vastly superhuman level might be possible in a matter of seconds, minutes or hours: too fast for human controllers to stop it before they know what's happening. Moving from human to superhuman level might be as simple as adding computational resources, and depending on the implementation the AI might be able to quickly absorb large amounts of internet knowledge. Once we have an AI that is better at AGI design than the team that made it, the system could improve itself or create the next generation of even more intelligent AIs (which could then self-improve further or create an even more intelligent generation, and so on). If each generation can improve upon itself by a fixed or increasing percentage per time unit, we would see an exponential increase in intelligence: an intelligence explosion.
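To illustrate the last point, here is a toy compounding model; the 10% per-generation gain and the baseline are arbitrary placeholders:

```python
# Toy model of recursive self-improvement: each generation improves on the
# previous one by a fixed fraction, giving exponential (compounding) growth.

capability = 1.0          # arbitrary "human-level" baseline
improvement_rate = 0.10   # assumed 10% gain per generation
for generation in range(1, 51):
    capability *= 1 + improvement_rate
    if generation % 10 == 0:
        print(f"generation {generation:2d}: capability x{capability:.1f}")
# After 50 generations the toy capability is ~117x the baseline, rather than
# the 6x a linear model would give: compounding, not linear, growth.
```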

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


It is impossible to design an AI without a goal, because it would do nothing. Therefore, in the sense that designing the AI’s goal is a form of control, it is impossible not to control an AI. This goes for anything that you create. You have to control the design of something at least somewhat in order to create it.

There may be relevant moral questions about our future relationship with possibly sentient machine intelligences, but the priority of the Control Problem is finding a way to ensure the survival and well-being of the human species.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

Goal-directed behavior arises naturally when systems are trained on an objective. An AI not trained or programmed to do well by some objective function would not be good at anything, and would be useless.

See Eliezer's and Gwern's posts about tool AI.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: tool ai (edit tags)

Cybersecurity is important because computing systems comprise the backbone of the modern economy. If the security of the internet was compromised, then the economy would suffer a tremendous blow.

Similarly, AI Safety might become important as AI systems begin forming larger and larger parts of the modern economy. As more and more labor gets automated, it becomes more and more important to ensure that that labor is occurring in a safe and robust way.

Before the widespread adoption of computing systems, lack of Cybersecurity didn’t cause much damage. However, it might have been beneficial to start thinking about Cybersecurity problems before the solutions were necessary.

Similarly, since AI systems haven't been adopted en masse yet, lack of AI Safety isn't causing harm. However, given that AI systems will become increasingly powerful and increasingly widespread, it might be prudent to try to solve safety problems before a catastrophe occurs.

Additionally, people sometimes think about Artificial General Intelligence (AGI), sometimes called Human-Level Artificial Intelligence (HLAI). One of the core problems in AI Safety is ensuring that when AGI gets built, it has human interests at heart. (Note that most surveyed experts think building AGI/HLAI is possible, but there is wide disagreement on how soon this might occur.)

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None (add tags)

... further results

These 10 answers have been added in the last month.

The problem of autonomous weapons is not directly related to the AI Safety problem, but both fit into the "be careful what you do with AI" category.

In the short term, these would allow for worse totalitarianism, as automated security forces will never rebel. This removes the moderating influence of human personnel, as convincing machines to do a horrible thing is easier than convincing humans. Despots need security forces to remain in power. Human security forces betraying a despot is a common way that despots lose power; this would not happen with robots.

Another consideration is that computer security is hard! Autonomous weapons could be hacked, initially by humans but eventually by an AGI. This is not good for humanity's chances of surviving the transition to AGI, although access to autonomous weapons is probably not necessary for this transition to go poorly.

See also Stop Killer Robots.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Metaphilosophy is the philosophy of philosophy.

Bostrom has described the AI Safety problem as "philosophy with a deadline". Metaphilosophy could be used to steer an AGI towards e.g. our coherent extrapolated volition.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: definitions, metaphilosophy (create tag) (edit tags)

There is currently no clear win condition that most/all researchers agree on. Many researchers have their own paradigm and view the problem from a different angle.

Here are some of the sub-fields of AI Safety research. We need to solve the challenges in many of these fields to win.

See also Concrete Problems in AI Safety.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


In order of smallest commitment to largest:

  1. Link your friends to Stampy or Rob's videos
  2. Join or start a local AI Safety group at a university
  3. Get good at giving an elevator pitch
  4. Become a competent advocate by being convincing and having comprehensive knowledge to answer follow-up questions
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: contributing, community (create tag) (edit tags)

Ajeya Cotra attempted to calculate this number in her report Bio Anchors.

[...]the total amount of computation done over the course of evolution from the first animals with neurons to humans was (~1e16 seconds) * (~1e25 FLOP/s) = ~1e41 FLOP

Nuño Sempere argues that this calculation of the computation done by neurons is insufficient as the environment would also need to be simulated, leading to a possibly much larger number.

Cotra posits that this amount of computation should be taken as an upper bound on the amount of computation needed to develop AGI. The actual amount of computation needed is probably many orders of magnitude lower.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: evolution (create tag) (edit tags)

... further results

These 201 canonical answers have one or fewer stamps.

One possible way to ensure the safety of a powerful AI system is to keep it contained in a software environment. There is nothing intrinsically wrong with this procedure - keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even AI systems inside software environments might not be safe enough.

Humans sometimes put dangerous humans inside boxes to limit their ability to influence the external world. Sometimes, these humans escape their boxes. The security of a prison depends on certain assumptions, which can be violated. Yoshie Shiratori reportedly escaped prison by weakening the door-frame with miso soup and dislocating his shoulders.

Human-written software has a high defect rate; we should expect a perfectly secure system to be difficult to create. If humans construct a software system they think is secure, it is possible that the security relies on a false assumption. A powerful AI system could potentially learn how its hardware works and manipulate bits to send radio signals. It could fake a malfunction and attempt social engineering when the engineers look at its code. As the saying goes: for someone to do something we had imagined was impossible, they need only have a better imagination.

Experimentally, humans have convinced other humans to let them out of the box. Spooky.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: boxing (edit tags)

In principle it could (if you believe in functionalism), but it probably won't. One way to ensure that an AI has human-like emotions would be to copy the way the human brain works, but that's not what most AI researchers are trying to do.

It's similar to how some people once thought we would build mechanical horses to pull our vehicles, but it turned out to be much easier to build a car. AI probably doesn't need emotions or maybe even consciousness to be powerful, and the first AGIs to get built will be the ones that are easiest to build.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Alternate phrasings are used to improve the semantic search which Stampy uses to serve people questions, by giving alternate ways to say a question which might trigger a match when the main wording won't. They should generally only be used when there is a significantly different wording, rather than for only very minor changes.
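The reason alternate phrasings help can be sketched with the usual embedding-similarity recipe; the vectors below are invented three-dimensional stand-ins for real sentence embeddings, not Stampy's actual pipeline:

```python
# Toy version of semantic matching: questions and their alternate phrasings are
# embedded as vectors, and a user query matches whichever stored phrasing is
# closest by cosine similarity. The vectors here are made up for illustration.

import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

stored_phrasings = {
    "What is AI alignment?":       [0.9, 0.1, 0.0],
    "How do we align AI systems?": [0.8, 0.3, 0.1],  # alternate phrasing of the same canonical question
}
query = [0.85, 0.25, 0.05]

best = max(stored_phrasings, key=lambda p: cosine(query, stored_phrasings[p]))
print("Best match:", best)  # the alternate phrasing catches this query
```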

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

A slow takeoff is where AI capabilities improve gradually, giving us plenty of time to adapt. In a moderate takeoff we might see accelerating progress, but we still won’t be caught off guard by a dramatic change. Whereas, in a fast or hard takeoff AI would go from being not very generally competent to sufficiently superhuman to control the future too fast for humans to course correct if something goes wrong.

The article Distinguishing definitions of takeoff goes into more detail on this.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Intelligence is powerful. One might say that “Intelligence is no match for a gun, or for someone with lots of money,” but both guns and money were produced by intelligence. If not for our intelligence, humans would still be foraging the savannah for food.

Intelligence is what caused humans to dominate the planet in the blink of an eye (on evolutionary timescales). Intelligence is what allows us to eradicate diseases, and what gives us the potential to eradicate ourselves with nuclear war. Intelligence gives us superior strategic skills, superior social skills, superior economic productivity, and the power of invention.

A machine with superintelligence would be able to hack into vulnerable networks via the internet, commandeer those resources for additional computing power, take over mobile machines connected to networks connected to the internet, use them to build additional machines, perform scientific experiments to understand the world better than humans can, invent quantum computing and nanotechnology, manipulate the social world better than we can, and do whatever it can to give itself more power to achieve its goals — all at a speed much faster than humans can respond to.

See also

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!