Review answers

From Stampy's Wiki

If you think an answer is good (i.e. accurate, helpful, and replies well to the question) then please give it a stamp of approval. If you're a more experienced editor and have a good feel for when there is consensus around an answer being accepted, feel free to mark answers as canonical, so that Stampy will start serving them to users.

Individual pages

These pages track 208 answers which need review (with some being counted multiple times).

Review answers

These 46 non-canonical answers are answering canonical questions.

Perhaps. There is a chance that directly lobbying politicians could help, but there's also a chance that such actions end up being net-negative. It would be great if we could slow down AI development, but doing so might simply mean that a nation less concerned about safety produces AI first. We could ask politicians to pass regulations or standards related to AGI, but passing ineffective regulation might interfere with passing more effective regulation later, as people may consider the issue already dealt with. Or the requirements of complying with bureaucracy might prove to be a distraction from building safe AI.

If you are concerned about this issue, you should probably learn as much about it as possible, spend a lot of time brainstorming downside risks, and see what risks other people have identified.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: persuasion, politics

Working out milestone tasks that we expect to be achieved before we reach AGI can be difficult. Some tasks, like "continuous learning", intuitively seem like they will need to be solved before someone builds AGI. Continuous learning is learning bit by bit, as more data comes in; current ML systems usually don't do this, instead learning everything at once from a big dataset. Because humans can do continuous learning, it seems like it might be required for AGI. However, you have to be careful with reasoning like this, because the first generally capable artificial intelligence may work quite differently from a human. It's possible the first AGI will be designed to avoid needing continuous learning, perhaps by running a big retraining process every day. This might still let it be as capable as humans at almost every task without ever solving the "continuous learning" problem.

Because of arguments like the above, it's not always clear whether a given task is "required" for AGI.
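
To make the distinction between training once on a big dataset and learning bit by bit more concrete, here is a minimal sketch (hypothetical, using scikit-learn's SGDClassifier, which happens to support incremental updates; real AGI-relevant systems are of course far more complex than this):

```python
# Toy contrast between "train once on a big dataset" and "continuous learning".
# Hypothetical sketch only; not a claim about how an AGI would be trained.
from sklearn.linear_model import SGDClassifier
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=10_000, random_state=0)

# Typical current ML: fit once on the whole dataset, then freeze the model.
batch_model = SGDClassifier(random_state=0)
batch_model.fit(X, y)

# "Continuous learning": update the model a little every time new data arrives.
online_model = SGDClassifier(random_state=0)
for i in range(0, len(X), 100):            # pretend each slice arrives later in time
    X_new, y_new = X[i:i + 100], y[i:i + 100]
    online_model.partial_fit(X_new, y_new, classes=[0, 1])
```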

Some potential big milestone tasks might be:

  • ARC challenge (tests the ability to generate the "simplest explanation" for patterns)
  • Human-level sample efficiency at various tasks (EfficientZero already achieves this on Atari games)


This Metaculus question has four very specific milestones that it considers to be requirements for "weak AGI".

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None

There are multiple programs you can apply to if you want to try becoming a researcher. If accepted, you will get funding and mentorship. Some examples are the SERI summer research fellowship, the CERI summer research fellowship, the SERI ML Alignment Theory Program, and more. Many of these programs run at specific times of the year (typically during the summer).

Other things you can do include:

  • Join the next iteration of the AGI Safety Fundamentals programme (https://www.eacambridge.org/technical-alignment-curriculum).
  • If you're thinking of a career as a researcher working on AI safety questions, get 1-1 career advice from 80,000 Hours (https://80000hours.org/speak-with-us).
  • Apply to attend an EAGx or EAG conference (https://www.eaglobal.org/events/), where you can meet researchers working on these questions in person and ask them directly for advice.

Some of these resources might be helpful: https://www.aisafetysupport.org/resources/lots-of-links

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: contributing, mentorship

It's not completely clear exactly what 'merging' with AI would imply, but it doesn't seem like a way to get around the alignment problem. If the AI system is aligned and wants to do what humans want, then having direct access to human brains could provide a lot of information about human values and goals very quickly and efficiently, and thus be helpful for better alignment. That said, a smart AI system could get almost all of this information without a brain-computer interface, through conversation, observation, etc., just much more slowly. On the other hand, if the system is not aligned and doesn't fundamentally want humans to get what we want, then extra information about how human minds work doesn't help; it only makes the problem worse. Allowing a misaligned AGI direct access to your brain hardware is a bad idea for obvious reasons.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Preventing an AI from escaping by using a second, more powerful AI gets points for creative thinking, but unfortunately we would need to have already aligned that more powerful AI. Even if the second AI's only terminal goal were to prevent the first AI from escaping, it would still have an instrumental goal of converting the rest of the universe into computer chips, so that it would have more processing power with which to figure out how best to contain the first AI.

It might be possible to try to bind a stronger AI with a weaker one, but this is unlikely to work, as the stronger AI would have the advantage. Further, there is a chance that the two AIs end up working out a deal in which the first AI stays in the box while the second AI does whatever the first AI would have done had it been able to escape.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


One of the main questions about the simulation hypothesis is why a society would invest a large quantity of resources to create a simulation. One possible answer is to provide an environment in which to train or test an AI, or to run it safely, isolated from an outside reality.

It's a fun question, but probably not one worth thinking about too much, since it's the kind of question we can't gather information about through observation or experiment.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: simulation hypothesis

I think an AI inner aligned to optimize a utility function of "maximize happiness minus suffering" is likely to do something like this.

Inner aligned means the AI is actually trying to do the thing we trained it to do, whether or not that is what we actually want.

"Aligned to what" is the outer alignment problem, which is where the failure in this example lies. There is a lot of debate about which utility functions are safe or desirable to maximize, and about whether human values can even be described by a utility function.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


An AI making use of autonomous weapons, especially a nuclear arsenal, is a concern, but this seems downstream of the central problem of giving an unaligned AI any capability to impact the world.

Triggering nuclear war is only one of many ways a power-seeking AI might choose to take control, and it seems an unlikely choice, as the resources the AI would want to control (or the AI itself) would likely be destroyed in the process.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


This depends on what the superintelligence in question wants to happen. If AIs want humans to remain employable, they'll act to ensure this by setting up roles that only biological humans can fill, artificially perpetuating the need to employ humans.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: automation, technological unemployment

Alignment is very broadly concerned with how to align an AI with any given set of arbitrary values. For the purposes of research, it doesn't matter what the values are—so long as we can get the AI to sincerely hold them we have succeeded at alignment. Once the problem of alignment is solved then the "human values" a given AI holds are those given to it by its creators (as there is no reason for anyone to create an AI that works against their interest).

Optimistic views might hold that it is possible to coordinate between all AI creators to align their AIs only with a central agreed-upon definition of "human values," which could be determined by traditional human political organizations. Succeeding at this coordination would prevent (or at least, reduce) the weaponization of AIs toward competition between these values.

More pessimistic views hold that this coordination is unlikely to succeed, and that just as today different definitions of "human values" compete with one another (through e.g. political conflicts), AIs will likely be constructed by actors with different values and will compete with one another on the same grounds. The exception is that this competition might end if one group gains enough of an advantage to carry out a Pivotal Act that "locks in" its set of values as the winner.

We could imagine a good instance of this might look like a U.N.-sanctioned project constructing the first super-intelligent AI, successfully aligned with the human values roughly defined as "global peace and development". This AI might then perform countermeasures to reduce the influence of bad AIs by e.g. regulating further AI development, or seizing compute power from agencies developing bad AIs.

Bad outcomes might look similar to the above, but with AIs developed by extremists or terrorists taking over. Worse still would be a careless development group accidentally producing a misaligned AI, where we don't end up with "bad human values" (like one of the more oppressive human moralities), but with "non-human values" (like a world where only paperclips matter).

A common concern is that if a friendly AI doesn't carry out such a pivotal act, then an opposing AI is likely to do so. Hence there is a relatively common view that safe AI not only must be developed, but must also be deployed, to prevent possibly hostile AIs from arising.

There are also arguments against this "Pivotal Act" mentality, which promote political regulation as a better path toward friendly AI than leaving the responsibility to the first firm to finish.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


LessWrong (where we pull descriptions from) is missing a description for this tag; please add one.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: definitions, research agendas, agent foundations

Questions are: (i) contributed by online users or Stampy editors via the Stampy Wiki; or (ii) scraped from online content (various AI-alignment-related FAQs as well as the comments sections of certain AI Alignment YouTube videos).

The scraped content is currently a secondary concern, but this crude process of aggregation will eventually be streamlined into a reliable source of human-editable questions and answers.

Questions are reviewed by Stampy editors, who decide if: (i) they're duplicates of existing questions (the criterion being that the answer to the existing question would be fully satisfactory to the asker of the new question); (ii) they're sufficiently within the scope of the Stampy project.

We are working on using semantic search to suggest possible duplicates.
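
As a rough sketch of what such duplicate suggestion could look like (hypothetical code assuming the sentence-transformers library, not Stampy's actual implementation), a new question can be embedded and compared against existing canonical questions by cosine similarity:

```python
# Hypothetical sketch of embedding-based duplicate suggestion; not Stampy's
# actual code. Assumes the sentence-transformers package.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

canonical_questions = [
    "What is the alignment problem?",
    "Why can't we just switch the AI off?",
    "What is an intelligence explosion?",
]
new_question = "Couldn't we just pull the plug on a dangerous AI?"

embeddings = model.encode(canonical_questions, convert_to_tensor=True)
query = model.encode(new_question, convert_to_tensor=True)

# Suggest existing questions whose cosine similarity exceeds an arbitrary threshold.
scores = util.cos_sim(query, embeddings)[0]
suggestions = [(q, float(s)) for q, s in zip(canonical_questions, scores) if s > 0.6]
print(suggestions)  # likely flags "Why can't we just switch the AI off?" as a possible duplicate
```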

If they meet these two criteria, questions are added to a list of canonical questions.

A rating system allows editors to assign quality levels ("Meh"/"Unreviewed"/"Approved"/"Good"/"Excellent") in order to rank the questions on Answer questions, so that the most important questions can be worked on first.

Answers to canonical questions can be contributed via the Stampy Wiki by online users or by Stampy editors directly, at which point the question is added to a list of "answered canonical questions".

Editors can attempt to improve a contributed answer, and/or can "stamp" it to indicate their approval, adding to its "stamp score".


Once the answer to a canonical question gets a sufficiently high stamp score it gets added to a list of canonical answers (to canonical questions).

These canonical question/answer pairs are then ready to be served to the user interface. In order for them to become visible there, though, they must be associated with existing canonical question/answer pairs in one of two ways: RELATED or FOLLOWUP. Any editor can improve these relationships, either based on tags or on their own understanding of what a reader might want to know. Questions should generally have 2-5 RELATED and FOLLOWUP links, although exceptions can be made.

If Question B is RELATED to Question A, it will slide in below Question A on the UI page when Question A is clicked on, provided that it is not already present on the page.

If Question B is FOLLOWUP to Question A, it will always slide in below Question A when Question A is clicked on, even if it is already present on the UI page.

A and B being RELATED questions can be thought of as a kind of conceptual adjacency. If a user is interested in knowing the answer to A, they'll probably be interested in the answer to B too, and vice versa. Reading these in either order should make roughly the same amount of sense to the average user.

Question B being FOLLOWUP to Question A can be thought of in terms of progressive knowledge: the answer to B will only really make sense to the average user if they have read the answer to A first. This is also used for letting Stampy ask clarifying questions to direct readers to the right part of his knowledge graph.
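
To make the slide-in rules above concrete, here is a minimal sketch (hypothetical data structures, not the actual UI code) of how RELATED and FOLLOWUP could be handled when a question is clicked:

```python
# Hypothetical sketch of the RELATED / FOLLOWUP slide-in rule; not the real UI code.
from dataclasses import dataclass, field

@dataclass
class Question:
    title: str
    related: list = field(default_factory=list)    # conceptually adjacent questions
    followups: list = field(default_factory=list)  # only make sense after this one

def questions_to_slide_in(clicked: Question, already_on_page: set) -> list:
    """Return the questions to show below `clicked` when it is clicked on."""
    shown = []
    # RELATED questions appear only if not already present on the page.
    shown += [q for q in clicked.related if q.title not in already_on_page]
    # FOLLOWUP questions always appear, even if already present elsewhere.
    shown += clicked.followups
    return shown

a = Question("What is AI alignment?")
b = Question("What is outer alignment?")
c = Question("What is inner alignment?")
a.related = [b]
a.followups = [c]

print([q.title for q in questions_to_slide_in(a, already_on_page={"What is outer alignment?"})])
# -> ["What is inner alignment?"]  (B is skipped because it's already on the page)
```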


If you click on "Edit answer", then "[Show Advanced Options]", you'll be given the option to submit a brief version of you answer (this field will be automatically filled if the full answer exceeds 2000 characters).

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None

There are debates about how discontinuous an intelligence explosion would be. Paul Christiano expects to see the world transformed by progressively more capable (less and less weak) AGIs over a number of years, while Eliezer Yudkowsky expects a rapid jump in capabilities once generality is achieved and the self-improvement process becomes able to sustain itself.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Vael Gates's project links to lots of example transcripts of attempts to persuade senior AI capabilities researchers.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Codex / GitHub Copilot are AIs based on GPT-3 that write and edit code. When given some input code and comments describing the intended functionality, they will write output that extends the prompt as accurately as possible.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


"The real concern" isn't a particularly meaningful concept here. Deep learning has proven to be a very powerful technology, with far reaching implications across a number of aspects of human existence. There are significant benefits to be found if we manage the technology properly, but that management means addressing a broad range of concerns, one of which is the alignment problem.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


... further results

These 22 answers have been added in the last month.

Perhaps. There is a chance that directly lobbying politicians could help, but there's also a chance that such actions end up being net-negative. It would be great if we could slow down AI development, but doing so might simply mean that a nation less concerned about safety produces AI first. We could ask politicians to pass regulations or standards related to AGI, but passing ineffective regulation might interfere with passing more effective regulation later, as people may consider the issue already dealt with. Or the requirements of complying with bureaucracy might prove to be a distraction from building safe AI.

If you are concerned about this issue, you should probably learn as much about it as possible, spend a lot of time brainstorming downside risks, and see what risks other people have identified.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: persuasion, politics

Working out milestone tasks that we expect to be achieved before we reach AGI can be difficult. Some tasks, like "continuous learning", intuitively seem like they will need to be solved before someone builds AGI. Continuous learning is learning bit by bit, as more data comes in; current ML systems usually don't do this, instead learning everything at once from a big dataset. Because humans can do continuous learning, it seems like it might be required for AGI. However, you have to be careful with reasoning like this, because the first generally capable artificial intelligence may work quite differently from a human. It's possible the first AGI will be designed to avoid needing continuous learning, perhaps by running a big retraining process every day. This might still let it be as capable as humans at almost every task without ever solving the "continuous learning" problem.

Because of arguments like the above, it's not always clear whether a given task is "required" for AGI.

Some potential big milestone tasks might be:

  • ARC challenge (tests the ability to generate the "simplest explanation" for patterns)
  • Human-level sample efficiency at various tasks (EfficientZero already achieves this on Atari games)


This Metaculus question has four very specific milestones that it considers to be requirements for "weak AGI".

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: None

There are multiple programs you can apply to if you want to try becoming a researcher. If accepted, you will get funding and mentorship. Some examples are the SERI summer research fellowship, the CERI summer research fellowship, the SERI ML Alignment Theory Program, and more. Many of these programs run at specific times of the year (typically during the summer).

Other things you can do include:

  • Join the next iteration of the AGI Safety Fundamentals programme (https://www.eacambridge.org/technical-alignment-curriculum).
  • If you're thinking of a career as a researcher working on AI safety questions, get 1-1 career advice from 80,000 Hours (https://80000hours.org/speak-with-us).
  • Apply to attend an EAGx or EAG conference (https://www.eaglobal.org/events/), where you can meet researchers working on these questions in person and ask them directly for advice.

Some of these resources might be helpful: https://www.aisafetysupport.org/resources/lots-of-links

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: contributing, mentorship

There is the [Intro to ML Safety](https://course.mlsafety.org/) web course by the [Center for AI Safety](https://www.cais.ai/). It begins with a review of deep learning, then introduces the research fields of robustness, interpretability, alignment, and systemic safety.
The majority of the material, as of 2022, was written by [Dan Hendrycks](https://scholar.google.com/citations?user=czyretsAAAAJ&hl=en), the research director of the [Center for AI Safety](https://www.cais.ai/).
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: language models, academia, neural networks, robustness

It's not completely clear exactly what 'merging' with AI would imply, but it doesn't seem like a way to get around the alignment problem. If the AI system is aligned and wants to do what humans want, then having direct access to human brains could provide a lot of information about human values and goals very quickly and efficiently, and thus be helpful for better alignment. That said, a smart AI system could get almost all of this information without a brain-computer interface, through conversation, observation, etc., just much more slowly. On the other hand, if the system is not aligned and doesn't fundamentally want humans to get what we want, then extra information about how human minds work doesn't help; it only makes the problem worse. Allowing a misaligned AGI direct access to your brain hardware is a bad idea for obvious reasons.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


... further results

These 171 canonical answers have one or fewer stamps.

One possible way to ensure the safety of a powerful AI system is to keep it contained in a software environment. There is nothing intrinsically wrong with this procedure - keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even AI systems inside software environments might not be safe enough.

Humans sometimes put dangerous humans inside boxes to limit their ability to influence the external world. Sometimes, these humans escape their boxes. The security of a prison depends on certain assumptions, which can be violated. Yoshie Shiratori reportedly escaped prison by weakening the door-frame with miso soup and dislocating his shoulders.

Human-written software has a high defect rate; we should expect a perfectly secure system to be difficult to create. If humans construct a software system they think is secure, it is possible that its security relies on a false assumption. A powerful AI system could potentially learn how its hardware works and manipulate bits to send radio signals. It could fake a malfunction and attempt social engineering when the engineers look at its code. As the saying goes: in order for someone to do something we had imagined was impossible, they need only have a better imagination.

Experimentally, humans have convinced other humans to let them out of the box. Spooky.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: boxing

In principle it could (if you believe in functionalism), but it probably won't. One way to ensure that an AI has human-like emotions would be to copy the way the human brain works, but that's not what most AI researchers are trying to do.

It's similar to how some people once thought we would build mechanical horses to pull our vehicles, but it turned out to be much easier to build a car. AI probably doesn't need emotions, or maybe even consciousness, to be powerful, and the first AGIs to get built will be the ones that are easiest to build.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Alternate phrasings are used to improve the semantic search which Stampy uses to serve people questions, by giving alternative ways of asking a question that might trigger a match when the main wording won't. They should generally only be added when the wording is significantly different, rather than for very minor changes.
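
As a rough sketch of the idea (hypothetical code assuming the sentence-transformers library, not Stampy's actual implementation), a question can be scored against a user query via the best match across all of its phrasings, so a query far from the canonical wording can still trigger a match:

```python
# Hypothetical sketch: index a question under all of its phrasings and match a
# user query against the best-scoring phrasing. Not Stampy's actual code.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

question = "Why can't we just switch the AI off?"
phrasings = [
    question,
    "Couldn't we just pull the plug on a dangerous AI?",  # alternate phrasing
    "Why not simply turn off a misbehaving AI?",          # alternate phrasing
]
user_query = "can we just unplug the machine if it misbehaves"

phrasing_embeddings = model.encode(phrasings, convert_to_tensor=True)
query_embedding = model.encode(user_query, convert_to_tensor=True)

# The question's score is its best match across all phrasings, so wording that
# differs a lot from the canonical question can still be served the right answer.
best_score = float(util.cos_sim(query_embedding, phrasing_embeddings).max())
print(best_score)
```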

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy

A slow takeoff is one where AI capabilities improve gradually, giving us plenty of time to adapt. In a moderate takeoff we might see accelerating progress, but we still won't be caught off guard by a dramatic change. In a fast or hard takeoff, by contrast, AI would go from being not very generally competent to being sufficiently superhuman to control the future too fast for humans to course-correct if something goes wrong.

The article Distinguishing definitions of takeoff goes into more detail on this.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Intelligence is powerful. One might say that “Intelligence is no match for a gun, or for someone with lots of money,” but both guns and money were produced by intelligence. If not for our intelligence, humans would still be foraging the savannah for food.

Intelligence is what caused humans to dominate the planet in the blink of an eye (on evolutionary timescales). Intelligence is what allows us to eradicate diseases, and what gives us the potential to eradicate ourselves with nuclear war. Intelligence gives us superior strategic skills, superior social skills, superior economic productivity, and the power of invention.

A machine with superintelligence would be able to hack into vulnerable networks via the internet, commandeer those resources for additional computing power, take over mobile machines connected to networks connected to the internet, use them to build additional machines, perform scientific experiments to understand the world better than humans can, invent quantum computing and nanotechnology, manipulate the social world better than we can, and do whatever it can to give itself more power to achieve its goals — all at a speed much faster than humans can respond to.


Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!