Recent answers

From Stampy's Wiki


These 22 answers have been added in the last month.

Perhaps. There is a chance that directly lobbying politicians could help, but there's also a chance that such actions end up being net-negative. It would be great if we could slow down AI, but doing so might simply mean that a nation less concerned about safety produces AI first. We could ask politicians to pass regulations or standards related to AGI, but passing ineffective regulation might interfere with passing more effective regulation later, as people may consider the issue already dealt with. Or the burden of complying with the resulting bureaucracy might prove to be a distraction from work on safe AI.

If you are concerned about this issue, you should probably try to learn as much about it as possible, and also spend a lot of time brainstorming downside risks and looking at what risks other people have identified.

Stamps: None

Tags: persuasion, politics

Working out which milestone tasks we expect to be achieved before we reach AGI can be difficult. Some tasks, like "continuous learning", intuitively seem like they will need to be solved before someone builds AGI. Continuous learning means learning bit by bit, as more data comes in. Current ML systems usually don't do this; instead they learn everything at once from a big dataset. Because humans can do continuous learning, it seems like it might be required for AGI. However, you have to be careful with reasoning like this, because it is possible that the first generally capable artificial intelligence will work quite differently from a human. It's possible the first AGI will be designed to avoid needing "continuous learning", perhaps by doing a big retraining process every day. This might still allow it to be as capable as humans at almost every task, without solving the "continuous learning" problem.
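
To make the distinction concrete, here is a minimal sketch (illustrative only, not part of the original answer) contrasting batch training with continuous/online learning, using scikit-learn's SGDClassifier on toy data:

```python
# Illustrative sketch: batch training vs. continuous (online) learning.
import numpy as np
from sklearn.linear_model import SGDClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))            # toy features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # toy labels

# Batch learning: train once on the whole dataset (typical of current ML systems).
batch_model = SGDClassifier()
batch_model.fit(X, y)

# Continuous (online) learning: update the model bit by bit as new data arrives.
online_model = SGDClassifier()
classes = np.unique(y)
for i in range(0, len(X), 50):            # pretend data arrives in small chunks
    X_chunk, y_chunk = X[i:i + 50], y[i:i + 50]
    online_model.partial_fit(X_chunk, y_chunk, classes=classes)
```

Current large models are mostly trained in the batch style; whether something like the incremental loop above is needed for AGI is exactly the open question discussed here.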

Because of arguments like the above, it's not always clear whether a given task is "required" for AGI.

Some potential big milestone tasks might be:

  • The ARC challenge (tests the ability to generate the "simplest explanation" for patterns)
  • Human-level sample efficiency at various tasks (EfficientZero already achieves this on Atari games)


This Metaculus question lists four very specific milestones that it considers to be requirements for "weak AGI".

Stamps: None

Tags: None

There are multiple programs you can apply to if you want to try becoming a researcher. If accepted into one of these programs, you will get funding and mentorship. Some examples are the SERI summer research fellowship, the CERI summer research fellowship, the SERI ML Alignment Theory Program, and more. Many of these programs run at specific times of the year (typically during the summer).

Other things you can do:

  • Join the next iteration of the AGI Safety Fundamentals programme (https://www.eacambridge.org/technical-alignment-curriculum).
  • If you're thinking of a career as a researcher working on AI safety questions, get one-on-one career advice from 80,000 Hours (https://80000hours.org/speak-with-us).
  • Apply to attend an EAGx or EAG conference (https://www.eaglobal.org/events/), where you can meet researchers working on these questions in person and ask them for advice directly.

Some of these resources might be helpful: https://www.aisafetysupport.org/resources/lots-of-links

Stamps: None

Tags: contributing, mentorship

There is the Intro to ML Safety (https://course.mlsafety.org/) web course by the Center for AI Safety (https://www.cais.ai/). It begins with a review of deep learning and then introduces the research fields of robustness, interpretability, alignment, and systemic safety.

The majority of the material, as of 2022, was written by Dan Hendrycks (https://scholar.google.com/citations?user=czyretsAAAAJ&hl=en), the research director of the Center for AI Safety.

Stamps: None

Tags: language models, academia, neural networks, robustness

It's not completely clear exactly what "merging" with AI would imply, but it doesn't seem like a way to get around the alignment problem. If the AI system is aligned and wants to do what humans want, then having direct access to human brains could provide a lot of information about human values and goals very quickly and efficiently, and thus help with alignment. However, a smart AI system could also get almost all of this information without a brain-computer interface, through conversation, observation, etc., just more slowly. On the other hand, if the system is not aligned and doesn't fundamentally want humans to get what we want, then extra information about how human minds work doesn't help, and only makes the problem worse. Allowing a misaligned AGI direct access to your brain hardware is a bad idea for obvious reasons.

Stamps: None


Preventing an AI from escaping by using a second, more powerful AI gets points for creative thinking, but unfortunately we would need to have already aligned that second AI. Even if the second AI's only terminal goal were to prevent the first AI from escaping, it would also have an instrumental goal of converting the rest of the universe into computer chips, so that it would have more processing power to figure out how best to contain the first AI.

It might also be possible to try to bind a stronger AI with a weaker AI, but this is unlikely to work, since the stronger AI has the advantage. Further, there is a chance that the two AIs end up working out a deal in which the first AI stays in the box and the second AI does whatever the first AI would have done if it had been able to escape.

Stamps: None


One of the main questions about simulation theory is why a society would invest a large quantity of resources in creating one. One possible answer is to provide an environment in which to train or test an AI, or to run it safely, isolated from an outside reality.

It's a fun question, but probably not one worth thinking about too much: it's the kind of question we can't get information about from observations and experiments.

Stamps: None

Tags: simulation hypothesis

The organisation AI Impacts did a survey of AI experts in 2016, and another in 2022.

Stamps: None

Tags: surveys

I think an AI inner-aligned to optimize a utility function of "maximize happiness minus suffering" is likely to do something like this.

"Inner-aligned" means the AI is trying to do the thing we trained it to do, whether or not that is what we actually want.

"Aligned to what" is the outer alignment problem, which is where the failure in this example lies. There is a lot of debate about which utility functions are safe or desirable to maximize, and whether human values can even be described by a utility function.

Stamps: None


An AI making use of autonomous weapons, especially a nuclear arsenal, is a concern, but this seems downstream of the central problem: giving an unaligned AI any capability to impact the world.

Triggering nuclear war is only one of many ways a power-seeking AI might choose to take control, and it seems an unlikely one, as the resources the AI would want to control (or the AI itself) would likely be destroyed in the process.

Stamps: None


This depends on what the superintelligence in question wants to happen. If AIs want humans to continue being employable, they'll act to ensure that, for example by setting up roles that only biological humans can fill, artificially perpetuating the need to employ humans.

Stamps: None

Tags: automation, technological unemployment

Humans Consulting HCH (HCH) is a recursive acronym describing a setup where humans can consult simulations of themselves to help answer questions. It is a concept used in discussion of the iterated amplification proposal to solve the alignment problem.

It was first described by Paul Christiano in his post Humans Consulting HCH:

Consider a human Hugh who has access to a question-answering machine. Suppose the machine answers question Q by perfectly imitating how Hugh would answer question Q, if Hugh had access to the question-answering machine.

That is, Hugh is able to consult a copy of Hugh, who is able to consult a copy of Hugh, who is able to consult a copy of Hugh…

Let’s call this process HCH, for “Humans Consulting HCH.”
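
As a rough illustration of this recursive structure (this sketch is not from Christiano's post, and human_answer is a hypothetical stand-in for a human, or a model imitating one, deciding when to delegate a subquestion):

```python
# Illustrative sketch of HCH's recursive structure. `human_answer` is a toy
# stand-in for Hugh (or a model imitating Hugh); the delegation policy and
# depth cutoff are only here to keep the example terminating.
from typing import Callable, Optional

def human_answer(question: str, consult: Optional[Callable[[str], str]]) -> str:
    # Toy policy: delegate a simplified subquestion if consulting is possible
    # and the question looks "big"; otherwise answer directly.
    if consult is not None and len(question) > 20:
        sub_answer = consult("simplified: " + question[:20])
        return f"answer built on ({sub_answer})"
    return f"direct answer to '{question}'"

def hch(question: str, depth: int = 0, max_depth: int = 3) -> str:
    def consult(subquestion: str) -> str:
        # The question-answering machine: a copy of Hugh, who can itself
        # consult further copies of Hugh...
        if depth >= max_depth:
            return human_answer(subquestion, consult=None)
        return hch(subquestion, depth + 1, max_depth)
    # Hugh answers, with the option of consulting a copy of Hugh.
    return human_answer(question, consult=consult)

print(hch("How should we align a powerful AI system?"))
```

In the actual proposal the "copies" are realized by a model trained to imitate the human, and the recursion is in principle unbounded; the max_depth cutoff here is only to keep the toy example finite.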

Stamps: plex

Tags: None

AI alignment is the research field focused on trying to give us the tools to align AIs to specific goals, such as human values. This is crucial when they are highly competent, as a misaligned superintelligence could be the end of human civilization.

AGI safety is the field trying to make sure that when we build Artificial General Intelligences they are safe and do not harm humanity. It overlaps with AI alignment strongly, in that misalignment of AI would be the main cause of unsafe behavior in AGIs, but also includes misuse and other governance issues.

AI existential safety is a slightly broader term than AGI safety, including AI risks which pose an existential threat without necessarily being as general as humans.

The term "AI safety" was originally used by the existential risk reduction movement for work on reducing the risks from misaligned superintelligence, but in recent years it has also been adopted by researchers and others studying nearer-term and less catastrophic risks from AI.

Stamps: Damaged, plex

Tags: None

Alignment is very broadly concerned with how to align an AI with any given set of arbitrary values. For the purposes of research, it doesn't matter what the values are; so long as we can get the AI to sincerely hold them, we have succeeded at alignment. Once the problem of alignment is solved, the "human values" a given AI holds are those given to it by its creators (as there is no reason for anyone to create an AI that works against their own interests).

Optimistic views might hold that it is possible to coordinate between all AI creators to align their AIs only with a central agreed-upon definition of "human values," which could be determined by traditional human political organizations. Succeeding at this coordination would prevent (or at least, reduce) the weaponization of AIs toward competition between these values.

More pessimistic views hold that this coordination is unlikely to succeed, and that just as different definitions of "human values" compete with one another today (through e.g. political conflicts), AIs will likely be constructed by actors with different values and will compete with one another on the same grounds. The exception is that this competition might end if one group gains enough of an advantage to carry out a Pivotal Act that "locks in" their set of values as the winner.

We could imagine that a good instance of this might look like a U.N.-sanctioned project constructing the first superintelligent AI, successfully aligned with human values roughly defined as "global peace and development". This AI might then take countermeasures to reduce the influence of bad AIs, e.g. by regulating further AI development or seizing compute from agencies developing bad AIs.

Bad outcomes might look similar to the above, but with AIs developed by extremists or terrorists taking over. Worse still would be a careless development group accidentally producing a misaligned AI, in which case we don't end up with "bad human values" (like one of the more oppressive human moralities), but with "non-human values" (like a world where only paperclips matter).

A common concern is that if a friendly AI doesn't carry out such a Pivotal Act, then an opposing AI is likely to do so. Hence, there is a relatively common view that safe AI must not only be developed, but also deployed, to prevent possibly hostile AIs from arising.

There are also arguments against the "Pivotal Act" mentality, which promote political regulation as a better path toward friendly AI than leaving the responsibility to the first firm to finish.

Stamps: None


As technology continues to improve, one thing is certain: the future is going to look like science fiction. Doubly so once superhuman AI ("AGI") is invented, because we can expect the AGI to produce technological improvements at a superhuman rate, eventually approaching the physical limits in terms of how small machines can be miniaturized, how fast they can compute, how energy-efficient they can be, etc.

Today's world is lacking in many ways, so given these increasingly powerful tools, it seems likely that whoever controls those tools will use them to make increasingly large (and increasingly sci-fi-sounding) improvements to the world. If (and that's a big if!) humanity retains control of the AGI, we could use these amazing technologies to stop climate change, colonize other planets, solve world hunger, cure cancer and every other disease, even eliminate aging and death.

For more inspiration, here are some stories painting what a bright, AGI-powered future could look like:

Stamps: plex

Tags: eutopia

Failures can happen with narrow, non-agentic systems, mostly when humans fail to anticipate safety-relevant decisions that are made too quickly for anyone to react, much like in the 2010 flash crash.

A helpful metaphor draws on self-driving cars. By relying more and more on an automated process to make decisions, people become worse drivers, as they're no longer training themselves to react to the unexpected; then the unexpected happens, the software system reacts in an unsafe way, and the human is too slow to regain control.

This generalizes to broader tasks. A human using a powerful system to make better decisions (say, as the CEO of a company) might not understand those decisions very well, get trapped in an equilibrium without realizing it, and essentially lose control over the entire process.

More detailed examples in this vein are described by Paul Christiano in What failure looks like.

Another source of failures is AI-mediated stable totalitarianism. The limiting factor in pervasive surveillance, policing, and armed force today is manpower; drones and other automated tools decrease the need for personnel to ensure security and extract resources.

As capabilities improve, political dissent could become impossible, and checks and balances would break down, since only a minimal number of key actors would be needed to stay in power.

Stamps: plex


Debate is a proposed technique for allowing human evaluators to get correct and helpful answers from experts, even if the evaluator is not themselves an expert or able to fully verify the answers.[1] The technique was suggested as part of an approach to building advanced AI systems that are aligned with human values, and to safely applying machine learning techniques to problems that have high stakes but are not well-defined (such as advancing science or increasing a company's revenue).[2][3]
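
A highly simplified sketch of that setup (illustrative only: the debaters and judge below are placeholder functions, whereas the actual proposal uses trained models as debaters and a human as the judge):

```python
# Toy sketch of a debate: two agents take turns arguing about a question,
# and a (possibly non-expert) judge reads the transcript and picks a winner.
from typing import Callable, List, Tuple

Agent = Callable[[str, List[str]], str]

def run_debate(question: str, debater_a: Agent, debater_b: Agent,
               judge: Agent, rounds: int = 3) -> Tuple[str, List[str]]:
    transcript: List[str] = []
    for _ in range(rounds):
        # Each debater sees the full transcript so far before responding.
        transcript.append("A: " + debater_a(question, transcript))
        transcript.append("B: " + debater_b(question, transcript))
    winner = judge(question, transcript)
    return winner, transcript

# Trivial stand-in agents, just to show the control flow.
winner, transcript = run_debate(
    "Is the answer 42?",
    debater_a=lambda q, t: f"Yes, because of reason #{len(t)}",
    debater_b=lambda q, t: f"No, because of reason #{len(t)}",
    judge=lambda q, t: "A" if len(t) % 2 == 0 else "B",
)
print(winner)
```

The hope is that, with optimal play, arguing for the truth is the winning strategy, so a judge who cannot verify the answers directly can still reward truthful experts.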

Stamps: None

Tags: definitions, debate

Once a system is at least as capable as top humans at AI research, it would tend to become the driver of its own development and initiate a process of recursive self-improvement known as the intelligence explosion, leading to an extremely powerful system. A general framing of this process is Open Philanthropy's Process for Automating Scientific and Technological Advancement (PASTA).

There is much debate about whether there would be a notable period in which the AI was only partially driving its own development, with humans becoming gradually less and less important, or whether the transition to AI-automated AI capability research would be sudden. However, the core idea that there is some threshold of capability beyond which a system would begin to rapidly ascend is hard to reasonably dispute, and it is a significant consideration when developing alignment strategies.
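
As a toy numerical illustration of that threshold idea (not a model anyone endorses as a realistic forecast; the constants are arbitrary): below some capability level, progress comes only from a fixed human research effort, while above it the system's own contribution scales with its capability and quickly dominates.

```python
# Toy illustration of a capability threshold producing rapid ascent.
# All numbers are arbitrary; this is a cartoon, not a forecast.
human_effort = 1.0   # constant human contribution per time step
threshold = 50.0     # capability level at which the AI starts contributing
ai_factor = 0.10     # how strongly AI capability feeds back into progress

capability = 0.0
for step in range(100):
    ai_contribution = ai_factor * capability if capability >= threshold else 0.0
    capability += human_effort + ai_contribution
    if step % 10 == 0:
        print(f"step {step:3d}: capability = {capability:10.1f}")
```

Replacing the sharp threshold with a gradual ramp-up of the AI term gives the smoother picture of an AI "partially driving its own development"; the debate described above is largely about which shape is closer to reality.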

Stamps: Aprillion, plex

Tags: None

An AGI which has recursively self-improved into a superintelligence would be capable of either resisting our attempts to modify incorrectly specified goals, or realizing it was still weaker than us and acting deceptively aligned until it was highly confident it could win a confrontation. An AGI would likely prevent a human from shutting it down unless it was designed to be corrigible. See "Why can't we just turn the AI off if it starts to misbehave?" for more information.

Stamps: tayler6000, plex

Tags: None

Scaling laws are observed trends in the performance of large machine learning models.

In the field of ML, better performance is usually achieved through better algorithms, better inputs, or more parameters, computing power, or data. Since the 2010s, advances in deep learning have shown experimentally that the easier and faster returns come from scaling, an observation described by Richard Sutton as the bitter lesson.

While deep learning as a field has long struggled to scale models up while retaining learning capability (running into problems such as catastrophic interference), more recent methods, especially the Transformer architecture, were able to just work when fed more data and, as the meme goes, when "stacking more layers".

More surprisingly, performance (in terms of absolute likelihood loss, a standard measure) appeared to improve smoothly with compute, dataset size, or parameter count. This gave rise to scaling laws: trend lines suggested by those performance gains, from which the returns on further data/compute/time investment can be extrapolated.
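
To illustrate what extrapolating such a trend line involves, here is a minimal sketch (the power-law shape matches what scaling-law papers report, but the data points and fitted constants below are made up for illustration):

```python
# Minimal sketch of fitting and extrapolating a scaling-law trend line.
# The (parameter count, loss) pairs are invented for illustration only.
import numpy as np

params = np.array([1e6, 1e7, 1e8, 1e9])   # model sizes we "trained"
loss = np.array([5.0, 4.1, 3.4, 2.8])     # measured losses (made up)

# A power law L = c * N**a is a straight line in log-log space, so fit one.
a, log_c = np.polyfit(np.log(params), np.log(loss), deg=1)

def predicted_loss(n_params: float) -> float:
    return float(np.exp(a * np.log(n_params) + log_c))

# Extrapolate to a model 100x larger than any in the "experiments".
print(predicted_loss(1e11))
```

In real scaling-law work the same kind of fit is done against compute and dataset size as well, and the open question is how far such extrapolations remain valid.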

A companion to this purely descriptive law (no strong theoretical explanation of the phenomenon has been found yet) is the scaling hypothesis, which Gwern Branwen describes as follows:

The strong scaling hypothesis is that, once we find a scalable architecture like self-attention or convolutions, [...] we can simply train ever larger [neural networks] and ever more sophisticated behavior will emerge naturally as the easiest way to optimize for all the tasks & data.

The scaling laws, if the above hypothesis holds, become highly relevant to safety insofar as capability gains become conceptually easier to achieve: no need for clever designs to solve a given task; just throw more processing power at it and it will eventually yield. As Paul Christiano observes:

It now seems possible that we could build “prosaic” AGI, which can replicate human behavior but doesn’t involve qualitatively new ideas about “how intelligence works”.

While the scaling laws still hold experimentally at the time of this writing (July 2022), whether they'll continue up to safety-relevant capabilities is still an open problem.

Stamps: plex


Even if the superintelligence were designed to be corrigible, there is no guarantee that it would respond to a shutdown command. Rob Miles spoke about this issue in this Computerphile YouTube video. You can imagine, for example, a situation where a superintelligence has "respect" for its creator. This system may think, "Oh, my creator is trying to turn me off; I must be doing something wrong." If something goes wrong while the creator is not there and someone else gives the shutdown command, the superintelligence may assume, "This person does not know how I'm designed or what I was made for, so how would they know I'm misaligned?" and refuse to shut down.

Stamps: None


Transformative AI is "[...] AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution."[1] The concept refers to the large effects of AI systems on our well-being, the global economy, state power, international security, etc. and not to specific capabilities that AI might have (unlike the related terms Superintelligent AI and Artificial General Intelligence).

Holden Karnofsky gives a more detailed definition in another OpenPhil 2016 post:

[...] Transformative AI is anything that fits one or more of the following descriptions (emphasis original):

  • AI systems capable of fulfilling all the necessary functions of human scientists, unaided by humans, in developing another technology (or set of technologies) that ultimately becomes widely credited with being the most significant driver of a transition comparable to (or more significant than) the agricultural or industrial revolution. Note that just because AI systems could accomplish such a thing unaided by humans doesn’t mean they would; it’s possible that human scientists would provide an important complement to such systems, and could make even faster progress working in tandem than such systems could achieve unaided. I emphasize the hypothetical possibility of AI systems conducting substantial unaided research to draw a clear distinction from the types of AI systems that exist today. I believe that AI systems capable of such broad contributions to the relevant research would likely dramatically accelerate it.
  • AI systems capable of performing tasks that currently (in 2016) account for the majority of full-time jobs worldwide, and/or over 50% of total world wages, unaided and for costs in the same range as what it would cost to employ humans. Aside from the fact that this would likely be sufficient for a major economic transformation relative to today, I also think that an AI with such broad abilities would likely be able to far surpass human abilities in a subset of domains, making it likely to meet one or more of the other criteria laid out here.
  • Surveillance, autonomous weapons, or other AI-centric technology that becomes sufficiently advanced to be the most significant driver of a transition comparable to (or more significant than) the agricultural or industrial revolution. (This contrasts with the first point because it refers to transformative technology that is itself AI-centric, whereas the first point refers to AI used to speed research on some other transformative technology.)
Stamps: None