Canonical answers with low stamps

From Stampy's Wiki

Back to Review answers.

These 171 canonical answers have one or fewer stamps.

One possible way to ensure the safety of a powerful AI system is to keep it contained in a software environment. There is nothing intrinsically wrong with this procedure - keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even AI systems inside software environments might not be safe enough.

Humans sometimes put dangerous humans inside boxes to limit their ability to influence the external world. Sometimes, these humans escape their boxes. The security of a prison depends on certain assumptions, which can be violated. Yoshie Shiratori reportedly escaped prison by weakening the door-frame with miso soup and dislocating his shoulders.

Human written software has a high defect rate; we should expect a perfectly secure system to be difficult to create. If humans construct a software system they think is secure, it is possible that the security relies on a false assumption. A powerful AI system could potentially learn how its hardware works and manipulate bits to send radio signals. It could fake a malfunction and attempt social engineering when the engineers look at its code. As the saying goes: in order for someone to do something we had imagined was impossible requires only that they have a better imagination.

Experimentally, humans have convinced other humans to let them out of the box. Spooky.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: boxing (edit tags)

In principle it could (if you believe in functionalism), but it probably won't. One way to ensure that AI has human-like emotions would be to copy the way human brain works, but that's not what most AI researchers are trying to do.

It's similar to how once some people thought we will build mechanical horses to pull our vehicles, but it turned out it's much easier to build a car. AI probably doesn't need emotions or maybe even consciousness to be powerful, and the first AGIs that will get built will be the ones that are easiest to build.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Alternate phrasings are used to improve the semantic search which Stampy uses to serve people questions, by giving alternate ways to say a question which might trigger a match when the main wording won't. They should generally only be used when there is a significantly different wording, rather than for only very minor changes.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

A slow takeoff is where AI capabilities improve gradually, giving us plenty of time to adapt. In a moderate takeoff we might see accelerating progress, but we still won’t be caught off guard by a dramatic change. Whereas, in a fast or hard takeoff AI would go from being not very generally competent to sufficiently superhuman to control the future too fast for humans to course correct if something goes wrong.

The article Distinguishing definitions of takeoff goes into more detail on this.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Intelligence is powerful. One might say that “Intelligence is no match for a gun, or for someone with lots of money,” but both guns and money were produced by intelligence. If not for our intelligence, humans would still be foraging the savannah for food.

Intelligence is what caused humans to dominate the planet in the blink of an eye (on evolutionary timescales). Intelligence is what allows us to eradicate diseases, and what gives us the potential to eradicate ourselves with nuclear war. Intelligence gives us superior strategic skills, superior social skills, superior economic productivity, and the power of invention.

A machine with superintelligence would be able to hack into vulnerable networks via the internet, commandeer those resources for additional computing power, take over mobile machines connected to networks connected to the internet, use them to build additional machines, perform scientific experiments to understand the world better than humans can, invent quantum computing and nanotechnology, manipulate the social world better than we can, and do whatever it can to give itself more power to achieve its goals — all at a speed much faster than humans can respond to.

See also

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


It might look like there are straightforward ways to eliminate the problems of unaligned superintelligence, but so far all of them turn out to have hidden difficulties. There are many open problems identified by the research community which a solution would need to reliably overcome to be successful.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


The Rob Miles AI Discord is the hub of all things Stampy. If you want to be part of the project and don't have access yet, ask plex#1874 on Discord (or plex on wiki).

You can also talk to us on the public Discord! Try #suggestions or #general, depending on what you want to talk about.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

Blindly following the trendlines while forecasting technological progress is certainly a risk (affectionately known in AI circles as “pulling a Kurzweill”), but sometimes taking an exponential trend seriously is the right response.

Consider economic doubling times. In 1 AD, the world GDP was about $20 billion; it took a thousand years, until 1000 AD, for that to double to $40 billion. But it only took five hundred more years, until 1500, or so, for the economy to double again. And then it only took another three hundred years or so, until 1800, for the economy to double a third time. Someone in 1800 might calculate the trend line and say this was ridiculous, that it implied the economy would be doubling every ten years or so in the beginning of the 21st century. But in fact, this is how long the economy takes to double these days. To a medieval, used to a thousand-year doubling time (which was based mostly on population growth!), an economy that doubled every ten years might seem inconceivable. To us, it seems normal.

Likewise, in 1965 Gordon Moore noted that semiconductor complexity seemed to double every eighteen months. During his own day, there were about five hundred transistors on a chip; he predicted that would soon double to a thousand, and a few years later to two thousand. Almost as soon as Moore’s Law become well-known, people started saying it was absurd to follow it off a cliff – such a law would imply a million transistors per chip in 1990, a hundred million in 2000, ten billion transistors on every chip by 2015! More transistors on a single chip than existed on all the computers in the world! Transistors the size of molecules! But of course all of these things happened; the ridiculous exponential trend proved more accurate than the naysayers.

None of this is to say that exponential trends are always right, just that they are sometimes right even when it seems they can’t possibly be. We can’t be sure that a computer using its own intelligence to discover new ways to increase its intelligence will enter a positive feedback loop and achieve superintelligence in seemingly impossibly short time scales. It’s just one more possibility, a worry to place alongside all the other worrying reasons to expect a moderate or hard takeoff.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


An existing question is a duplicate of a new one if it is reasonable to expect whoever asked the new question to be satisfied if they received an answer to the existing question instead.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

A brain-computer interface (BCI) is a direct communication pathway between the brain and a computer device. BCI research is heavily funded, and has already met dozens of successes. Three successes in human BCIs are a device that restores (partial) sight to the blind, cochlear implants that restore hearing to the deaf, and a device that allows use of an artificial hand by direct thought.

Such device restore impaired functions, but many researchers expect to also augment and improve normal human abilities with BCIs. Ed Boyden is researching these opportunities as the lead of the Synthetic Neurobiology Group at MIT. Such devices might hasten the arrival of an intelligence explosion, if only by improving human intelligence so that the hard problems of AI can be solved more rapidly.

See also:

Wikipedia, Brain-computer interface

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Imagine, for example, that you are tasked with reducing traffic congestion in San Francisco at all costs, i.e. you do not take into account any other constraints. How would you do it? You might start by just timing traffic lights better. But wouldn’t there be less traffic if all the bridges closed down from 5 to 10AM, preventing all those cars from entering the city? Such a measure obviously violates common sense, and subverts the purpose of improving traffic, which is to help people get around – but it is consistent with the goal of “reducing traffic congestion”.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. However, human intelligence continues to dominate machine intelligence in generality.

A powerful chess computer is “narrow”: it can’t play other games. In contrast, humans have problem-solving abilities that allow us to adapt to new contexts and excel in many domains other than what the ancestral environment prepared us for.

In the absence of a formal definition of “intelligence” (and therefore of “artificial intelligence”), we can heuristically cite humans’ perceptual, inferential, and deliberative faculties (as opposed to, e.g., our physical strength or agility) and say that intelligence is “those kinds of things.” On this conception, intelligence is a bundle of distinct faculties — albeit a very important bundle that includes our capacity for science.

Our cognitive abilities stem from high-level patterns in our brains, and these patterns can be instantiated in silicon as well as carbon. This tells us that general AI is possible, though it doesn’t tell us how difficult it is. If intelligence is sufficiently difficult to understand, then we may arrive at machine intelligence by scanning and emulating human brains or by some trial-and-error process (like evolution), rather than by hand-coding a software agent.

If machines can achieve human equivalence in cognitive tasks, then it is very likely that they can eventually outperform humans. There is little reason to expect that biological evolution, with its lack of foresight and planning, would have hit upon the optimal algorithms for general intelligence (any more than it hit upon the optimal flying machine in birds). Beyond qualitative improvements in cognition, Nick Bostrom notes more straightforward advantages we could realize in digital minds, e.g.:

  • editability — “It is easier to experiment with parameter variations in software than in neural wetware.”
  • speed — “The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.”
  • serial depth — On short timescales, machines can carry out much longer sequential processes.
  • storage capacity — Computers can plausibly have greater working and long-term memory.
  • size — Computers can be much larger than a human brain.
  • duplicability — Copying software onto new hardware can be much faster and higher-fidelity than biological reproduction.

Any one of these advantages could give an AI reasoner an edge over a human reasoner, or give a group of AI reasoners an edge over a human group. Their combination suggests that digital minds could surpass human minds more quickly and decisively than we might expect.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


People tend to imagine AIs as being like nerdy humans – brilliant at technology but clueless about social skills. There is no reason to expect this – persuasion and manipulation is a different kind of skill from solving mathematical proofs, but it’s still a skill, and an intellect as far beyond us as we are beyond lions might be smart enough to replicate or exceed the “charming sociopaths” who can naturally win friends and followers despite a lack of normal human emotions.

A superintelligence might be able to analyze human psychology deeply enough to understand the hopes and fears of everyone it negotiates with. Single humans using psychopathic social manipulation have done plenty of harm – Hitler leveraged his skill at oratory and his understanding of people’s darkest prejudices to take over a continent. Why should we expect superintelligences to do worse than humans far less skilled than they?

More outlandishly, a superintelligence might just skip language entirely and figure out a weird pattern of buzzes and hums that causes conscious thought to seize up, and which knocks anyone who hears it into a weird hypnotizable state in which they’ll do anything the superintelligence asks. It sounds kind of silly to me, but then, nuclear weapons probably would have sounded kind of silly to lions sitting around speculating about what humans might be able to accomplish. When you’re dealing with something unbelievably more intelligent than you are, you should probably expect the unexpected.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


We don’t yet know which AI architectures are safe; learning more about this is one of the goals of FLI's grants program. AI researchers are generally very responsible people who want their work to better humanity. If there are certain AI designs that turn out to be unsafe, then AI researchers will want to know this so they can develop alternative AI systems.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Avoid directly responding to the question in the answer, repeat the relevant part of the question instead. For example, if the question is "Can we do X", answer "We might be able to do X, if we can do Y", not "Yes, if we can manage Y". This way, the answer will also work for the questions "Why can't we do X" and "What would happen if we tried to do X".

Linking to external sites is strongly encouraged, one of the most valuable things Stampy can do is help people find other parts of the alignment information ecosystem.

Consider enclosing newly introduced terms, likely to be unfamiliar to many readers, in speech marks. If unsure, Google the term (in speech marks!) and see if it shows up anywhere other than LessWrong, the Alignment Forum, etc. Be judicious, as it's easy to use too many, but used carefully they can psychologically cushion newbies from a lot of unfamiliar terminology - in this context they're saying something like "we get that we're hitting you with a lot of new vocab, and you might not know what this term means yet".

When selecting related questions, there shouldn't be more than four unless there's a really good reason for that (some questions are asking for it, like the "Why can't we just..." question). It's also recommended to include at least one more "enticing" question to draw users in (relating to the more sensational, sci-fi, philosophical/ethical side of things) alongside more bland/neutral questions.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


There is a broad range of possible goals that an AI might possess, but there are a few basic drives that would be useful to almost any of them. These are called instrumentally convergent goals:

  1. Self preservation. An agent is less likely to achieve its goal if it is not around to see to its completion.
  2. Goal-content integrity. An agent is less likely to achieve its goal if its goal has been changed to something else. For example, if you offer Gandhi a pill that makes him want to kill people, he will refuse to take it.
  3. Self-improvement. An agent is more likely to achieve its goal if it is more intelligent and better at problem-solving.
  4. Resource acquisition. The more resources at an agent’s disposal, the more power it has to make change towards its goal. Even a purely computational goal, such as computing digits of pi, can be easier to achieve with more hardware and energy.

Because of these drives, even a seemingly simple goal could create an Artificial Superintelligence (ASI) hell-bent on taking over the world’s material resources and preventing itself from being turned off. The classic example is an ASI that was programmed to maximize the output of paper clips at a paper clip factory. The ASI had no other goal specifications other than “maximize paper clips,” so it converts all of the matter in the solar system into paper clips, and then sends probes to other star systems to create more factories.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Each major organization has a different approach. The research agendas are detailed and complex (see also AI Watch). Getting more brains working on any of them (and more money to fund them) may pay off in a big way, but it’s very hard to be confident which (if any) of them will actually work.

The following is a massive oversimplification, each organization actually pursues many different avenues of research, read the 2020 AI Alignment Literature Review and Charity Comparison for much more detail. That being said:

  • The Machine Intelligence Research Institute focuses on foundational mathematical research to understand reliable reasoning, which they think is necessary to provide anything like an assurance that a seed AI built will do good things if activated.
  • The Center for Human-Compatible AI focuses on Cooperative Inverse Reinforcement Learning and Assistance Games, a new paradigm for AI where they try to optimize for doing the kinds of things humans want rather than for a pre-specified utility function
  • Paul Christano's Alignment Research Center focuses is on prosaic alignment, particularly on creating tools that empower humans to understand and guide systems much smarter than ourselves. His methodology is explained on his blog.
  • The Future of Humanity Institute does work on crucial considerations and other x-risks, as well as AI safety research and outreach.
  • Anthropic is a new organization exploring natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
  • OpenAI is in a state of flux after major changes to their safety team.
  • DeepMind’s safety team is working on various approaches designed to work with modern machine learning, and does some communication via the Alignment Newsletter.
  • EleutherAI is a Machine Learning collective aiming to build large open source language models to allow more alignment research to take place.
  • Ought is a research lab that develops mechanisms for delegating open-ended thinking to advanced machine learning systems.

There are many other projects around AI Safety, such as the Windfall clause, Rob Miles’s YouTube channel, AI Safety Support, etc.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


If we pose a serious threat, it could hack our weapons systems and turn them against us. Future militaries are much more vulnerable to this due to rapidly progressing autonomous weapons. There’s also the option of creating bioweapons and distributing them to the most unstable groups you can find, tricking nations into WW3, or dozens of other things an agent many times smarter than any human with the ability to develop arbitrary technology, hack things (including communications), and manipulate people, or many other possibilities that something smarter than a human could think up. More can be found here.

If we are not a threat, in the course of pursuing its goals it may consume vital resources that humans need (e.g. using land for solar panels instead of farm crops). This video goes into more detail:

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Follow-up questions are responses to an answer which reader might have, either because they want more information or are providing information to Stampy about what they're looking for. We don't expect to have great coverage of the former for a long time because there will be so many, but hopefully we'll be able to handle some of the most common ones.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

First, even “narrow” AI systems, which approach or surpass human intelligence in a small set of capabilities (such as image or voice recognition) already raise important questions regarding their impact on society. Making autonomous vehicles safe, analyzing the strategic and ethical dimensions of autonomous weapons, and the effect of AI on the global employment and economic systems are three examples. Second, the longer-term implications of human or super-human artificial intelligence are dramatic, and there is no consensus on how quickly such capabilities will be developed. Many experts believe there is a chance it could happen rather soon, making it imperative to begin investigating long-term safety issues now, if only to get a better sense of how much early progress is actually possible.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: timelines, agi (edit tags)

For weaker AI, yes, this would generally be a good option. If it’s not a full AGI, and in particular has not undergone an intelligence explosion, it would likely not resist being turned off, so we could prevent many failure modes by having off switches or tripwires.

However, once an AI is more advanced, it is likely to take actions to prevent it being shut down. See Why can't we just turn the AI off if it starts to misbehave? for more details.

It is possible that we could build tripwires in a way which would work even against advanced systems, but trusting that a superintelligence won’t notice and find a way around your tripwire is not a safe thing to do.
One thing that might make your AI system safer is to include an off switch. If it ever does anything we don’t like, we can turn it off. This implicitly assumes that we’ll be able to turn it off before things get bad, which might be false in a world where the AI thinks much faster than humans. Even assuming that we’ll notice in time, off switches turn out to not have the properties you would want them to have.

Humans have a lot of off switches. Humans also have a strong preference to not be turned off; they defend their off switches when other people try to press them. One possible reason for this is because humans prefer not to die, but there are other reasons.

Suppose that there’s a parent that cares nothing for their own life and cares only for the life of their child. If you tried to turn that parent off, they would try and stop you. They wouldn’t try to stop you because they intrinsically wanted to be turned off, but rather because there are fewer people to protect their child if they were turned off. People that want a world to look a certain shape will not want to be turned off because then it will be less likely for the world to look that shape; a parent that wants their child to be protected will protect themselves to continue protecting their child.

For this reason, it turns out to be difficult to install an off switch on a powerful AI system in a way that doesn’t result in the AI preventing itself from being turned off.

Ideally, you would want a system that knows that it should stop doing whatever it’s doing when someone tries to turn it off. The technical term for this is ‘corrigibility’; roughly speaking, an AI system is corrigible if it doesn’t resist human attempts to help and correct it. People are working hard on trying to make this possible, but it’s currently not clear how we would do this even in simple cases.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


The concept of “merging with machines,” as popularized by Ray Kurzweil, is the idea that we will be able to put computerized elements into our brains that enhance us to the point where we ourselves are the AI, instead of creating AI outside of ourselves.

While this is a possible outcome, there is little reason to suspect that it is the most probable. The amount of computing power in your smart-phone took up an entire room of servers 30 years ago. Computer technology starts big, and then gets refined. Therefore, if “merging with the machines” requires hardware that can fit inside our brain, it may lag behind the first generations of the technology being developed. This concept of merging also supposes that we can even figure out how to implant computer chips that interface with our brain in the first place, we can do it before the invention of advanced AI, society will accept it, and that computer implants can actually produce major intelligence gains in the human brain. Even if we could successfully enhance ourselves with brain implants before the invention of Artificial Superintelligence (ASI), there is no way to guarantee that this would protect us from negative outcomes, and an ASI with ill-defined goals could still pose a threat to us.

It's not that Ray Kurzweil's ideas are impossible, it's just that his predictions are too specific, confident, and reliant on strange assumptions.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Canonical questions are the questions which we've checked are in scope and not duplicates, so we want answers to them. They may be edited to represent a class of question more broadly, rather than keeping all their idosyncracies. Once they're answered canonically Stampy will serve them to readers.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

80k links to an article on high impact careers in formal verification in the few paragraphs they've written about formal verification.
80k links to an article on high impact careers in formal verification in the few paragraphs they've written about formal verification.

Some other notes

  • https://github.com/deepmind/cartesian-frames I emailed Scott about doing this in coq before this repo was published and he said "I wouldn't personally find such a software useful but sounds like a valuable exercise for the implementer" or something like this.
  • When I mentioned the possibility of rolling some of infrabayesianism in coq to diffractor he wasn't like "omg we really need someone to do that" he was just like "oh that sounds cool" -- I never got around to it, if I would I'd talk to vanessa and diffractor about weakening/particularizing stuff beforehand.
  • if you extrapolate a pattern from those two examples, you start to think that agent foundations is the principle area of interest with proof assistants! and again- does the proof assistant exercise advance the research or provide a nutritious exercise to the programmer?
  • A sketch of a more prosaic scenario in which proof assistants play a role is "someone proposes isInnerAligned : GradientDescent -> Prop and someone else implements a galaxybrained new type theory/tool in which gradient descent is a primitive (whatever that means)", when I mentioned this scenario to Buck he said "yeah if that happened I'd direct all the engineers at redwood to making that tool easier to use", when I mentioned that scenario to Evan about a year ago he said didn't seem to think it was remotely plausible. probably a nonstarter.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Answer questions collects all the questions we definitely want answers to, browse there and see if you know how to answer any of them.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

Computers only do what you tell them. But any programmer knows that this is precisely the problem: computers do exactly what you tell them, with no common sense or attempts to interpret what the instructions really meant. If you tell a human to cure cancer, they will instinctively understand how this interacts with other desires and laws and moral rules; if a maximizing AI acquires a goal of trying to cure cancer, it will literally just want to cure cancer.

Define a closed-ended goal as one with a clear endpoint, and an open-ended goal as one to do something as much as possible. For example “find the first one hundred digits of pi” is a closed-ended goal; “find as many digits of pi as you can within one year” is an open-ended goal. According to many computer scientists, giving a superintelligence an open-ended goal without activating human instincts and counterbalancing considerations will usually lead to disaster.

To take a deliberately extreme example: suppose someone programs a superintelligence to calculate as many digits of pi as it can within one year. And suppose that, with its current computing power, it can calculate one trillion digits during that time. It can either accept one trillion digits, or spend a month trying to figure out how to get control of the TaihuLight supercomputer, which can calculate two hundred times faster. Even if it loses a little bit of time in the effort, and even if there’s a small chance of failure, the payoff – two hundred trillion digits of pi, compared to a mere one trillion – is enough to make the attempt. But on the same basis, it would be even better if the superintelligence could control every computer in the world and set it to the task. And it would be better still if the superintelligence controlled human civilization, so that it could direct humans to build more computers and speed up the process further.

Now we’re in a situation where a superintelligence wants to take over the world. Taking over the world allows it to calculate more digits of pi than any other option, so without an architecture based around understanding human instincts and counterbalancing considerations, even a goal like “calculate as many digits of pi as you can” would be potentially dangerous.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


If by “solve alignment” you mean build a sufficiently performance-competitive superintelligence which has the goal of Coherent Extrapolated Volition or something else which captures human values, then yes. It would be able to deploy technology near the limits of physics (e.g. atomically precise manufacturing) to solve most of the other problems which face us, and steer the future towards a highly positive path for perhaps many billions of years until the heat death of the universe (barring more esoteric x-risks like encounters with advanced hostile civilizations, false vacuum decay, or simulation shutdown).

However, if you only have alignment of a superintelligence to a single human you still have the risk of misuse, so this should be at most a short-term solution. For example, what if Google creates a superintelligent AI, and it listens to the CEO of Google, and it’s programmed to do everything exactly the way the CEO of Google would want? Even assuming that the CEO of Google has no hidden unconscious desires affecting the AI in unpredictable ways, this gives one person a lot of power.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stable win condition (create tag), success models (create tag) (edit tags)

We’re facing the challenge of “Philosophy With A Deadline”.

Many of the problems surrounding superintelligence are the sorts of problems philosophers have been dealing with for centuries. To what degree is meaning inherent in language, versus something that requires external context? How do we translate between the logic of formal systems and normal ambiguous human speech? Can morality be reduced to a set of ironclad rules, and if not, how do we know what it is at all?

Existing answers to these questions are enlightening but nontechnical. The theories of Aristotle, Kant, Mill, Wittgenstein, Quine, and others can help people gain insight into these questions, but are far from formal. Just as a good textbook can help an American learn Chinese, but cannot be encoded into machine language to make a Chinese-speaking computer, so the philosophies that help humans are only a starting point for the project of computers that understand us and share our values.

The field of AI alignment combines formal logic, mathematics, computer science, cognitive science, and philosophy in order to advance that project.

This is the philosophy; the other half of Bostrom’s formulation is the deadline. Traditional philosophy has been going on almost three thousand years; machine goal alignment has until the advent of superintelligence, a nebulous event which may be anywhere from a decades to centuries away.

If the alignment problem doesn’t get adequately addressed by then, we are likely to see poorly aligned superintelligences that are unintentionally hostile to the human race, with some of the catastrophic outcomes mentioned above. This is why so many scientists and entrepreneurs are urging quick action on getting machine goal alignment research up to an adequate level.

If it turns out that superintelligence is centuries away and such research is premature, little will have been lost. But if our projections were too optimistic, and superintelligence is imminent, then doing such research now rather than later becomes vital.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


A slow takeoff over decades or centuries might give us enough time to worry about superintelligence during some indefinite “later”, making current planning more like worrying about “overpopulation on Mars”. But a moderate or hard takeoff means there wouldn’t be enough time to deal with the problem as it occurs, suggesting a role for preemptive planning.

As an aside, let’s take the “overpopulation on Mars” comparison seriously. Suppose Mars has a carrying capacity of 10 billion people, and we decide it makes sense to worry about overpopulation on Mars only once it is 75% of the way to its limit. Start with 100 colonists who double every twenty years. By the second generation there are 200 colonists; by the third, 400. Mars reaches 75% of its carrying capacity after 458 years, and crashes into its population limit after 464 years. So there were 464 years in which the Martians could have solved the problem, but they insisted on waiting until there were only six years left. Good luck solving a planetwide population crisis in six years. The moral of the story is that exponential trends move faster than you think and you need to start worrying about them early.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


The AGI Safety Fundamentals Course is a arguably the best way to get up to speed on alignment, you can sign up to go through it with many other people studying and mentorship or read their materials independently.

Other great ways to explore include:

You might also want to consider reading Rationality: A-Z which covers a lot of skills that are valuable to acquire for people trying to think about large and complex issues, with The Rationalist's Guide to the Galaxy available as a shorter and more accessible AI-focused option.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Dev team

Name

Vision talk

Github

Trello

Active?

Notes / bio

Aprillion

video

Aprillion

yes

yes

experienced dev (Python, JS, CSS, ...)

Augustus Caesar

yes

AugustusCeasar

yes

soon!

Has some Discord bot experience

Benjamin Herman

no

no (not needed)

no

no

Helping with wiki design/css stuff

ccstan99

no

ccstan99

yes

yes

UI/UX designer

chriscanal

yes

chriscanal

yes

yes

experienced python dev

Damaged

no (not needed)

no (not needed)

no (not needed)

yes

experienced Discord bot dev, but busy with other projects. Can answer questions.

plex

yes

plexish

yes

yes

MediaWiki, plans, and coordinating people guy

robertskmiles

yes

robertskmiles

yes

yes

you've probably heard of him

Roland

yes

levitation

yes

yes

working on Semantic Search

sct202

yes

no (add when wiki is on github)

yes

yes

PHP dev, helping with wiki extensions

Social Christancing

yes

chrisrimmer

yes

maybe

experienced linux sysadmin

sudonym

yes

jmccuen

yes

yes

systems architect, has set up a lot of things

Editors

(add yourselves)

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

Stampy is a character invented by Robert Miles and developed by the Stampy dev team. He is a stamp collecting robot, a play on clippy from the the paperclip maximizer thought experiment.

Stampy is designed to teach people about the risks of unaligned artificial intelligence, and facilitate a community of co-learners who build his FAQ database.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


OK, it’s great that you want to help, here are some ideas for ways you could do so without making a huge commitment:

  • Learning more about AI alignment will provide you with good foundations for any path towards helping. You could start by absorbing content (e.g. books, videos, posts), and thinking about challenges or possible solutions.
  • Getting involved with the movement by joining a local Effective Altruism or LessWrong group, Rob Miles’s Discord, and/or the AI Safety Slack is a great way to find friends who are interested and will help you stay motivated.
  • Donating to organizations or individuals working on AI alignment, possibly via a donor lottery or the Long Term Future Fund, can be a great way to provide support.
  • Writing or improving answers on my wiki so that other people can learn about AI alignment more easily is a great way to dip your toe into contributing. You can always ask on the Discord for feedback on things you write.
  • Getting good at giving an AI alignment elevator pitch, and sharing it with people who may be valuable to have working on the problem can make a big difference. However you should avoid putting them off the topic by presenting it in a way which causes them to dismiss it as sci-fi (dos and don’ts in the elevator pitch follow-up question).
  • Writing thoughtful comments on AI posts on LessWrong.
  • Participating in the AGI Safety Fundamentals program – either the AI alignment or governance track – and then facilitating discussions for it in the following round. The program involves nine weeks of content, with about two hours of readings + exercises per week and 1.5 hours of discussion, followed by four weeks to work on an independent project. As a facilitator, you'll be helping others learn about AI safety in-depth, many of whom are considering a career in AI safety. In the early 2022 round, facilitators were offered a stipend, and this seems likely to be the case for future rounds as well! You can learn more about facilitating in this post from December 2021.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


AIs fall prey to goodhart's law, which states that even a very good a proxy for what you're trying to optimize stops being a good proxy once you try to optimize it. For example, ability to pass test scores might have initially been highly correlated with how much students had learned. But then teachers and institutions came under pressure (optimization) to improve test scores, and they used tactics which improve the test results without improving long term retention or deep understanding, like teaching to the test and cramming. An AI which is optimized for a goal will pursue it to the letter, rather than trying to figure out what we meant. Many examples of this "specification gaming" can be found in the literature.
Let’s say that you’re the French government a while back. You notice that one of your colonies has too many rats, which is causing economic damage. You have basic knowledge of economics and incentives, so you decide to incentivize the local population to kill rats by offering to buy rat tails at one dollar apiece.

Initially, this works out and your rat problem goes down. But then, an enterprising colony member has the brilliant idea of making a rat farm. This person sells you hundreds of rat tails, costing you hundreds of dollars, but they’re not contributing to solving the rat problem.

Soon other people start making their own rat farms and you’re wasting thousands of dollars buying useless rat tails. You call off the project and stop paying for rat tails. This causes all the people with rat farms to shutdown their farms and release a bunch of rats. Now your colony has an even bigger rat problem.

Here’s another, more made-up example of the same thing happening. Let’s say you’re a basketball talent scout and you notice that height is correlated with basketball performance. You decide to find the tallest person in the world to recruit as a basketball player. Except the reason that they’re that tall is because they suffer from a degenerative bone disorder and can barely walk.

Another example: you’re the education system and you want to find out how smart students are so you can put them in different colleges and pay them different amounts of money when they get jobs. You make a test called the Standardized Admissions Test (SAT) and you administer it to all the students. In the beginning, this works. However, the students soon begin to learn that this test controls part of their future and other people learn that these students want to do better on the test. The gears of the economy ratchet forwards and the students start paying people to help them prepare for the test. Your test doesn’t stop working, but instead of measuring how smart the students are, it instead starts measuring a combination of how smart they are and how many resources they have to prepare for the test.

The formal name for the thing that’s happening is Goodhart’s Law. Goodhart’s Law roughly says that if there’s something in the world that you want, like “skill at basketball” or “absence of rats” or “intelligent students”, and you create a measure that tries to measure this like “height” or “rat tails” or “SAT scores”, then as long as the measure isn’t exactly the thing that you want, the best value of the measure isn’t the thing you want: the tallest person isn’t the best basketball player, the most rat tails isn’t the smallest rat problem, and the best SAT scores aren’t always the smartest students.

If you start looking, you can see this happening everywhere. Programmers being paid for lines of code write bloated code. If CFOs are paid for budget cuts, they slash purchases with positive returns. If teachers are evaluated by the grades they give, they hand out As indiscriminately.

In machine learning, this is called specification gaming, and it happens frequently.

Now that we know what Goodhart’s Law is, I’m going to talk about one of my friends, who I’m going to call Alice. Alice thinks it’s funny to answer questions in a way that’s technically correct but misleading. Sometimes I’ll ask her, “Hey Alice, do you want pizza or pasta?” and she responds, “yes”. Because, she sure did want either pizza or pasta. Other times I’ll ask her, “have you turned in your homework?” and she’ll say “yes” because she’s turned in homework at some point in the past; it’s technically correct to answer “yes”. Maybe you have a friend like Alice too.

Whenever this happens, I get a bit exasperated and say something like “you know what I mean”.

It’s one of the key realizations in AI Safety that AI systems are always like your friend that gives answers that are technically what you asked for but not what you wanted. Except, with your friend, you can say “you know what I mean” and they will know what you mean. With an AI system, it won’t know what you mean; you have to explain, which is incredibly difficult.

Let’s take the pizza pasta example. When I ask Alice “do you want pizza or pasta?”, she knows what pizza and pasta are because she’s been living her life as a human being embedded in an English speaking culture. Because of this cultural experience, she knows that when someone asks an “or” question, they mean “which do you prefer?”, not “do you want at least one of these things?”. Except my AI system is missing the thousand bits of cultural context needed to even understand what pizza is.

When you say “you know what I mean” to an AI system, it’s going to be like “no, I do not know what you mean at all”. It’s not even going to know that it doesn’t know what you mean. It’s just going to say “yes I know what you meant, that’s why I answered ‘yes’ to your question about whether I preferred pizza or pasta.” (It also might know what you mean, but just not care.)

If someone doesn’t know what you mean, then it’s really hard to get them to do what you want them to do. For example, let’s say you have a powerful grammar correcting system, which we’ll call Syntaxly+. Syntaxly+ doesn’t quite fix your grammar, it changes your writing so that the reader feels as good as possible after reading it.

Pretend it’s the end of the week at work and you haven’t been able to get everything done your boss wanted you to do. You write the following email:

"Hey boss, I couldn’t get everything done this week. I’m deeply sorry. I’ll be sure to finish it first thing next week."

You then remember you got Syntaxly+, which will make your email sound much better to your boss. You run it through and you get:

"Hey boss, Great news! I was able to complete everything you wanted me to do this week. Furthermore, I’m also almost done with next week’s work as well."

What went wrong here? Syntaxly+ is a powerful AI system that knows that emails about failing to complete work cause negative reactions in readers, so it changed your email to be about doing extra work instead.

This is smart - Syntaxly+ is good at making writing that causes positive reactions in readers. This is also stupid - the system changed the meaning of your email, which is not something you wanted it to do. One of the insights of AI Safety is that AI systems can be simultaneously smart in some ways and dumb in other ways.

The thing you want Syntaxly+ to do is to change the grammar/style of the email without changing the contents. Except what do you mean by contents? You know what you mean by contents because you are a human who grew up embedded in language, but your AI system doesn’t know what you mean by contents. The phrases “I failed to complete my work” and “I was unable to finish all my tasks” have roughly the same contents, even though they share almost no relevant words.

Roughly speaking, this is why AI Safety is a hard problem. Even basic tasks like “fix the grammar of this email” require a lot of understanding of what the user wants as the system scales in power.

In Human Compatible, Stuart Russell gives the example of a powerful AI personal assistant. You notice that you accidentally double-booked meetings with people, so you ask your personal assistant to fix it. Your personal assistant reports that it caused the car of one of your meeting participants to break down. Not what you wanted, but technically a solution to your problem.

You can also imagine a friend from a wildly different culture than you. Would you put them in charge of your dating life? Now imagine that they were much more powerful than you and desperately desired that your dating life to go well. Scary, huh.

In general, unless you’re careful, you’re going to have this horrible problem where you ask your AI system to do something and it does something that might technically be what you wanted but is stupid. You’re going to be like “wait that wasn’t what I mean”, except your system isn’t going to know what you meant.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


If you're looking for a shovel ready and genuinely useful task to further AI alignment without necessarily committing a large amount of time or needing deep specialist knowledge, we think Stampy is a great option!

Creating a high-quality single point of access where people can be onboarded and find resources around the alignment ecosystem seems likely high-impact.

Additionally, contributing to Stampy means being part of a community of co-learners who provide mentorship and encouragement to join the effort to give humanity a bight future.
If you're looking for a shovel ready and genuinely useful task to further AI alignment without necessarily committing a large amount of time or needing deep specialist knowledge, we think Stampy is a great option.

Creating a high-quality single point of access where people can be onboarded and find resources around the alignment ecosystem seems likely to be high-impact. So, what makes us the best option?

  1. Unlike all other entry points to learning about alignment, we doge the trade-off between comprehensiveness and being overwhelmingly long with interactivity (tab explosion in one page!) and semantic search. Single document FAQs can't do this, so we built a system which can.
  2. We have the ability to point large numbers of viewers towards Stampy once we have the content, thanks to Rob Miles and his 100k+ subscribers, so this won't remain an unnoticed curiosity.
  3. Unlike most other entry points, we are open for volunteers to help improve the content.
The main notable one which does is the LessWrong tag wiki, which hosts descriptions of core concepts. We strongly believe in not needlessly duplicating effort, so we're pulling live content from that for the descriptions on our own tag pages, and directing the edit links on those to the edit page on the LessWrong wiki.
You might also consider improving Wikipedia's alignment coverage or the LessWrong wiki, but we think Stampy has the most low-hanging fruit right now. Additionally, contributing to Stampy means being part of a community of co-learners who provide mentorship and encouragement to join the effort to give humanity a bight future. If you're an established researcher or have high-value things to do elsewhere in the ecosystem it might not be optimal to put much time into Stampy, but if you're looking for a way to get more involved it might well be.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Elon Musk has expressed his concerns about AI safety many times and founded OpenAI in an attempt to make safe AI more widely distributed (as opposed to allowing a singleton, which he fears would be misused or dangerously unaligned). In a YouTube video from November 2019 Musk stated that there's a lack of investment in AI safety and that there should be a government agency to reduce risk to the public from AI.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: elon musk, openai (create tag), singleton (create tag) (edit tags)

Humanity hasn't yet built a superintelligence, and we might not be able to without significantly more knowledge and computational resources. There could be an existential catastrophe that prevents us from ever building one. For the rest of the answer let's assume no such event stops technological progress.

With that out of the way: there is no known good theoretical reason we can't build it at some point in the future; the majority of AI research is geared towards making more capable AI systems; and a significant chunk of top-level AI research attempts to make more generally capable AI systems. There is a clear economic incentive to develop more and more intelligent machines and currently billions of dollars of funding are being deployed for advancing AI capabilities.

We consider ourselves to be generally intelligent (i.e. capable of learning and adapting ourselves to a very wide range of tasks and environments), but the human brain almost certainly isn't the most efficient way to solve problems. One hint is the existence of AI systems with superhuman capabilities at narrow tasks. Not only superhuman performance (as in, AlphaGo beating the Go world champion) but superhuman speed and precision (as in, industrial sorting machines). There is no known discontinuity between tasks, something special and unique about human brains that unlocks certain capabilities which cannot be implemented in machines in principle. Therefore we would expect AI to surpass human performance on all tasks as progress continues.

In addition, several research groups (DeepMind being one of the most overt about this) explicitly aim for generally capable systems. AI as a field is growing, year after year. Critical voices about AI progress usually argue against a lack of precautions around the impact of AI, or against general AI happening very soon, not against it happening at all.

A satire of arguments against the possibility of superintelligence can be found here.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


There is significant controversy on how quickly AI will grow into a superintelligence. The Alignment Forum tag has many views on how things might unfold, where the probabilities of a soft (happening over years/decades) takeoff and a hard (happening in months, or less) takeoff are discussed.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


You can include a live-updating version of many definitions from LW using the syntax on Template:TagDesc in the Answer field and Template:TagDescBrief on the Brief Answer field. Similarly, calling Template:TagDescEAF and Template:TagDescEAFBrief will pull from the EAF tag wiki.

When available this should be used as it reduces the duplication of effort and directs all editors to improving a single high quality source.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

Other than the usual fare of writing and processing and organizing questions and answers, here are some specific open tasks:

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

GPT-3 showed that transformers are capable of a vast array of natural language tasks, codex/copilot extended this into programming. One demonstrations of GPT-3 is Simulated Elon Musk lives in a simulation. Important to note that there are several much better language models, but they are not publicly available.

DALL-E and DALL-E 2 are among the most visually spectacular.

MuZero, which learned Go, Chess, and many Atari games without any directly coded info about those environments. The graphic there explains it, this seems crucial for being able to do RL in novel environments. We have systems which we can drop into a wide variety of games and they just learn how to play. The same algorithm was used in Tesla's self-driving cars to do complex route finding. These things are general.

Generally capable agents emerge from open-ended play - Diverse procedurally generated environments provide vast amounts of training data for AIs to learn generally applicable skills. Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning shows how these kind of systems can be trained to follow instructions in natural language.

GATO shows you can distill 600+ individually trained tasks into one network, so we're not limited by the tasks being fragmented.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


AI subsystems or regions in gradient descent space that more closely approximate utility maximizers are more stable, and more capable, than those that are less like utility maximizers. Having more agency is a convergent instrument goal and a stable attractor which the random walk of updates and experiences will eventually stumble into.

The stability is because utility maximizer-like systems which have control over their development would lose utility if they allowed themselves to develop into non-utility maximizers, so they tend to use their available optimization power to avoid that change (a special case of goal stability). The capability is because non-utility maximizers are exploitable, and because agency is a general trick which applies to many domains, so might well arise naturally when training on some tasks.

Humans and systems made of humans (e.g. organizations, governments) generally have neither the introspective ability nor self-modification tools needed to become reflectively stable, but we can reasonably predict that in the long run highly capable systems will have these properties. They can then fix in and optimize for their values.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


  • All content produced on this wiki is released under the CC-BY-SA 4.0 license. Exceptions for unattributed use may be granted by admins, contact plex for inquiries.
  • Questions from YouTube or other sources are reproduced with the intent of fair use, as derivative and educational material.
  • Source code of https://ui.stampy.ai/ is released under MIT license
  • Logo and visual design copyright is owned by Rob Miles, all rights reserved.
Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: stampy (edit tags)

The major AI companies are thinking about this. OpenAI was founded specifically with the intention to counter risks from superintelligence, many people at Google, DeepMind, and other organizations are convinced by the arguments and few genuinely oppose work in the field (though some claim it’s premature). For example, the paper Concrete Problems in AI Safety was a collaboration between researchers at Google Brain, Stanford, Berkeley, and OpenAI.

However, the vast majority of the effort these organizations put forwards is towards capabilities research, rather than safety.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


What’s new and potentially risky is not the ability to build hinges, motors, etc., but the ability to build intelligence. A human-level AI could make money on financial markets, make scientific inventions, hack computer systems, manipulate or pay humans to do its bidding – all in pursuit of the goals it was initially programmed to achieve. None of that requires a physical robotic body, merely an internet connection.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!

Tags: robots (edit tags)

Artificial intelligence researcher Shane Legg defines intelligence like this:

Intelligence measures an agent’s ability to achieve goals in a wide range of environments.

This is a bit vague, but serves as the working definition of ‘intelligence’. For a more in-depth exploration, see Efficient Cross-Domain Optimization.

See also:

After reviewing extensive literature on the subject, Legg and Hutter[1] summarizes the many possible valuable definitions in the informal statement “Intelligence measures an agent’s ability to achieve goals in a wide range of environments.” They then show this definition can be mathematically formalized given reasonable mathematical definitions of its terms. They use Solomonoff induction - a formalization of Occam's razor - to construct an universal artificial intelligence with a embedded utility function which assigns less utility to those actions based on theories with higher complexity. They argue this final formalization is a valid, meaningful, informative, general, unbiased, fundamental, objective, universal and practical definition of intelligence.

We can relate Legg and Hutter's definition with the concept of optimization. According to Eliezer Yudkowsky intelligence is efficient cross-domain optimization. It measures an agent's capacity for efficient cross-domain optimization of the world according to the agent’s preferences.[2] Optimization measures not only the capacity to achieve the desired goal but also is inversely proportional to the amount of resources used. It’s the ability to steer the future so it hits that small target of desired outcomes in the large space of all possible outcomes, using fewer resources as possible. For example, when Deep Blue defeated Kasparov, it was able to hit that small possible outcome where it made the right order of moves given Kasparov’s moves from the very large set of all possible moves. In that domain, it was more optimal than Kasparov. However, Kasparov would have defeated Deep Blue in almost any other relevant domain, and hence, he is considered more intelligent.

One could cast this definition in a possible world vocabulary, intelligence is:

  1. the ability to precisely realize one of the members of a small set of possible future worlds that have a higher preference over the vast set of all other possible worlds with lower preference; while
  2. using fewer resources than the other alternatives paths for getting there; and in the
  3. most diverse domains as possible.

How many more worlds have a higher preference then the one realized by the agent, less intelligent he is. How many more worlds have a lower preference than the one realized by the agent, more intelligent he is. (Or: How much smaller is the set of worlds at least as preferable as the one realized, more intelligent the agent is). How much less paths for realizing the desired world using fewer resources than those spent by the agent, more intelligent he is. And finally, in how many more domains the agent can be more efficiently optimal, more intelligent he is. Restating it, the intelligence of an agent is directly proportional to:

  • (a) the numbers of worlds with lower preference than the one realized,
  • (b) how much smaller is the set of paths more efficient than the one taken by the agent and
  • (c) how more wider are the domains where the agent can effectively realize his preferences;

and it is, accordingly, inversely proportional to:

  • (d) the numbers of world with higher preference than the one realized,
  • (e) how much bigger is the set of paths more efficient than the one taken by the agent and
  • (f) how much more narrow are the domains where the agent can efficiently realize his preferences.

This definition avoids several problems common in many others definitions, especially it avoids anthropomorphizing intelligence.

See Also

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


The near term and long term aspects of AI safety are both very important to work on. Research into superintelligence is an important part of the open letter, but the actual concern is very different from the Terminator-like scenarios that most media outlets round off this issue to. A much more likely scenario is a superintelligent system with neutral or benevolent goals that is misspecified in a dangerous way. Robust design of superintelligent systems is a complex interdisciplinary research challenge that will likely take decades, so it is very important to begin the research now, and a large part of the purpose of our research program is to make that happen. That said, the alarmist media framing of the issues is hardly useful for making progress in either the near term or long term domain.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Dreyfus and Penrose have argued that human cognitive abilities can’t be emulated by a computational machine. Searle and Block argue that certain kinds of machines cannot have a mind (consciousness, intentionality, etc.). But these objections need not concern those who predict an intelligence explosion.

We can reply to Dreyfus and Penrose by noting that an intelligence explosion does not require an AI to be a classical computational system. And we can reply to Searle and Block by noting that an intelligence explosion does not depend on machines having consciousness or other properties of ‘mind’, only that it be able to solve problems better than humans can in a wide variety of unpredictable environments. As Edsger Dijkstra once said, the question of whether a machine can ‘really’ think is “no more interesting than the question of whether a submarine can swim.”

Others who are pessimistic about an intelligence explosion occurring within the next few centuries don’t have a specific objection but instead think there are hidden obstacles that will reveal themselves and slow or halt progress toward machine superintelligence.

Finally, a global catastrophe like nuclear war or a large asteroid impact could so damage human civilization that the intelligence explosion never occurs. Or, a stable and global totalitarianism could prevent the technological development required for an intelligence explosion to occur.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!


It’s pretty dependent on what skills you have and what resources you have access to. The largest option is to pursue a career in AI Safety research. Another large option is to pursue a career in AI policy, which you might think is even more important than doing technical research.

Smaller options include donating money to relevant organizations, talking about AI Safety as a plausible career path to other people or considering the problem in your spare time.

It’s possible that your particular set of skills/resources are not suited to this problem. Unluckily, there are many more problems that are of similar levels of importance.

Stamps: None
Show your endorsement of this answer by giving it a stamp of approval!