Main Page

From Stampy's Wiki
Welcome to Stampy's Wiki, the editor's hub for an open effort to build a comprehensive FAQ about artificial intelligence existential safety—the field trying to make sure that when we build superintelligent artificial systems they are aligned with human values so that they do the kinds of things we would like them to do.

We're also building Stampy the AI Safety Bot, who will soon be capable of using the FAQ and other sources to educate people about AI alignment via an interactive natural language interface. Feel free to get involved as an early contributor!


These are unanswered questions which we've reviewed and decided are within Stampy's scope. Feel free to answer them if you want to help out. Your answers will be reviewed, stamped, and possibly improved by others so don't worry about them not being perfect :)

How likely is an intelligence explosion?

If AI takes over the world, how could it create and maintain its hardware, its power supply and everything else that humans currently provide?

What is narrow AI?
Isn't the real concern autonomous weapons?

See more...

Answers which have lots of stamps are displayed on this tab. Browse tags or good questions.

This is actually an active area of AI alignment research, called "Impact Measures"! It's not trivial to formalize in a way which won't predictably go wrong (entropy minimization likely leads to an AI which tries really hard to put out all the stars ASAP since they produce so much entropy, for example), but progress is being made. You can read about it on the Alignment Forum tag, or watch Rob's videos Avoiding Negative Side Effects and Avoiding Positive Side Effects.
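As a toy illustration of the general shape of such penalties (this is not any specific published impact measure, and every name in it is made up), one could imagine subtracting from the task reward a term measuring how far the world has drifted from a "do nothing" baseline:

```python
# Toy sketch of an impact-penalized reward, not a real impact measure from the
# literature. `env_reward`, `state_features`, and `baseline_features` are
# hypothetical stand-ins for whatever the environment actually provides.

def penalized_reward(env_reward, state_features, baseline_features, beta=1.0):
    """Task reward minus a penalty for deviating from an 'inaction' baseline.

    env_reward:        reward the environment gives for the agent's action
    state_features:    measurements of the world after the agent acts
    baseline_features: the same measurements if the agent had done nothing
    beta:              how strongly to penalize side effects
    """
    impact = sum(abs(s - b) for s, b in zip(state_features, baseline_features))
    return env_reward - beta * impact
```

The research difficulty is in choosing the baseline and the deviation measure so that the penalty tracks side effects we actually care about, rather than something perverse like total entropy.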

Once an AGI has access to the internet, it would be very challenging to meaningfully restrict it from doing the things online that it wants to do. There are too many options for bypassing any blocks we might put in place.

It may be possible to design it so that it does not want to do dangerous things in the first place, or perhaps to set up tripwires so that we notice when it tries to do something dangerous. Tripwires rely on the AI not noticing or bypassing them, though, so they should not be the only layer of security.

If the AI system was deceptively aligned (i.e. pretending to be nice until it was in control of the situation) or had been in stealth mode while getting things in place for a takeover, quite possibly within hours. We may get more warning with weaker systems, if the AGI does not feel at all threatened by us, or if a complex ecosystem of AI systems is built over time and we gradually lose control.

Paul Christiano has written a story of alignment failure which shows a relatively fast transition.

AI Safety Support offers free calls to advise people interested in a career in AI Safety, so that's a great place to start. We're working on creating a bunch of detailed information for Stampy to use, but in the meantime check out these resources:

How doomed is humanity?

Expert opinions are all over the place, according to this 2021 survey. Someone has also collected a database of existential risk estimates.

On the pessimistic end you find people like Eliezer Yudkowsky, who said: "I consider the present gameboard to look incredibly grim, and I don't actually see a way out through hard work alone. We can hope there's a miracle that violates some aspect of my background model, and we can try to prepare for that unknown miracle; preparing for an unknown miracle probably looks like "Trying to die with more dignity on the mainline" (because if you can die with more dignity on the mainline, you are better positioned to take advantage of a miracle if it occurs)."

At the optimistic end, you have people like Ben Garfinkel, who puts the probability at more like 0.1-1% for AI causing an existential catastrophe in the next century, with most people lying somewhere in the middle.

See more...

Here are the questions which have been added to the wiki but not yet sorted through. Click below to see instructions for processing them.

To process a question:

  1. Check whether the question is in Stampy's scope (cheatsheet: is it likely to be helpful to some people learning about AI existential safety, where "some" means more than just one person with an extremely specific question, such as five paragraphs describing a suggested method for aligning AGI?). If it's not in scope, mark it as 'Out of scope' and move on to the next question.
  2. Check whether there are any existing canonical questions which are duplicates of it by looking at likely tags or running ctrl-f with some plausible words over the list of all canonical questions. An existing question is a duplicate of a new one if it is reasonable to expect whoever asked the new question to be satisfied if they received an answer to the existing question instead. If you find a duplicate, click 'Mark as duplicate' and add the name of the duplicate to that page; otherwise continue.
  3. Add some relevant tags to the page using the add/edit tags link.
  4. Consider whether the question could be rephrased to be clearer. If you think of a better name, change it by clicking edit, opening the advanced options, and writing a new title in the top field.
  5. Mark the question as the appropriate review level (tip: hover over the buttons for descriptions of what each level means).
  6. (optional) Add the question as a related question to some existing questions (either from tag pages, the full list, or memory) and vice versa, to make it more discoverable. Click edit on the question to find the related questions field.
  7. Finally, mark the question as canonical, and bask in the joy of a slightly more organized wiki and a tiny expected increase in the chances of humanity's future going well (hopefully).

The question will now show up on the list of questions we want answered and on the main page.

For sorting incoming YouTube questions, see Prioritize YouTube questions.

There are 3 incoming questions to sort!

Will superintelligence make a large part of humanity unemployable?
Some economists say human wants are infinite, and there will always be new and currently unimaginable kinds of jobs for people to do.
Others say this won't be true if AGI can do _anything_ human minds can do.


Tags: automation, technological unemployment

What is a verified account on Stampy's Wiki?


Tags: stampy

Can we add friendliness to any artificial intelligence design?


Tags: friendly ai

If you think an answer is good (i.e. accurate, helpful, and replies well to the question) then please give it a stamp of approval. If you're a more experienced editor and have a good feel for when there is consensus around an answer being accepted, feel free to mark answers as canonical, so that Stampy will start serving them to users.

The below list puts most recent answers at the top, with alternate sortings (e.g. low stamps, untagged, potentially canonical, wants brief) available on the review answers page.

If I know how to use formal proof assistants (e.g. Mathcomp/Ssreflect), could this be useful in helping with well-aligned AGI? Is computer-certified proof something we might want?


80k links to an article on high impact careers in formal verification in the few paragraphs they've written about formal verification.

Some other notes

  • https://github.com/deepmind/cartesian-frames — I emailed Scott about doing this in Coq before this repo was published, and he said something like "I wouldn't personally find such software useful, but it sounds like a valuable exercise for the implementer".
  • When I mentioned the possibility of formalizing some of infrabayesianism in Coq to Diffractor, he wasn't like "omg, we really need someone to do that"; he was just like "oh, that sounds cool". I never got around to it; if I did, I'd talk to Vanessa and Diffractor about weakening/particularizing things beforehand.
  • If you extrapolate a pattern from those two examples, you start to think that agent foundations is the principal area of interest for proof assistants! And again: does the proof-assistant exercise advance the research, or just provide a nutritious exercise for the programmer?
  • A sketch of a more prosaic scenario in which proof assistants play a role is "someone proposes isInnerAligned : GradientDescent -> Prop, and someone else implements a galaxy-brained new type theory/tool in which gradient descent is a primitive (whatever that means)". When I mentioned this scenario to Buck, he said "yeah, if that happened I'd direct all the engineers at Redwood to making that tool easier to use"; when I mentioned it to Evan about a year ago, he didn't seem to think it was remotely plausible. Probably a nonstarter.
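To make that last bullet slightly more concrete, here is a minimal sketch of what such a proposition might look like. It is written in Lean syntax purely for illustration (the notes above are about Coq), and GradientDescent and isInnerAligned are hypothetical names; nobody has actually formalized these objects:

```lean
-- Hypothetical sketch only: GradientDescent and isInnerAligned are invented
-- names standing in for formalizations nobody has actually written.
axiom GradientDescent : Type
axiom isInnerAligned : GradientDescent → Prop

-- The hoped-for research artifact would be a proof of a statement like this
-- for some concrete, formally specified training setup.
def alignmentClaim : Prop :=
  ∀ g : GradientDescent, isInnerAligned g
```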
Stamps: None


AI alignment encompasses everything related to solving the problem of how to build artificial intelligences such that they share the values of their creators. This appears to be very hard and may be the most important problem humanity will ever face, as whether we succeed might mean the difference between extinction and humanity flourishing until the end of the universe.

Stamps: plex


A value handshake is a form of trade between superintelligences. When two AIs with incompatible utility functions meet, instead of going to war they can, since they have superhuman prediction abilities and likely know the outcome before any attack even happens, decide to split the universe into chunks with volumes according to their respective military strength or chance of victory. If their utility functions are compatible, they might even decide to merge into an AI with a utility function that is the weighted average of the two previous ones.

This could happen if multiple AIs are active on Earth at the same time. If at least one of them is aligned with humans, the resulting value handshake could leave humanity in a pretty okay situation.
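As a toy illustration of the "weighted average" idea (the names and the victory-probability weighting are just assumptions for the example; real utility functions over world-states are of course not small Python functions), the merged utility might be computed like this:

```python
# Toy sketch of a value handshake as a utility-function merge. The weight is
# the two AIs' agreed estimate of A's chance of winning an outright conflict.

def merged_utility(utility_a, utility_b, p_a_wins):
    """Return a new utility function that is the probability-weighted average
    of the two original ones."""
    def utility(outcome):
        return p_a_wins * utility_a(outcome) + (1 - p_a_wins) * utility_b(outcome)
    return utility

# Example: if A would win a war 70% of the time, outcomes are scored 70/30
# in favor of A's preferences.
combined = merged_utility(lambda o: o["paperclips"], lambda o: o["staples"], 0.7)
print(combined({"paperclips": 10, "staples": 4}))  # 0.7*10 + 0.3*4 = 8.2
```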

See The Hour I First Believed by Scott Alexander for some further thoughts and an introduction to related topics.

Stamps: plex

Tags: None

You should stamp an answer when you think it is accurate and well presented enough that you'd be happy to see it served to readers by Stampy.

Stamps: None

Tags: stampy

"The real concern" isn't a particularly meaningful concept here. Deep learning has proven to be a very powerful technology, with far reaching implications across a number of aspects of human existence. There are significant benefits to be found if we manage the technology properly, but that management means addressing a broad range of concerns, one of which is the alignment problem.

Stamps: None


See more...

Changes daily. See improve answers for variations. These 2 canonical answers don't have any tags; please add some!

The basic concern as AI systems become increasingly powerful is that they won’t do what we want them to do – perhaps because they aren’t correctly designed, perhaps because they are deliberately subverted, or perhaps because they do what we tell them to do rather than what we really want them to do (like in the classic stories of genies and wishes). Many AI systems are programmed to have goals and to attain them as effectively as possible – for example, a trading algorithm has the goal of maximizing profit. Unless carefully designed to act in ways consistent with human values, a highly sophisticated AI trading system might exploit means that even the most ruthless financier would disavow. These are systems that literally have a mind of their own, and maintaining alignment between human interests and their choices and actions will be crucial.

Stamps: plex

Tags: None

Ajeya Cotra has written an excellent article named Why AI alignment could be hard with modern deep learning on this question.

Stamps: plex

Tags: None