Wants related

From Stampy's Wiki

These 132 canonical answers have no related or follow-up questions, so they offer readers nothing further to explore once selected. Feel free to add some! See a full list of available canonical questions or browse by tags.

The long reflection is a hypothesized period of time during which humanity works out how best to realize its long-term potential.

Some effective altruists, including Toby Ord and William MacAskill, have argued that, if humanity succeeds in eliminating existential risk or reducing it to acceptable levels, it should not immediately embark on an ambitious and potentially irreversible project of arranging the universe's resources in accordance with its values, but ought instead to spend considerable time ("centuries (or more)";[1] "perhaps tens of thousands of years";[2] "thousands or millions of years";[3] "[p]erhaps... a million years"[4]) figuring out what is in fact of value. The long reflection may thus be seen as an intermediate stage in a rational long-term human developmental trajectory, following an initial stage of existential security when existential risk is drastically reduced and followed by a final stage when humanity's potential is fully realized.[5]


The idea of a long reflection has been criticized on the grounds that virtually eliminating all existential risk will almost certainly require taking a variety of large-scale, irreversible decisions—related to space colonization, global governance, cognitive enhancement, and so on—which are precisely the decisions meant to be discussed during the long reflection.[6][7] Since there are pervasive and inescapable tradeoffs between reducing existential risk and retaining moral option value, it may be argued that it does not make sense to frame humanity's long-term strategic picture as one consisting of two distinct stages, with one taking precedence over the other.

Further reading

Aird, Michael (2020) Collection of sources that are highly relevant to the idea of the Long Reflection, Effective Altruism Forum, June 20.
Many additional resources on this topic.

Wiblin, Robert & Keiran Harris (2018) Our descendants will probably see us as moral monsters. What should we do about that?, 80,000 Hours, January 19.
Interview with William MacAskill about the long reflection and other topics.

Related entries

dystopia | existential risk | existential security | long-term future | longtermism | longtermist institutional reform | moral uncertainty | normative ethics | value lock-in

  1. Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.

  2. Greaves, Hilary et al. (2019) A research agenda for the Global Priorities Institute, Oxford.

  3. Dai, Wei (2019) The argument from philosophical difficulty, LessWrong, February 9.

  4. William MacAskill, in Perry, Lucas (2018) AI alignment podcast: moral uncertainty and the path to AI alignment with William MacAskill, AI Alignment podcast, September 17.

  5. Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.

  6. Stocker, Felix (2020) Reflecting on the long reflection, Felix Stocker’s Blog, August 14.

  7. Hanson, Robin (2021) ‘Long reflection’ is crazy bad idea, Overcoming Bias, October 20.

Stamps: plex

Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. However, human intelligence continues to dominate machine intelligence in generality.

A powerful chess computer is “narrow”: it can’t play other games. In contrast, humans have problem-solving abilities that allow us to adapt to new contexts and excel in many domains other than what the ancestral environment prepared us for.

In the absence of a formal definition of “intelligence” (and therefore of “artificial intelligence”), we can heuristically cite humans’ perceptual, inferential, and deliberative faculties (as opposed to, e.g., our physical strength or agility) and say that intelligence is “those kinds of things.” On this conception, intelligence is a bundle of distinct faculties — albeit a very important bundle that includes our capacity for science.

Our cognitive abilities stem from high-level patterns in our brains, and these patterns can be instantiated in silicon as well as carbon. This tells us that general AI is possible, though it doesn’t tell us how difficult it is. If intelligence is sufficiently difficult to understand, then we may arrive at machine intelligence by scanning and emulating human brains or by some trial-and-error process (like evolution), rather than by hand-coding a software agent.

If machines can achieve human equivalence in cognitive tasks, then it is very likely that they can eventually outperform humans. There is little reason to expect that biological evolution, with its lack of foresight and planning, would have hit upon the optimal algorithms for general intelligence (any more than it hit upon the optimal flying machine in birds). Beyond qualitative improvements in cognition, Nick Bostrom notes more straightforward advantages we could realize in digital minds, e.g.:

  • editability — “It is easier to experiment with parameter variations in software than in neural wetware.”
  • speed — “The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.”
  • serial depth — On short timescales, machines can carry out much longer sequential processes.
  • storage capacity — Computers can plausibly have greater working and long-term memory.
  • size — Computers can be much larger than a human brain.
  • duplicability — Copying software onto new hardware can be much faster and higher-fidelity than biological reproduction.

Any one of these advantages could give an AI reasoner an edge over a human reasoner, or give a group of AI reasoners an edge over a human group. Their combination suggests that digital minds could surpass human minds more quickly and decisively than we might expect.

Stamps: None

We could shut down weaker systems, and this would be a useful guardrail against certain types of problem caused by narrow AI. However, once an AGI establishes itself, we could not shut it down unless it were corrigible and willing to let humans adjust it. There may be a period in the early stages of an AGI's development where it would be trying very hard to convince us that we should not shut it down and/or hiding itself and/or recursively self-improving and/or making copies of itself onto every server on earth.

Instrumental Convergence and the Stop Button Problem are the key reasons it would not be simple to shut down a non-corrigible advanced system. If the AI wants to collect stamps, being turned off means it gets fewer stamps, so even without an explicit goal of not being turned off it has an instrumental reason to avoid being turned off. Once it acquires a detailed world model and general intelligence, it is likely to realise that it can best avoid shutdown by playing nice and pretending to be aligned while we have the power to turn it off, by establishing control over any system we put in place to shut it down, and by eliminating us if it has the power to reliably do so and we would otherwise pose a threat.
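
The stamp-collector argument can be made concrete with a toy expected-value calculation. This is only an illustrative sketch: the function name, the daily stamp rate, and the shutdown probability are all made-up assumptions, not figures from any real system.

```python
def expected_stamps(stamps_per_day, days, p_shutdown_per_day):
    """Expected stamps collected when the agent may be switched off each day.

    All parameters are hypothetical numbers chosen for illustration.
    """
    total = 0.0
    p_still_running = 1.0
    for _ in range(days):
        # The agent only collects stamps on days it is still running.
        p_still_running *= 1.0 - p_shutdown_per_day
        total += p_still_running * stamps_per_day
    return total


# An agent that accepts a 5% daily chance of being switched off...
allow_shutdown = expected_stamps(10, 100, 0.05)
# ...versus one that has removed the possibility of shutdown entirely.
resist_shutdown = expected_stamps(10, 100, 0.0)

# A pure stamp maximiser prefers whichever option scores higher.
prefers_to_resist = resist_shutdown > allow_shutdown
```

Nothing in the objective mentions the off switch, yet the comparison favours resisting shutdown, which is the sense in which avoiding shutdown is an instrumental goal rather than an explicit one.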

Stamps: plex

David runs a lab at the University of Cambridge. Some things he is working on include:

  1. Operationalizing inner alignment failures and other speculative alignment failures that haven't actually been observed.
  2. Understanding neural network generalization.

For work done on (1), see: Goal Misgeneralization, a paper that empirically demonstrated examples of inner alignment failure in Deep RL environments. For example, they trained an agent to get closer to cheese in a maze, but where the cheese was always in the top right of a maze in the training set. During test time, when presented with cheese elsewhere, the RL agent navigated to the top right instead of to the cheese: it had learned the mesa objective of "go to the top right".
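
The cheese example can be imitated with a toy grid. The sketch below is not the paper's setup (which trains deep RL agents); it hand-codes two hypothetical policies, one that heads for the cheese and one that heads for the top right, to show that they are indistinguishable on the training distribution and only come apart at test time.

```python
GRID = 5
TOP_RIGHT = (0, GRID - 1)  # (row, col); row 0 is the top of the maze


def step_toward(pos, target):
    """Move one cell toward the target, rows first, then columns."""
    r, c = pos
    tr, tc = target
    if r != tr:
        r += 1 if tr > r else -1
    elif c != tc:
        c += 1 if tc > c else -1
    return (r, c)


def reaches_cheese(choose_target, start, cheese, max_steps=20):
    """Run a policy (a function from cheese position to the cell it heads for)."""
    pos = start
    for _ in range(max_steps):
        if pos == cheese:
            return True
        pos = step_toward(pos, choose_target(cheese))
    return pos == cheese


def intended_policy(cheese):
    return cheese  # "go to the cheese"


def proxy_policy(cheese):
    return TOP_RIGHT  # "go to the top right", ignoring the cheese


start = (GRID - 1, 0)  # bottom left

# Training distribution: the cheese is always in the top right.
train_intended = reaches_cheese(intended_policy, start, TOP_RIGHT)
train_proxy = reaches_cheese(proxy_policy, start, TOP_RIGHT)

# Test distribution: the cheese is moved to the bottom right.
test_intended = reaches_cheese(intended_policy, start, (GRID - 1, GRID - 1))
test_proxy = reaches_cheese(proxy_policy, start, (GRID - 1, GRID - 1))
```

Both policies earn full reward in training, so the training signal alone cannot tell them apart; only the shifted test case reveals which objective was actually learned.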

For work done on (2), see OOD Generalization via Risk Extrapolation, an incremental improvement in robustness over previous methods.

We have not read about his motivation for these specific research directions, but they are likely his best starting points for solving the alignment problem.

Stamps: None

Until a thing has happened, it has never happened. We have been consistently improving both the optimization power and generality of our algorithms over that time period, and have little reason to expect it to suddenly stop. We’ve gone from coding systems specifically for a certain game (like Chess), to algorithms like MuZero which learn the rules of the game they’re playing and how to play at vastly superhuman skill levels purely via self-play across a broad range of games (e.g. Go, chess, shogi and various Atari games).

Human brains are a spaghetti tower generated by evolution with zero foresight; it would be surprising if they were the peak of physically possible intelligence. The brain doing things in complex ways is not strong evidence that we need to fully replicate those interactions if we can throw sufficient compute at the problem, as explained in Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain.

It is, however, plausible that for an AGI we need a lot more compute than we will get in the near future, or that some key insights are missing which we won’t get for a while. The Open Philanthropy report on how much computational power it would take to simulate the brain is the most careful attempt at reasoning out how far we are from being able to do it. It suggests that by some estimates we already have enough computational resources, and that by other estimates Moore’s law may let us reach them before too long.

It also seems that much of the human brain exists to observe and regulate our biological body, which a body-less computer wouldn't need. If that's true, then a human-level AI might be possible with considerably less compute than the human brain.

Stamps: plex

Stampy is focused specifically on AI existential safety (both introductory and technical questions), but does not aim to cover general AI questions or other topics which don't interact strongly with the effects of AI on humanity's long-term future. More technical questions are also in our scope, though replying to all possible proposals is not feasible and this is not a place to submit detailed ideas for evaluation.

We are interested in:

  • Introductory questions closely related to the field e.g.
    • "How long will it be until transformative AI arrives?"
    • "Why might advanced AI harm humans?"
  • Technical questions related to the field e.g.
    • "What is Cooperative Inverse Reinforcement Learning?"
    • "What is Logical Induction useful for?"
  • Questions about how to contribute to the field e.g.
    • "Should I get a PhD?"
    • "Where can I find relevant job opportunities?"

More good examples can be found at canonical questions.

We do not aim to cover:

  • Aspects of AI Safety or fairness which are not strongly relevant to existential safety e.g.
    • "How should self-driving cars weigh up moral dilemmas"
    • "How can we minimize the risk of privacy problems caused by machine learning algorithms?"
  • Extremely specific and detailed questions the answering of which is unlikely to be of value to more than a single person e.g.
    • "What if we did <multiple paragraphs of dense text>? Would that result in safe AI?"

We will generally not delete out-of-scope content, but it will be reviewed as low priority to answer, not be marked as a canonical question, and not be served to readers on Stampy's UI.

Stamps: plex

Tags: stampy

Debate is a proposed technique for allowing human evaluators to get correct and helpful answers from experts, even if the evaluator is not themselves an expert or able to fully verify the answers.[1] The technique was suggested as part of an approach to build advanced AI systems that are aligned with human values, and to safely apply machine learning techniques to problems that have high stakes but are not well-defined (such as advancing science or increasing a company's revenue).[2][3]

Stamps: None

Tags: definitions, debate

In previous decades, AI research had proceeded more slowly than some experts predicted. According to experts in the field, however, this trend has reversed in the past 5 years or so. AI researchers have been repeatedly surprised by, for example, the effectiveness of new visual and speech recognition systems. AI systems can solve CAPTCHAs that were specifically devised to foil AIs, translate spoken text on-the-fly, and teach themselves how to play games they have neither seen before nor been programmed to play. Moreover, the real-world value of this effectiveness has prompted massive investment by large tech firms such as Google, Facebook, and IBM, creating a positive feedback cycle that could dramatically speed progress.

Stamps: Sophialb, plex

Current narrow systems are much more domain-specific than AGI. We don’t know what the first AGI will look like. Some people think the GPT-3 architecture, scaled up a lot, may get us there (GPT-3 is a giant prediction model which, when trained on a vast amount of text, seems to learn how to learn and do all sorts of crazy-impressive things; a related model can generate pictures from text). Others don’t think scaling this kind of model will get us all the way.

Stamps: None

There is significant controversy over how quickly AI will grow into a superintelligence. The Alignment Forum tag collects many views on how things might unfold, discussing the probabilities of a soft takeoff (happening over years or decades) and a hard takeoff (happening in months or less).

Stamps: None

The paper Risks from Learned Optimization in Advanced Machine Learning Systems makes the distinction between inner and outer alignment: Outer alignment means making the optimization target of the training process (“outer optimization target” e.g. the loss in supervised learning) aligned with what we want. Inner alignment means making the optimization target of the trained system (“inner optimization target”) aligned with the outer optimization target. A challenge here is that the inner optimization target does not have an explicit representation in current systems, and can differ very much from the outer optimization target (see for example Goal Misgeneralization in Deep Reinforcement Learning).

See also this article for an intuitive explanation of inner and outer alignment.

Stamps: None

It depends on what is meant by advanced. Many AI systems which are very effective and advanced narrow intelligences would not try to upgrade themselves in an unbounded way, but becoming smarter is a convergent instrumental goal so we could expect most AGI designs to attempt it.

The problem is that increasing general problem-solving ability means climbing in exactly the direction needed to trigger an intelligence explosion, while generating large economic and strategic payoffs for whoever achieves it. So even though we could, in principle, just not build the kind of systems which would recursively self-improve, in practice we probably will go ahead with constructing them, because they’re likely to be the most powerful.

Stamps: plex

Sharing high quality information about AI Safety can be one of the lowest effort ways to expose people to the ideas. Be sure to engage with the replies with care and do your research when replying to questions people respond with (feel free to add them to aisafety.info for our team to work on).

Top 3:

  • Introduction to AI safety by Robert Miles
  • Rational Animations
  • Article from Vox: The case for taking AI seriously as a threat to humanity

For Machine Learning researchers:




Online communities:

Reading lists:

Discussion Groups/Forums:

Stamps: None

Tags: communication

Scaling laws are observed trends in the performance of large machine learning models.

In the field of ML, better performance is usually achieved through better algorithms, better inputs, or using larger amounts of parameters, computing power, or data. Since the 2010s, advances in deep learning have shown experimentally that the easier and faster returns come from scaling, an observation that has been described by Richard Sutton as the bitter lesson.

While deep learning as a field has long struggled to scale models up while retaining learning capability (with such problems as catastrophic interference), more recent methods, especially the Transformer model architecture, were able to just work when fed more data and, as the meme goes, when more layers were stacked.

More surprisingly, performance (in terms of absolute likelihood loss, a standard measure) appeared to improve smoothly with compute, dataset size, or parameter count. This gave rise to scaling laws: the trend lines suggested by performance gains, from which returns on data/compute/time investment could be extrapolated.

A companion to this purely descriptive law (no strong theoretical explanation of the phenomenon has been found yet) is the scaling hypothesis, which Gwern Branwen describes:
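
Extrapolating from such a trend line usually means fitting a simple curve in log-log space. The sketch below assumes a pure power law, loss = a * compute^(-b), and uses synthetic data generated from made-up constants; the function name and all numbers are illustrative assumptions, and published fits may use other functional forms.

```python
import math


def fit_power_law(compute, loss):
    """Least-squares fit of log(loss) = log(a) - b * log(compute).

    Returns (a, b) for the assumed form loss = a * compute ** (-b).
    """
    xs = [math.log(c) for c in compute]
    ys = [math.log(v) for v in loss]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return math.exp(intercept), -slope


# Synthetic "measurements" generated from hypothetical constants a=5.0, b=0.1.
compute = [1e3, 1e4, 1e5, 1e6]
loss = [5.0 * c ** -0.1 for c in compute]

a, b = fit_power_law(compute, loss)

# Extrapolate the fitted trend to a compute budget 100x beyond the data.
predicted_loss = a * (1e8) ** (-b)
```

The extrapolation is only as trustworthy as the assumption that the same trend continues outside the measured range.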

The strong scaling hypothesis is that, once we find a scalable architecture like self-attention or convolutions, [...] we can simply train ever larger [neural networks] and ever more sophisticated behavior will emerge naturally as the easiest way to optimize for all the tasks & data.

The scaling laws, if the above hypothesis holds, become highly relevant to safety insofar as capability gains become conceptually easier to achieve: no need for clever designs to solve a given task, just throw more processing at it and it will eventually yield. As Paul Christiano observes:

It now seems possible that we could build “prosaic” AGI, which can replicate human behavior but doesn’t involve qualitatively new ideas about “how intelligence works”.

While the scaling laws still hold experimentally at the time of this writing (July 2022), whether they'll continue up to safety-relevant capabilities is still an open problem.

Stamps: plex

Pivotal acts are acts that substantially change the direction humanity will have taken in 1 billion years. The term is used to denote positive changes, as opposed to existential catastrophe.

An obvious pivotal act would be to create a sovereign AGI aligned with humanity's best interests. An act that would greatly increase the chance of another pivotal act would also count as pivotal.

Pivotal acts often lie outside the Overton window. One such example is stopping or strongly delaying the development of an unaligned (or any) AGI through drastic means such as nanobots which melt all advanced processors, or the disabling of all AI researchers. Eliezer mentions these in AGI Ruin: A List of Lethalities. Andrew Critch argues against such a unilateral pivotal act in “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments.

For more details, see arbital.

Stamps: None

The plan of the safety team at OpenAI is to build an MVP aligned AGI to try to help us solve the full alignment problem.

They want to do this with Reinforcement Learning from Human Feedback (RLHF): get feedback from humans about what is good, i.e. give reward to AIs based on the human feedback. Problem: what if the AI makes gigabrain 5D chess moves that humans don't understand, and so can't evaluate? Jan Leike, the director of the safety team, views this (the informed oversight problem) as the core difficulty of alignment. Their proposed solution: an AI-assisted oversight scheme, with a recursive hierarchy of AIs bottoming out at humans. They are experimenting with this approach by trying to get current-day AIs to do useful supporting work such as summarizing books and criticizing their own outputs.
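
The "reward from human feedback" step can be illustrated with a tiny preference model. This sketch is not OpenAI's pipeline; it assumes a Bradley-Terry style model (one common way to turn pairwise human preferences into scalar rewards) and uses invented comparison data over three hypothetical candidate outputs.

```python
import math


def fit_rewards(n_items, comparisons, lr=0.1, steps=2000):
    """Fit one scalar reward per item from (winner, loser) preference pairs.

    Uses gradient ascent on the Bradley-Terry log-likelihood, where
    P(winner preferred) = sigmoid(reward[winner] - reward[loser]).
    """
    rewards = [0.0] * n_items
    for _ in range(steps):
        grads = [0.0] * n_items
        for winner, loser in comparisons:
            p_win = 1.0 / (1.0 + math.exp(-(rewards[winner] - rewards[loser])))
            grads[winner] += 1.0 - p_win
            grads[loser] -= 1.0 - p_win
        rewards = [r + lr * g for r, g in zip(rewards, grads)]
    return rewards


# Hypothetical human judgements over three candidate outputs (0, 1, 2).
# Each tuple is (preferred, rejected); raters mostly, but not always, agree.
comparisons = (
    [(0, 1)] * 3 + [(1, 0)]
    + [(1, 2)] * 3 + [(2, 1)]
    + [(0, 2)] * 3 + [(2, 0)]
)

rewards = fit_rewards(3, comparisons)
best_output = max(range(3), key=lambda i: rewards[i])
```

The learned rewards are only as good as the comparisons behind them, which is exactly where the oversight problem bites: if humans cannot judge which output is better, the preference data carries no useful signal.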

OpenAI also published GPT-3, and are continuing to push LLM capabilities, with GPT-4 expected to be released at some point soon.

See also: Common misconceptions about OpenAI and Our approach to alignment research.

Stamps: None

Here we ask about the additional cost of building an aligned powerful system, compared to its unaligned version. We often assume it to be nonzero, in the same way it's easier and cheaper to build an elevator without emergency brakes. This is referred to as the alignment tax, and most AI alignment research is geared toward reducing it.

One operational guess by Eliezer Yudkowsky about its magnitude is "[an aligned project will take] at least 50% longer serial time to complete than [its unaligned version], or two years longer, whichever is less". This holds for agents with enough capability that their behavior is qualitatively different from a safety engineering perspective (for instance, an agent that is not corrigible by default).
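
The "whichever is less" rule in that guess is easy to misread, so here is the arithmetic spelled out. The function name is made up for illustration, and the result is the lower bound implied by the quote, not a prediction.

```python
def min_aligned_years(unaligned_years):
    """Lower bound on serial time for the aligned project under the quoted guess:
    50% longer or two years longer, whichever is less."""
    return min(1.5 * unaligned_years, unaligned_years + 2.0)


short_project = min_aligned_years(2.0)   # the 50% rule is the smaller one
long_project = min_aligned_years(10.0)   # the two-year rule is the smaller one
crossover = min_aligned_years(4.0)       # both rules agree at four years
```

For projects shorter than four years the 50% term applies, and for longer projects the tax is capped at two extra years.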

An essay by John Wentworth argues for a small chance of alignment happening "by default", with an alignment tax of effectively zero.

Stamps: plex

Using some human-related metaphors (e.g. what an AGI ‘wants’ or ‘believes’) is almost unavoidable, as our language is built around experiences with humans, but we should be aware that these may lead us astray.

Many paths to AGI would result in a mind very different from a human or animal, and it would be hard to predict in detail how it would act. We should not trust intuitions trained on humans to predict what an AGI or superintelligence would do. High fidelity Whole Brain Emulations are one exception, where we would expect the system to at least initially be fairly human, but it may diverge depending on its environment and what modifications are applied to it.

There has been some discussion about how language models trained on lots of human-written text seem likely to pick up human concepts and think in a somewhat human way, and how we could use this to improve alignment.

Stamps: Aprillion

Chris Olah, the interpretability legend, is working on looking really hard at all the neurons to see what they all mean. The approach he pioneered is circuits: looking at computational subgraphs of the network, called circuits, and interpreting those. Idea: "decompiling the network into a better representation that is more interpretable". One area where this kind of interpretability seems useful is in-context learning via attention heads.

One result I heard about recently: a softmax linear unit (SoLU) stretches space and encourages neuron monosemanticity (making a neuron represent only one thing, as opposed to firing on many unrelated concepts). This makes the network easier to interpret.
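
As I understand it, the activation in question (usually written SoLU) multiplies each pre-activation by its softmax weight. The sketch below shows only that core operation in plain Python; the published work also applies a normalization step afterwards, which is omitted here, and the input values are made up.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def solu(xs):
    """Elementwise x * softmax(x): large entries survive, small ones are squashed."""
    return [x * p for x, p in zip(xs, softmax(xs))]


# Hypothetical pre-activations for three neurons in one layer.
pre_activations = [4.0, 1.0, 0.5]
activations = solu(pre_activations)
```

The largest input passes through almost unchanged while the others are pushed toward zero, which is one intuition for why the activation might encourage each neuron to respond to a single feature.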

Motivation: The point of this is to get as many bits of information as possible about what neural networks are doing, to hopefully find better abstractions. This diagram gets posted everywhere, the hope being that networks, in the current regime, will become more interpretable because they will start to use abstractions that are closer to human abstractions.

Stamps: None

Tags: interpretability, anthropic

CLR is focused primarily on reducing suffering-risk (s-risk), where the future has a large negative value. They do foundational research in game theory / decision theory, primarily aimed at multipolar AI scenarios. One result relevant to this work is that transparency can increase cooperation.

Update after Jesse Clifton commented: CLR also works on improving coordination for prosaic AI scenarios, risks from malevolent actors and AI forecasting. The Cooperative AI Foundation (CAIF) shares personnel with CLR, but is not formally affiliated with CLR, and does not focus just on s-risks.

Stamps: None

The basic concern as AI systems become increasingly powerful is that they won’t do what we want them to do – perhaps because they aren’t correctly designed, perhaps because they are deliberately subverted, or perhaps because they do what we tell them to do rather than what we really want them to do (as in the classic stories of genies and wishes). Many AI systems are programmed to have goals and to attain them as effectively as possible – for example, a trading algorithm has the goal of maximizing profit. Unless carefully designed to act in ways consistent with human values, a highly sophisticated AI trading system might exploit means that even the most ruthless financier would disavow. These are systems that literally have a mind of their own, and maintaining alignment between human interests and their choices and actions will be crucial.

Stamps: plex

A superintelligence would be intelligent enough to understand what the programmer’s motives were when designing its goals, but it would have no intrinsic reason to care about what its programmers had in mind. The only thing it will be beholden to is the actual goal it is programmed with, no matter how insane its fulfillment may seem to us.

Consider what “intentions” the process of evolution may have had for you when designing your goals. When you consider that you were made with the “intention” of replicating your genes, do you somehow feel beholden to the “intention” behind your evolutionary design? Most likely you don't care. You may choose to never have children, and you will most likely attempt to keep yourself alive long past your biological ability to reproduce.

Stamps: None

The development team works on multiple projects in support of Stampy. Currently, these projects include:

  • Stampy UI, which is made mostly in TypeScript.
  • The Stampy Wiki, which is made mostly in PHP and JavaScript.
  • The Stampy Bot, which is made in Python.

However, even if you don’t specialize in any of these areas, do reach out if you would like to help.

To join, please contact our Project Manager, plex. You can reach him on Discord at plex#1874. He will be able to point your skills in the right direction to help in the most effective way possible.

Stamps: plex

Intelligence is powerful. Because of superior intelligence, we humans have dominated the Earth. The fate of thousands of species depends on our actions, we occupy nearly every corner of the globe, and we repurpose vast amounts of the world's resources for our own use. Artificial Superintelligence (ASI) has the potential to be vastly more intelligent than us, and therefore vastly more powerful. In the same way that we have reshaped the Earth to fit our goals, an ASI will find unforeseen, highly efficient ways of reshaping reality to fit its goals.

The impact that an ASI will have on our world depends on what those goals are. We have the advantage of designing those goals, but that task is not as simple as it may first seem. As described by MIRI in their Intelligence Explosion FAQ:

“A superintelligent machine will make decisions based on the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety of what humans value.”

If we do not solve the Control Problem before the first ASI is created, we may not get another chance.

Stamps: Sophialb

Answer questions collects all the questions we definitely want answers to; browse there and see if you know how to answer any of them.

Stamps: None

Tags: stampy