Wants related

From Stampy's Wiki

These 110 canonical answers have no related or follow-up questions, so they don't offer the reader anything more to explore after selecting them. Feel free to add some! See a full list of available canonical questions or browse by tags.

The Orthogonality Thesis states that an agent can have any combination of intelligence level and final goal, that is, its final goals and intelligence levels can vary independently of each other. This is in contrast to the belief that, because of their intelligence, AIs will all converge to a common goal.

The thesis was originally defined by Nick Bostrom in the paper "The Superintelligent Will" (along with the instrumental convergence thesis). For his purposes, Bostrom defines intelligence as instrumental rationality.

Related: Complexity of Value, Decision Theory, General Intelligence, Utility Functions

Defense of the thesis

It has been pointed out that the orthogonality thesis is the default position, and that the burden of proof is on claims that limit the space of possible AIs; Stuart Armstrong has defended it along these lines.

One reason many researchers assume superintelligent agents will converge to the same goals may be that most humans have similar values. Furthermore, many philosophies hold that there is a rationally correct morality, which implies that a sufficiently rational AI will acquire this morality and begin to act according to it. Armstrong points out that, for formalizations of AI such as AIXI and Gödel machines, the thesis is known to be true. Furthermore, if the thesis were false, then Oracle AIs would be impossible to build, and all sufficiently intelligent AIs would be impossible to control.

Pathological Cases

There are some pairings of intelligence and goals which cannot exist. For instance, an AI may have the goal of using as few resources as possible, or simply of being as unintelligent as possible. These goals will inherently limit the degree of intelligence of the AI.

See Also

External links

Stamps: plex


There is a broad range of possible goals that an AI might possess, but there are a few basic drives that would be useful to almost any of them. These are called instrumentally convergent goals:

  1. Self preservation. An agent is less likely to achieve its goal if it is not around to see to its completion.
  2. Goal-content integrity. An agent is less likely to achieve its goal if its goal has been changed to something else. For example, if you offer Gandhi a pill that makes him want to kill people, he will refuse to take it.
  3. Self-improvement. An agent is more likely to achieve its goal if it is more intelligent and better at problem-solving.
  4. Resource acquisition. The more resources at an agent’s disposal, the more power it has to effect change toward its goal. Even a purely computational goal, such as computing digits of pi, can be easier to achieve with more hardware and energy.

Because of these drives, even a seemingly simple goal could create an Artificial Superintelligence (ASI) hell-bent on taking over the world’s material resources and preventing itself from being turned off. The classic example is an ASI programmed to maximize the output of paper clips at a paper clip factory. With no goal specification other than “maximize paper clips,” it converts all of the matter in the solar system into paper clips and then sends probes to other star systems to build more factories.
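
As a toy illustration of how the paperclip example follows from pure expected-utility maximization (all numbers below are invented for the sketch), a planner whose only criterion is expected paperclip count will always prefer the plan that commandeers the most matter:

    # Hypothetical toy planner: its utility is just the expected number of
    # paperclips, so the plan that converts the most matter always wins, even
    # though "acquire resources" appears nowhere in the goal. Numbers invented.
    plans = {
        "run the existing factory":           1_000,
        "build factories across Earth":       1_000_000_000,
        "convert the solar system's matter":  10 ** 30,
    }

    def utility(plan: str) -> int:
        return plans[plan]                 # expected paperclips is the only criterion

    print(max(plans, key=utility))         # -> "convert the solar system's matter"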

Stamps: None


AI subsystems or regions in gradient descent space that more closely approximate utility maximizers are more stable, and more capable, than those that are less like utility maximizers. Having more agency is a convergent instrumental goal and a stable attractor which the random walk of updates and experiences will eventually stumble into.

The stability is because utility-maximizer-like systems which have control over their development would lose utility if they allowed themselves to develop into non-utility-maximizers, so they tend to use their available optimization power to avoid that change (a special case of goal stability). The capability is because non-utility-maximizers are exploitable, and because agency is a general trick which applies to many domains and so might well arise naturally when training on some tasks.

Humans and systems made of humans (e.g. organizations, governments) generally have neither the introspective ability nor the self-modification tools needed to become reflectively stable, but we can reasonably predict that in the long run highly capable systems will have these properties. They can then lock in and optimize for their values.

Stamps: None


It likely will – however, intelligence is, by many definitions, the ability to figure out how to accomplish goals. Even in today’s advanced AI systems, the builders assign the goal but don’t tell the AI exactly how to accomplish it, nor necessarily predict in detail how it will be done; indeed those systems often solve problems in creative, unpredictable ways. Thus the thing that makes such systems intelligent is precisely what can make them difficult to predict and control. They may therefore attain the goal we set them via means inconsistent with our preferences.

Stamps: plex

Tags: tool ai

A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility. They are also referred to as goal-seeking. The concept of a rational agent is used in economics, game theory, decision theory, and artificial intelligence.

Editor note: there is work to be done reconciling this page, Agency page, and Robust Agents. Currently they overlap and I'm not sure they're consistent. - Ruby, 2020-09-15

More generally, an agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.[1]
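
To make the definition concrete, here is a minimal sketch of such an agent (an illustration only; the belief update and world model are hypothetical placeholders, not a reference implementation): it perceives, updates its beliefs, evaluates the consequences of each available action, and takes the action with the highest expected utility.

    # Minimal sketch of a rational agent: perceive, update beliefs, evaluate
    # consequences, act to maximize expected utility. The interfaces here
    # (utility, predict) are placeholders for illustration.
    from typing import Any, Callable, Dict, List

    class RationalAgent:
        def __init__(self, actions: List[Any],
                     utility: Callable[[Any], float],
                     predict: Callable[[Dict[str, Any], Any], List]):
            self.actions = actions      # available actuator commands
            self.utility = utility      # utility(outcome) -> float
            self.predict = predict      # predict(beliefs, action) -> [(probability, outcome), ...]
            self.beliefs: Dict[str, Any] = {}

        def update_beliefs(self, percept: Dict[str, Any]) -> None:
            self.beliefs.update(percept)   # placeholder belief revision

        def expected_utility(self, action: Any) -> float:
            return sum(p * self.utility(outcome)
                       for p, outcome in self.predict(self.beliefs, action))

        def act(self, percept: Dict[str, Any]) -> Any:
            self.update_beliefs(percept)
            return max(self.actions, key=self.expected_utility)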

There has been much discussion as to whether certain AGI designs can be made into mere tools or whether they will necessarily be agents which will attempt to actively carry out their goals. Any minds that actively engage in goal-directed behavior are potentially dangerous, due to considerations such as basic AI drives possibly causing behavior which is in conflict with humanity's values.

In Dreams of Friendliness and in Reply to Holden on Tool AI, Eliezer Yudkowsky argues that, since all intelligences select correct beliefs from the much larger space of incorrect beliefs, they are necessarily agents.

See also

Posts

  1. Russell, S. & Norvig, P. (2003) Artificial Intelligence: A Modern Approach. Second Edition. Page 32.

Stamps: plex

Tags: definitions, agency

Each major organization has a different approach. The research agendas are detailed and complex (see also AI Watch). Getting more brains working on any of them (and more money to fund them) may pay off in a big way, but it’s very hard to be confident which (if any) of them will actually work.

The following is a massive oversimplification; each organization actually pursues many different avenues of research. Read the 2020 AI Alignment Literature Review and Charity Comparison for much more detail. That being said:

  • The Machine Intelligence Research Institute focuses on foundational mathematical research to understand reliable reasoning, which they think is necessary to provide anything like an assurance that a seed AI, if built and activated, will do good things.
  • The Center for Human-Compatible AI focuses on Cooperative Inverse Reinforcement Learning and Assistance Games, a new paradigm for AI in which they try to optimize for doing the kinds of things humans want rather than for a pre-specified utility function.
  • Paul Christiano's Alignment Research Center focuses on prosaic alignment, particularly on creating tools that empower humans to understand and guide systems much smarter than ourselves. His methodology is explained on his blog.
  • The Future of Humanity Institute does work on crucial considerations and other x-risks, as well as AI safety research and outreach.
  • Anthropic is a new organization exploring natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
  • OpenAI is in a state of flux after major changes to their safety team.
  • DeepMind’s safety team is working on various approaches designed to work with modern machine learning, and does some communication via the Alignment Newsletter.
  • EleutherAI is a Machine Learning collective aiming to build large open source language models to allow more alignment research to take place.
  • Ought is a research lab that develops mechanisms for delegating open-ended thinking to advanced machine learning systems.

There are many other projects around AI Safety, such as the Windfall clause, Rob Miles’s YouTube channel, AI Safety Support, etc.

Stamps: None


A potential solution is to create an AI that has the same values and morality as a human by creating a child AI and raising it. There’s nothing intrinsically flawed about this procedure. However, this suggestion is deceptive because it sounds simpler than it is.

If you get a chimpanzee baby and raise it in a human family, it does not learn to speak a human language. Human babies can grow into adult humans because the babies have specific properties, e.g. a prebuilt language module that gets activated during childhood.

In order to make a child AI that has the potential to turn into the type of adult AI we would find acceptable, the child AI has to have specific properties. The task of building a child AI with these properties involves building a system that can interpret what humans mean when we try to teach the child to do various tasks. People are currently working on ways to program agents that can cooperatively interact with humans to learn what they want.

Stamps: None


For weaker AI, yes, this would generally be a good option. If it’s not a full AGI, and in particular has not undergone an intelligence explosion, it would likely not resist being turned off, so we could prevent many failure modes by having off switches or tripwires.

However, once an AI is more advanced, it is likely to take actions to prevent itself from being shut down. See Why can't we just turn the AI off if it starts to misbehave? for more details.

It is possible that we could build tripwires in a way which would work even against advanced systems, but trusting that a superintelligence won’t notice and find a way around your tripwire is not a safe thing to do.

One thing that might make your AI system safer is to include an off switch. If it ever does anything we don’t like, we can turn it off. This implicitly assumes that we’ll be able to turn it off before things get bad, which might be false in a world where the AI thinks much faster than humans. Even assuming that we’ll notice in time, off switches turn out not to have the properties you would want them to have.

Humans have a lot of off switches. Humans also have a strong preference to not be turned off; they defend their off switches when other people try to press them. One possible reason for this is that humans prefer not to die, but there are other reasons.

Suppose that there’s a parent who cares nothing for their own life and cares only for the life of their child. If you tried to turn that parent off, they would try to stop you, not because they intrinsically want to avoid being turned off, but because there would be fewer people to protect their child if they were turned off. People who want the world to look a certain way will not want to be turned off, because being turned off makes that outcome less likely; a parent who wants their child to be protected will protect themselves to continue protecting their child.

For this reason, it turns out to be difficult to install an off switch on a powerful AI system in a way that doesn’t result in the AI preventing itself from being turned off.

Ideally, you would want a system that knows that it should stop doing whatever it’s doing when someone tries to turn it off. The technical term for this is ‘corrigibility’; roughly speaking, an AI system is corrigible if it doesn’t resist human attempts to help and correct it. People are working hard on trying to make this possible, but it’s currently not clear how we would do this even in simple cases.
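
As a toy version of the parent example above (illustrative probabilities only), an agent whose utility function says nothing about its own survival can still compute that resisting shutdown is better for its goal; corrigibility research asks how to design agents that do not act on this difference.

    # The agent's utility is 1 if the child is safe and 0 otherwise; its own
    # survival is not mentioned. It still prefers to resist shutdown, because
    # staying on makes the child more likely to be safe. Probabilities invented.
    P_CHILD_SAFE_IF_AGENT_RUNNING  = 0.95
    P_CHILD_SAFE_IF_AGENT_SHUTDOWN = 0.60

    def expected_utility(resist_shutdown: bool, p_shutdown_succeeds: float = 1.0) -> float:
        p_still_running = 1.0 if resist_shutdown else 1.0 - p_shutdown_succeeds
        return (p_still_running * P_CHILD_SAFE_IF_AGENT_RUNNING
                + (1.0 - p_still_running) * P_CHILD_SAFE_IF_AGENT_SHUTDOWN)

    print(expected_utility(resist_shutdown=True))   # 0.95
    print(expected_utility(resist_shutdown=False))  # 0.60
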
Stamps: None


A brain-computer interface (BCI) is a direct communication pathway between the brain and a computer device. BCI research is heavily funded and has already met with dozens of successes. Three successes in human BCIs are a device that restores (partial) sight to the blind, cochlear implants that restore hearing to the deaf, and a device that allows use of an artificial hand by direct thought.

Such devices restore impaired functions, but many researchers expect BCIs to also augment and improve normal human abilities. Ed Boyden is researching these opportunities as the lead of the Synthetic Neurobiology Group at MIT. Such devices might hasten the arrival of an intelligence explosion, if only by improving human intelligence so that the hard problems of AI can be solved more rapidly.

See also:

Wikipedia, Brain-computer interface

Stamps: None


The basic concern as AI systems become increasingly powerful is that they won’t do what we want them to do – perhaps because they aren’t correctly designed, perhaps because they are deliberately subverted, or perhaps because they do what we tell them to do rather than what we really want them to do (like in the classic stories of genies and wishes). Many AI systems are programmed to have goals and to attain them as effectively as possible – for example, a trading algorithm has the goal of maximizing profit. Unless carefully designed to act in ways consistent with human values, a highly sophisticated AI trading system might exploit means that even the most ruthless financier would disavow. These are systems that literally have a mind of their own, and maintaining alignment between human interests and their choices and actions will be crucial.

Stamps: plex


“Aligning smarter-than-human AI with human interests” is an extremely vague goal. To approach this problem productively, we attempt to factorize it into several subproblems. As a starting point, we ask: “What aspects of this problem would we still be unable to solve even if the problem were much easier?”

In order to achieve real-world goals more effectively than a human, a general AI system will need to be able to learn its environment over time and decide between possible proposals or actions. A simplified version of the alignment problem, then, would be to ask how we could construct a system that learns its environment and has a very crude decision criterion, like “Select the policy that maximizes the expected number of diamonds in the world.”
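
Written out naively (a sketch of the decision criterion only; every piece of it hides an open problem), the crude criterion looks something like this:

    # Naive sketch of "select the policy that maximizes the expected number of
    # diamonds in the world". The policy space, the world models, and the
    # count_diamonds function are placeholders; specifying them is exactly
    # what the subproblems below are about.
    def select_policy(policies, world_models, count_diamonds):
        """world_models: list of (probability, model) pairs, where
        model(policy) returns the predicted world if that policy is run."""
        def expected_diamonds(policy):
            return sum(p * count_diamonds(model(policy))
                       for p, model in world_models)
        return max(policies, key=expected_diamonds)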

Highly reliable agent design is the technical challenge of formally specifying a software system that can be relied upon to pursue some preselected toy goal. An example of a subproblem in this space is ontology identification: how do we formalize the goal of “maximizing diamonds” in full generality, allowing that a fully autonomous agent may end up in unexpected environments and may construct unanticipated hypotheses and policies? Even if we had unbounded computational power and all the time in the world, we don’t currently know how to solve this problem. This suggests that we’re not only missing practical algorithms but also a basic theoretical framework through which to understand the problem.

The formal agent AIXI is an attempt to define what we mean by “optimal behavior” in the case of a reinforcement learner. A simple AIXI-like equation is lacking, however, for defining what we mean by “good behavior” if the goal is to change something about the external world (and not just to maximize a pre-specified reward number). In order for the agent to evaluate its world-models to count the number of diamonds, as opposed to having a privileged reward channel, what general formal properties must its world-models possess? If the system updates its hypotheses (e.g., discovers that string theory is true and quantum physics is false) in a way its programmers didn’t expect, how does it identify “diamonds” in the new model? The question is a very basic one, yet the relevant theory is currently missing.
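
For reference, the AIXI action rule can be written roughly as follows (one common presentation of Hutter's formulation; notation lightly simplified here):

    a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \left[ r_k + \cdots + r_m \right] \sum_{q \,:\, U(q, a_1 \ldots a_m) = o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}

where the a_i are actions, the o_i and r_i are observations and rewards, m is the horizon, U is a universal Turing machine, and \ell(q) is the length of program q. The point above is that no comparably simple expression is known when the quantity to be maximized (e.g., the number of diamonds) lives inside the agent's world-models rather than arriving on a privileged reward channel.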

We can distinguish highly reliable agent design from the problem of value specification: “Once we understand how to design an autonomous AI system that promotes a goal, how do we ensure its goal actually matches what we want?” Since human error is inevitable and we will need to be able to safely supervise and redesign AI algorithms even as they approach human equivalence in cognitive tasks, MIRI also works on formalizing error-tolerant agent properties. Artificial Intelligence: A Modern Approach, the standard textbook in AI, summarizes the challenge:

Yudkowsky […] asserts that friendliness (a desire not to harm humans) should be designed in from the start, but that the designers should recognize both that their own designs may be flawed, and that the robot will learn and evolve over time. Thus the challenge is one of mechanism design — to design a mechanism for evolving AI under a system of checks and balances, and to give the systems utility functions that will remain friendly in the face of such changes. -Russell and Norvig (2009). Artificial Intelligence: A Modern Approach.

Our technical agenda describes these open problems in more detail, and our research guide collects online resources for learning more.

Stamps: None


In the words of Nate Soares:

I don’t expect humanity to survive much longer.

Often, when someone learns this, they say:
"Eh, I think that would be all right."

So allow me to make this very clear: it would not be "all right."

Imagine a little girl running into the road to save her pet dog. Imagine she succeeds, only to be hit by a car herself. Imagine she lives only long enough to die in pain.

Though you may imagine this thing, you cannot feel the full tragedy. You can’t comprehend the rich inner life of that child. You can’t understand her potential; your mind is not itself large enough to contain the sadness of an entire life cut short.

You can only catch a glimpse of what is lost—
—when one single human being dies.

Now tell me again how it would be "all right" if every single person were to die at once.

Many people, when they picture the end of humankind, pattern match the idea to some romantic tragedy, where humans, with all their hate and all their avarice, had been unworthy of the stars since the very beginning, and deserved their fate. A sad but poignant ending to our tale.

And indeed, there are many parts of human nature that I hope we leave behind before we venture to the heavens. But in our nature is also everything worth bringing with us. Beauty and curiosity and love, a capacity for fun and growth and joy: these are our birthright, ours to bring into the barren night above.

Calamities seem more salient when unpacked. It is far harder to kill a hundred people in their sleep, with a knife, than it is to order a nuclear bomb dropped on Hiroshima. Your brain can’t multiply, you see: it can only look at a hypothetical image of a broken city and decide it’s not that bad. It can only conjure an image of a barren planet and say "eh, we had it coming."

But if you unpack the scenario, if you try to comprehend all the lives snuffed out, all the children killed, the final spark of human joy and curiosity extinguished, all our potential squandered…

I promise you that the extermination of humankind would be horrific.

Stamps: None

Tags: None

MIRI's mission statement is to “ensure that the creation of smarter-than-human artificial intelligence has a positive impact.” This is an ambitious goal, but they believe that some early progress is possible, and they believe that the goal’s importance and difficulty makes it prudent to begin work at an early date.

Their two main research agendas, “Agent Foundations for Aligning Machine Intelligence with Human Interests” and “Value Alignment for Advanced Machine Learning Systems,” focus on three groups of technical problems:

  • highly reliable agent design — learning how to specify highly autonomous systems that reliably pursue some fixed goal;
  • value specification — supplying autonomous systems with the intended goals; and
  • error tolerance — making such systems robust to programmer error.

That being said, MIRI recently published an update stating that they are moving away from the (largely unpublished) research directions they had been pursuing since 2017.

They publish new mathematical results (although their work is non-disclosed by default), host workshops, attend conferences, and fund outside researchers who are interested in investigating these problems. They also host a blog and an online research forum.

Stamps: plex

Tags: miri

If the AI system was deceptively aligned (i.e. pretending to be nice until it was in control of the situation) or had been in stealth mode while getting things in place for a takeover, quite possibly within hours. We may get more warning with weaker systems, if the AGI does not feel at all threatened by us, or if a complex ecosystem of AI systems is built over time and we gradually lose control.

Paul Christiano writes a story of alignment failure which shows a relatively fast transition.

Stamps: plex, Sophialb


Failures can happen with narrow, non-agentic systems, mostly from humans not anticipating safety-relevant decisions that are made too quickly for them to react to, much like in the 2010 flash crash.

A helpful analogy draws on self-driving cars. By relying more and more on an automated process to make decisions, people become worse drivers, since they are no longer training themselves to react to the unexpected; then the unexpected happens, the software reacts in an unsafe way, and the human is too slow to regain control.

This generalizes to broader tasks. A human using a powerful system to make better decisions (say, as the CEO of a company) might not understand those decisions very well, get trapped in an equilibrium without realizing it, and essentially lose control over the entire process.

More detailed examples in this vein are described by Paul Christiano in What failure looks like.

Another source of failures is AI-mediated stable totalitarianism. The limiting factor on current pervasive surveillance, police, and armed forces is manpower; the use of drones and other automated tools decreases the number of personnel needed to ensure security and extract resources.

As capabilities improve, political dissent could become impossible, and checks and balances would break down, since only a minimal number of key actors would be needed to stay in power.

Stamps: plex


AI Safety Support offers free calls to advise people interested in a career in AI Safety, so that's a great place to start. We're working on creating a bunch of detailed information for Stampy to use, but in the meantime check out these resources:

Stamps: plex, ^


AI Takeoff refers to the process of an Artificial General Intelligence going from a certain threshold of capability (often discussed as "human-level") to being super-intelligent and capable enough to control the fate of civilization. There has been much debate about whether AI takeoff is more likely to be slow vs fast, i.e., "soft" vs "hard".

See also: AI Timelines, Seed AI, Singularity, Intelligence explosion, Recursive self-improvement

AI takeoff is sometimes casually referred to as AI FOOM.

Soft takeoff

A soft takeoff refers to an AGI that would self-improve over a period of years or decades. This could be because the learning algorithm is too demanding for the available hardware, or because the AI relies on feedback from the real world that has to be played out in real time. Possible methods that could deliver a soft takeoff, by slowly building on human-level intelligence, are Whole brain emulation, Biological Cognitive Enhancement, and software-based strong AGI [1]. By maintaining control over the AGI's ascent, it should be easier for a Friendly AI to emerge.

Vernor Vinge, Hans Moravec, and others have expressed the view that a soft takeoff is preferable to a hard takeoff, as it would be both safer and easier to engineer.

Hard takeoff

A hard takeoff (or an AI going "FOOM" [2]) refers to AGI expansion in a matter of minutes, days, or months. It is a fast, abrupt, local increase in capability. This scenario is widely considered much more precarious, as it involves an AGI rapidly ascending in power without human control. This may result in unexpected or undesired behavior (i.e. Unfriendly AI). It is one of the main ideas supporting the Intelligence explosion hypothesis.

The feasibility of hard takeoff has been addressed by Hugo de Garis, Eliezer Yudkowsky, Ben Goertzel, Nick Bostrom, and Michael Anissimov. It is widely agreed that a hard takeoff is something to be avoided due to the risks. Yudkowsky points out several possibilities that would make a hard takeoff more likely than a soft takeoff, such as the existence of large resource overhangs or the fact that small improvements can have a large impact on a mind's general intelligence (e.g. the small genetic difference between humans and chimps led to huge increases in capability) [3].

Notable posts

External links

References

  1. http://www.aleph.se/andart/archives/2010/10/why_early_singularities_are_softer.html
  2. http://lesswrong.com/lw/63t/requirements_for_ai_to_go_foom/
  3. http://lesswrong.com/lw/wf/hard_takeoff/
Stamps: plex


All the content below is in English:

Stamps: plex

Tags: content

An AGI which has recursively self-improved into a superintelligence would be capable of either resisting our attempts to modify incorrectly specified goals, or realizing it was still weaker than us and acting deceptively aligned until it was highly sure it could win a confrontation. An AGI would likely prevent a human from shutting it down unless it was designed to be corrigible. See Why can't we just turn the AI off if it starts to misbehave? for more information.

Stamps: tayler6000, plex

Tags: None

Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. However, human intelligence continues to dominate machine intelligence in generality.

A powerful chess computer is “narrow”: it can’t play other games. In contrast, humans have problem-solving abilities that allow us to adapt to new contexts and excel in many domains other than what the ancestral environment prepared us for.

In the absence of a formal definition of “intelligence” (and therefore of “artificial intelligence”), we can heuristically cite humans’ perceptual, inferential, and deliberative faculties (as opposed to, e.g., our physical strength or agility) and say that intelligence is “those kinds of things.” On this conception, intelligence is a bundle of distinct faculties — albeit a very important bundle that includes our capacity for science.

Our cognitive abilities stem from high-level patterns in our brains, and these patterns can be instantiated in silicon as well as carbon. This tells us that general AI is possible, though it doesn’t tell us how difficult it is. If intelligence is sufficiently difficult to understand, then we may arrive at machine intelligence by scanning and emulating human brains or by some trial-and-error process (like evolution), rather than by hand-coding a software agent.

If machines can achieve human equivalence in cognitive tasks, then it is very likely that they can eventually outperform humans. There is little reason to expect that biological evolution, with its lack of foresight and planning, would have hit upon the optimal algorithms for general intelligence (any more than it hit upon the optimal flying machine in birds). Beyond qualitative improvements in cognition, Nick Bostrom notes more straightforward advantages we could realize in digital minds, e.g.:

  • editability — “It is easier to experiment with parameter variations in software than in neural wetware.”
  • speed — “The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.”
  • serial depth — On short timescales, machines can carry out much longer sequential processes.
  • storage capacity — Computers can plausibly have greater working and long-term memory.
  • size — Computers can be much larger than a human brain.
  • duplicability — Copying software onto new hardware can be much faster and higher-fidelity than biological reproduction.

Any one of these advantages could give an AI reasoner an edge over a human reasoner, or give a group of AI reasoners an edge over a human group. Their combination suggests that digital minds could surpass human minds more quickly and decisively than we might expect.
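
To put rough numbers on the speed advantage mentioned in the list above, here is a back-of-the-envelope calculation using commonly cited ballpark figures (the exact values are assumptions; only the orders of magnitude matter):

    # Ballpark figures only; all values are rough assumptions.
    neuron_signal_speed = 100.0      # m/s, fast myelinated axon
    light_speed         = 3.0e8      # m/s
    neuron_firing_rate  = 200.0      # Hz, generous sustained spiking rate
    transistor_clock    = 3.0e9      # Hz, a modern processor

    print(light_speed / neuron_signal_speed)      # ~3 million: signal-speed ratio
    print(transistor_clock / neuron_firing_rate)  # ~15 million: "tick rate" ratio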

Stamps: None


The long reflection is a hypothesized period of time during which humanity works out how best to realize its long-term potential.

Some effective altruists, including Toby Ord and William MacAskill, have argued that, if humanity succeeds in eliminating existential risk or reducing it to acceptable levels, it should not immediately embark on an ambitious and potentially irreversible project of arranging the universe's resources in accordance to its values, but ought instead to spend considerable time— "centuries (or more)";[1] "perhaps tens of thousands of years";[2] "thousands or millions of years";[3] "[p]erhaps... a million years"[4]—figuring out what is in fact of value. The long reflection may thus be seen as an intermediate stage in a rational long-term human developmental trajectory, following an initial stage of existential security when existential risk is drastically reduced and followed by a final stage when humanity's potential is fully realized.[5]

Criticism

The idea of a long reflection has been criticized on the grounds that virtually eliminating all existential risk will almost certainly require taking a variety of large-scale, irreversible decisions—related to space colonization, global governance, cognitive enhancement, and so on—which are precisely the decisions meant to be discussed during the long reflection.[6][7] Since there are pervasive and inescapable tradeoffs between reducing existential risk and retaining moral option value, it may be argued that it does not make sense to frame humanity's long-term strategic picture as one consisting of two distinct stages, with one taking precedence over the other.

Further reading

Aird, Michael (2020) Collection of sources that are highly relevant to the idea of the Long Reflection, Effective Altruism Forum, June 20.
Many additional resources on this topic.

Wiblin, Robert & Keiran Harris (2018) Our descendants will probably see us as moral monsters. What should we do about that?, 80,000 Hours, January 19.
Interview with William MacAskill about the long reflection and other topics.

Related entries

dystopia | existential risk | existential security | long-term future | longtermism | longtermist institutional reform | moral uncertainty | normative ethics | value lock-in

  1. Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.

  2. Greaves, Hilary et al. (2019) A research agenda for the Global Priorities Institute, Oxford.

  3. Dai, Wei (2019) The argument from philosophical difficulty, LessWrong, February 9.

  4. William MacAskill, in Perry, Lucas (2018) AI alignment podcast: moral uncertainty and the path to AI alignment with William MacAskill, AI Alignment podcast, September 17.

  5. Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.

  6. Stocker, Felix (2020) Reflecting on the long reflection, Felix Stocker’s Blog, August 14.

  7. Hanson, Robin (2021) ‘Long reflection’ is crazy bad idea, Overcoming Bias, October 20.

Stamps: plex


Stampy uses MediaWiki markup, which includes a limited subset of HTML plus the following formatting options:

Items on lists start with *, numbered lists with #

  • For external links use [ followed directly by the URL, a space, then display text and finally a ] symbol
  • For internal links write the page title wrapped in [[]]s
    • e.g. [[What is the Stampy project?]] gives What is the Stampy project?. Including a pipe symbol followed by display text e.g. [[What is the Stampy project?┊Display Text]] allows you to show different Display Text.
  • (ref)Reference notes go inside these tags(/ref)[1]
  • If you post the raw URL of an image from imgur it will be displayed.[2] You can reduce file compression if you get an account. Note that you need the image itself; right click -> copy image address to get it.
  • To embed a YouTube video, use (youtube)APsK8NST4qE(/youtube) with the video ID of the target video.
  • Start with ** or ## for double indentation on lists
  • Three 's (apostrophes) around text make it bold
  • Two 's around text make it italic

Headings

have ==heading here== around them, more =s for smaller headings.

Wrap quotes in < blockquote>< /blockquote> tags (without the spaces)

There are also (poem) (/poem) to suppress linebreak removal, (pre) (/pre) for preformatted text, and (nowiki) (/nowiki) to not have that content parsed.[3]

We can pull live descriptions from the LessWrong/Alignment Forum using their identifier from the URL; for example, including the formatting on Template:TagDesc with orthogonality-thesis as a parameter will render as the full tag description from the LessWrong tag wiki entry on Orthogonality Thesis. Template:TagDescBrief is similar but will pull only the first paragraph without formatting.

For tables please use HTML tables rather than wikicode tables.

Edit this page to see examples.
  1. Note that we use ()s rather than the standard <>s for compatibility with Semantic MediaWiki. The references are automatically added to the bottom of the answer!
  2. If images seem popular we'll set up local uploads.
  3. () can also be used in place of allowed HTML tags. You can escape a () tag by placing a ! inside the start of the first entry. Be aware that () tags only nest up to two layers deep!
Stamps: plex

Tags: stampy

A Quantilizer is a proposed AI design which aims to reduce the harms from Goodhart's law and specification gaming by selecting reasonably effective actions from a distribution of human-like actions, rather than maximizing over actions. It is more of a theoretical tool for exploring ways around these problems than a practical, buildable design.
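
As a concrete sketch of the idea (a toy illustration rather than the construction from the original paper), a q-quantilizer samples many actions from a base distribution of human-like behaviour, keeps the top q fraction ranked by expected utility, and picks one of those at random instead of taking the single best action:

    # Toy q-quantilizer. base_sampler draws actions from a "human-like"
    # distribution and utility scores them; both are placeholders here.
    import random

    def quantilize(base_sampler, utility, q: float = 0.1, n_samples: int = 1000):
        actions = [base_sampler() for _ in range(n_samples)]
        actions.sort(key=utility, reverse=True)
        top = actions[:max(1, int(q * n_samples))]
        return random.choice(top)   # mild optimization, anchored to the base distribution

With q = 1 this is just imitation of the base distribution; as q shrinks it behaves more like a pure maximizer, so q controls the trade-off between optimization power and staying close to human-like actions.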

See also

Stamps: plex


What’s new and potentially risky is not the ability to build hinges, motors, etc., but the ability to build intelligence. A human-level AI could make money on financial markets, make scientific inventions, hack computer systems, manipulate or pay humans to do its bidding – all in pursuit of the goals it was initially programmed to achieve. None of that requires a physical robotic body, merely an internet connection.

Stamps: None

Tags: robots

Debate is a proposed technique for allowing human evaluators to get correct and helpful answers from experts, even if the evaluator is not themselves an expert or able to fully verify the answers.[1] The technique was suggested as part of an approach to build advanced AI systems that are aligned with human values, and to safely apply machine learning techniques to problems that have high stakes but are not well-defined (such as advancing science or increasing a company's revenue).[2][3]
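
As a sketch of the protocol's shape (a simplification for illustration, not the actual setup from the paper), two debaters argue for opposing answers over several rounds and a judge, who need not be an expert, picks the answer whose side argued more convincingly:

    # Toy debate loop. The debater and judge callables are placeholders; in the
    # proposal they would be trained models and a human (or human-trained) judge.
    from typing import Callable, List, Sequence

    def run_debate(question: str,
                   answers: Sequence[str],
                   debaters: Sequence[Callable[[str, str, List[str]], str]],
                   judge: Callable[[str, Sequence[str], List[str]], int],
                   rounds: int = 3) -> str:
        transcript: List[str] = []
        for _ in range(rounds):
            for i, debater in enumerate(debaters):
                argument = debater(question, answers[i], transcript)
                transcript.append(f"Debater {i}: {argument}")
        winner = judge(question, answers, transcript)   # index of the more convincing answer
        return answers[winner]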

Stamps: None

Tags: definitions, debate
