Semantic search

It is impossible to design an AI without a goal, because it would do nothing. Therefore, in the sense that designing the AI’s goal is a form of control, it is impossible not to control an AI. This goes for anything that you create. You have to control the design of something at least somewhat in order to create it.

There may be relevant moral questions about our future relationship with possibly sentient machine intelligences, but the priority of the Control Problem is finding a way to ensure the survival and well-being of the human species.

Stamps: None

Tags: None

Goal-directed behavior arises naturally when systems are trained on an objective. An AI not trained or programmed to do well by some objective function would not be good at anything, and would be useless.

See Eliezer's and Gwern's posts about tool AI.

Stamps: None

Tags: tool ai

Cybersecurity is important because computing systems comprise the backbone of the modern economy. If the security of the internet was compromised, then the economy would suffer a tremendous blow.

Similarly, AI Safety might become important as AI systems begin forming larger and larger parts of the modern economy. As more and more labor gets automated, it becomes more and more important to ensure that that labor is occurring in a safe and robust way.

Before the widespread adoption of computing systems, lack of Cybersecurity didn’t cause much damage. However, it might have been beneficial to start thinking about Cybersecurity problems before the solutions were necessary.

Similarly, since AI systems haven't been adopted en masse yet, lack of AI Safety isn't causing harm. However, given that AI systems will become increasingly powerful and increasingly widespread, it might be prudent to try to solve safety problems before a catastrophe occurs.

Additionally, people sometimes think about Artificial General Intelligence (AGI), sometimes called Human-Level Artificial Intelligence (HLAI). One of the core problems in AI Safety is ensuring that when AGI gets built, it has human interests at heart. (Note that most surveyed experts think building AGI/HLAI is possible, but there is wide disagreement on how soon this might occur).

Stamps: None

Tags: None

To help frame this question, we’re going to first answer the dual question of “what is Cybersecurity?”

As a concept, Cybersecurity is the idea that questions like “is this secure?” can meaningfully be asked of computing systems, where “secure” roughly means “is difficult for unauthorized individuals to get access to”. As a problem, Cybersecurity is the set of problems one runs into when trying to design and build secure computing systems. As a field, Cybersecurity is a group of people trying to solve the aforementioned set of problems in robust ways.

As a concept, AI Safety is the idea that questions like “is this safe?” can meaningfully be asked of AI Systems, where “safe” roughly means “does what it’s supposed to do”. As a problem, AI Safety is the set of problems one runs into when trying to design and build AI systems that do what they’re supposed to do. As a field, AI Safety is a group of people trying to solve the aforementioned set of problems in robust ways.

The reason we have a separate field of Cybersecurity is that ensuring the security of the internet and other critical systems is both hard and important. We might want a separate field of AI Safety for similar reasons; we might expect getting powerful AI systems to do what we want to be both hard and important.

Stamps: None

Tags: None

AGI means an AI that is 'general', so it is intelligent in many different domains.

Superintelligence just means doing something better than any human. For example, Stockfish and Deep Blue are narrowly superintelligent at playing chess.

TAI (transformative AI) doesn't have to be general. It means 'a system that changes the world in a significant way'. It's used to emphasize that even non-general systems can have extreme world-changing consequences.

Stamps: None


In addition to the usual continuation of Moore's Law, GPUs have become more powerful and cheaper over the past decade, especially since around 2016. Many ideas in AI have been around for a long time, but the speed at which modern processors can compute and parallelize lets researchers implement those ideas and gather far more observational data. Improvements in AI have allowed many industries to start using the technology, which creates demand and brings more focus to AI research (as well as improving the availability of technology on the whole due to more efficient infrastructure). Data has also become more abundant and available; not only was data a bottleneck for machine learning algorithms, but its sheer abundance is now difficult for humans to deal with alone, so businesses often turn to AI to convert it into something human-parsable. These processes are also recursive to some degree: the more AI improves, the more can be done to improve AI.

Stamps: None


Very hard to say. This draft report for the Open Philanthropy Project is perhaps the most careful attempt so far (and generates these graphs), but there have also been expert surveys, and many people have shared various thoughts. Berkeley AI professor Stuart Russell has given his best guess as “sometime in our children’s lifetimes”, and Ray Kurzweil (Google’s director of engineering) predicts human level AI by 2029 and the singularity by 2045. The Metaculus question on publicly known AGI has a median of around 2029 (around 10 years sooner than it was before the GPT-3 AI showed unexpected ability on a broad range of tasks).

The consensus answer is something like: “highly uncertain, maybe not for over a hundred years, maybe in less than 15, with around the middle of the century looking fairly plausible”.

Stamps: None


The term 'intelligent agent' is often used in AI as a synonym for 'software agent'; however, the original meaning comes from economics, where it was used to describe human actors and other legal entities.

So if we want to include agentless optimizing processes (like evolution) and AIs implemented as distributed systems in some technical discussion, it can be useful to use the terms "agenty" or "agentlike" to avoid addressing the philosophical questions of agency.

Stamps: None

Tags: None

Ideally, it would be aligned with everyone's shared values. This is captured in the "coherent extrapolated volition" idea, which is meant to be the holy grail of alignment. The problem is that it's extremely hard to implement.

We could divide the alignment problem into two subproblems: aligning AI with its creators, and aligning those creators with the general population. Let's assume optimistically that the first one is solved. We can still have a situation where the creators want something that's harmful to everyone else, for example when they are a for-profit company whose objective is to maximize profits regardless of the externalities.

One approach is to crowdsource the values for the AI, as in the Moral Machine example, where people are faced with a moral dilemma and have to choose which action to take. This data could then be used to train the AI. One problem with such an approach is that people are prone to lots of cognitive biases, and their answers won't be fully rational. The AI would then align with what people say they value, and not with what they actually value, which with a superintelligent system could be catastrophic. The AI should be aware of this fact, not take what people say at face value, and try to infer their underlying values. This is an active area of study.
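
As a toy illustration of the gap between what people say and what they value (this sketch is not from the original answer), the code below treats each crowdsourced judgment as a noisy, systematically biased observation of an underlying value, and recovers that value only by modelling the bias explicitly. The bias model, numbers, and variable names are all invented for illustration.

```python
# Toy sketch: stated judgments vs. underlying values.
import numpy as np

rng = np.random.default_rng(0)

true_value = 0.7      # hypothetical underlying approval of some outcome (0..1)
framing_bias = -0.2   # hypothetical systematic shift caused by how the dilemma is framed
n_respondents = 1000

# Stated judgments = underlying value + framing bias + individual noise
stated = true_value + framing_bias + rng.normal(0, 0.1, n_respondents)

naive_estimate = stated.mean()                    # "align with what people say"
debiased_estimate = stated.mean() - framing_bias  # "align with what people value",
                                                  # assuming the bias can be modelled

print(f"naive estimate:    {naive_estimate:.3f}")
print(f"debiased estimate: {debiased_estimate:.3f}  (true value: {true_value})")
```

In reality we do not know the bias term in advance; learning it, or learning the underlying values directly, is precisely the open research problem mentioned above.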

For some, the problem of aligning the AI's creators with the rest of the people is just as hard as, or even harder than, aligning the AI with its creators. The solution could require passing laws or building some decentralized system.

Stamps: None


Even if we only build lots of narrow AIs, we might end up with a distributed system that acts like an AGI - the algorithm does not have to be encoded in a single entity; the definition in What is Artificial General Intelligence and what will it look like? applies to distributed implementations too.

This is similar to how a group of people in a corporation can achieve projects that individual humans could not (like going to space), but the analogy between corporations and AGI is not perfect - see Why Not Just: Think of AGI Like a Corporation?.

Stamps: None

Tags: agi, narrow ai

It depends on the exact definition of consciousness, and on the legal consequences for the AI of telling us things from which we could infer how conscious it might be (would it be motivated to pretend to be "conscious" by those criteria to get some benefits, or would it be motivated to keep its consciousness secret to avoid being turned off?).

Once we have a measurable definition, then we can empirically measure the AI against that definition.

See integrated information theory for practical approaches, though there is always the hard problem of consciousness, which will muddy any candidate definitions for the near future.

Stamps: None


Yes. While creativity has many meanings and AIs are obviously creative in the broad sense of the word (making new, valuable artifacts, like a real-time translation of a restaurant menu or the compilation of source code into binary files), there is also no reason to believe that AIs couldn't be considered creative in a narrower sense too (making art like music or paintings, or writing computer programs based on a conversation with a customer).

There is a notion of being "really creative" that can be defined circularly so that only humans can be really creative, but if we avoid moving the goalposts, then it should be possible to make a variation of the Turing test that compares AI and human creativity and answers that question empirically for any particular AI.

AlphaGo made a move in its game against a top human Go player that was widely considered creative and has been much discussed.

Stamps: plex

Tags: creativity

It's true that AGI may be many years away. But what worries a lot of people is that it may be much harder to make an AI that is both powerful AND safe than one that is merely powerful, in which case the first powerful AIs we create will be dangerous.

If that's the case, the sooner we start working on AI safety, the smaller the chances of humans going extinct or ending up in some Black Mirror episode.

Rob Miles also talks about this concern in this video.

Stamps: None

Tags: timelines, agi

No, but it helps. Some great resources if you're considering it are:

https://rohinshah.com/faq-career-advice-for-ai-alignment-researchers/
https://80000hours.org/articles/ai-safety-syllabus/
https://80000hours.org/career-reviews/machine-learning-phd/

The first two links show general ways to get into AI safety, and the last shows the upsides and downsides of choosing to do a PhD.

Stamps: None

Tags: careers

Primarily, they are trying to make a competent AI, and any consciousness that arises will probably do so by accident.

There are even some people saying we should try to make the AI unconscious, to minimize the risk of it suffering.

The biggest problem here is that we don't have any good way of telling whether some system is conscious. The best theory we have, Integrated Information Theory, has some deep philosophical and practical problems, and there are many controversies around it.

Stamps: None


We don't have AI systems that are generally more capable than humans. So there is still time left to figure out how to build systems that are smarter than humans in a safe way.

Stamps: None


Eliezer Yudkowsky has proposed Coherent Extrapolated Volition as a solution to at least two problems facing Friendly AI design:

  1. The fragility of human values: Yudkowsky writes that “any future not shaped by a goal system with detailed reliable inheritance from human morals and metamorals will contain almost nothing of worth.” The problem is that what humans value is complex and subtle, and difficult to specify. Consider the seemingly minor value of novelty. If a human-like value of novelty is not programmed into a superintelligent machine, it might explore the universe for valuable things up to a certain point, and then maximize the most valuable thing it finds (the exploration-exploitation tradeoff[58]) — tiling the solar system with brains in vats wired into happiness machines, for example. When a superintelligence is in charge, you have to get its motivational system exactly right in order to not make the future undesirable.
  2. The locality of human values: Imagine if the Friendly AI problem had faced the ancient Greeks, and they had programmed it with the most progressive moral values of their time. That would have led the world to a rather horrifying fate. But why should we think that humans have, in the 21st century, arrived at the apex of human morality? We can’t risk programming a superintelligent machine with the moral values we happen to hold today. But then, which moral values do we give it?

Yudkowsky suggests that we build a ‘seed AI’ to discover and then extrapolate the ‘coherent extrapolated volition’ of humanity:

> In poetic terms, our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

The seed AI would use the results of this examination and extrapolation of human values to program the motivational system of the superintelligence that would determine the fate of the galaxy.

However, some worry that the collective will of humanity won’t converge on a coherent set of goals. Others believe that guaranteed Friendliness is not possible, even by such elaborate and careful means.

Stamps: None


Many AI designs that would generate an intelligence explosion would not have a ‘slot’ in which a goal (such as ‘be friendly to human interests’) could be placed. For example, if AI is made via whole brain emulation, or evolutionary algorithms, or neural nets, or reinforcement learning, the AI will end up with some goal as it self-improves, but that stable eventual goal may be very difficult to predict in advance.

Thus, in order to design a friendly AI, it is not sufficient to determine what ‘friendliness’ is (and to specify it clearly enough that even a superintelligence will interpret it the way we want it to). We must also figure out how to build a general intelligence that satisfies a goal at all, and that stably retains that goal as it edits its own code to make itself smarter. This task is perhaps the primary difficulty in designing friendly AI.

Stamps: None

Tags: friendly ai

Some have proposed[49][50][51][52] that we teach machines a moral code with case-based machine learning. The basic idea is this: Human judges would rate thousands of actions, character traits, desires, laws, or institutions as having varying degrees of moral acceptability. The machine would then find the connections between these cases and learn the principles behind morality, such that it could apply those principles to determine the morality of new cases not encountered during its training. This kind of machine learning has already been used to design machines that can, for example, detect underwater mines[53] after feeding the machine hundreds of cases of mines and not-mines.
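
As a rough sketch of what this case-based proposal might look like in code (an assumption about the setup, not something the cited proposals specify), suppose each case has been reduced to a hand-coded feature vector and each human judgment to a numerical rating; a standard supervised learner can then be fit to the ratings and queried on unseen cases. The features, ratings, and numbers below are invented for illustration.

```python
# Minimal sketch of case-based moral learning, under the assumptions stated above.
from sklearn.ensemble import RandomForestRegressor

# Each case: [harm_caused, consent_given, number_affected, deception_involved]
cases = [
    [0.9, 0, 10, 1],
    [0.1, 1, 1, 0],
    [0.5, 0, 3, 1],
    [0.0, 1, 100, 0],
]
# Human judges' moral-acceptability ratings (0 = unacceptable, 1 = acceptable)
ratings = [0.05, 0.9, 0.3, 0.95]

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(cases, ratings)

# Predict the acceptability of a case not seen during training
new_case = [[0.2, 1, 5, 1]]
print(model.predict(new_case))
```

Everything interesting is hidden in how the cases are represented and where the labels come from, which is exactly where the problems discussed below arise.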

There are several reasons machine learning does not present an easy solution for Friendly AI. The first is that, of course, humans themselves hold deep disagreements about what is moral and immoral. But even if humans could be made to agree on all the training cases, at least two problems remain.

The first problem is that training on cases from our present reality may not result in a machine that will make correct ethical decisions in a world radically reshaped by superintelligence.

The second problem is that a superintelligence may generalize the wrong principles due to coincidental patterns in the training data.[54] Consider the parable of the machine trained to recognize camouflaged tanks in a forest. Researchers take 100 photos of camouflaged tanks and 100 photos of trees. They then train the machine on 50 photos of each, so that it learns to distinguish camouflaged tanks from trees. As a test, they show the machine the remaining 50 photos of each, and it classifies each one correctly. Success! However, later tests show that the machine classifies additional photos of camouflaged tanks and trees poorly. The problem turns out to be that the researchers’ photos of camouflaged tanks had been taken on cloudy days, while their photos of trees had been taken on sunny days. The machine had learned to distinguish cloudy days from sunny days, not camouflaged tanks from trees.
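
To make the parable concrete, here is a small synthetic sketch (not from the original answer): lighting is perfectly confounded with the label in the training data, so a simple classifier scores highly during training yet fails once that accidental correlation disappears at deployment. All features and numbers are invented.

```python
# Toy reconstruction of the tank parable with synthetic data.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_photos(n, tank, cloudy):
    brightness = rng.normal(0.3 if cloudy else 0.8, 0.05, n)  # driven by the weather
    texture = rng.normal(0.7 if tank else 0.4, 0.3, n)        # weak "real" tank signal
    return np.column_stack([brightness, texture])

# Training set: every tank photo is cloudy, every tree photo is sunny
X_train = np.vstack([make_photos(100, tank=True, cloudy=True),
                     make_photos(100, tank=False, cloudy=False)])
y_train = np.concatenate([np.ones(100), np.zeros(100)])

# Deployment set: lighting is no longer correlated with the label
X_test = np.vstack([make_photos(100, tank=True, cloudy=False),
                    make_photos(100, tank=False, cloudy=True)])
y_test = np.concatenate([np.ones(100), np.zeros(100)])

clf = LogisticRegression().fit(X_train, y_train)
print("train accuracy:", clf.score(X_train, y_train))  # looks great
print("test accuracy: ", clf.score(X_test, y_test))    # collapses
```

The classifier latches onto the clean "brightness" feature rather than the noisy "texture" feature, which is exactly the failure the parable describes.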

Thus, it seems that trustworthy Friendly AI design must involve detailed models of the underlying processes generating human moral judgments, not only surface similarities of cases.

Stamps: None


A Friendly Artificial Intelligence (Friendly AI or FAI) is an artificial intelligence that is ‘friendly’ to humanity — one that has a good rather than bad effect on humanity.

AI researchers continue to make progress with machines that make their own decisions, and there is a growing awareness that we need to design machines to act safely and ethically. This research program goes by many names: ‘machine ethics’, ‘machine morality’, ‘artificial morality’, ‘computational ethics’ and ‘computational metaethics’, ‘friendly AI’, and ‘robo-ethics’ or ‘robot ethics’.

The most immediate concern may be in battlefield robots; the U.S. Department of Defense contracted Ronald Arkin to design a system for ensuring ethical behavior in autonomous battlefield robots. The U.S. Congress has declared that a third of America’s ground systems must be robotic by 2025, and by 2030 the U.S. Air Force plans to have swarms of bird-sized flying robots that operate semi-autonomously for weeks at a time.

But Friendly AI research is not concerned with battlefield robots or machine ethics in general. It is concerned with a problem of a much larger scale: designing AI that would remain safe and friendly after the intelligence explosion.

A machine superintelligence would be enormously powerful. Successful implementation of Friendly AI could mean the difference between a solar system of unprecedented happiness and a solar system in which all available matter has been converted into parts for achieving the superintelligence’s goals.

It must be noted that Friendly AI is a harder project than often supposed. As explored below, commonly suggested solutions for Friendly AI are likely to fail because of two features possessed by any superintelligence:

  1. Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.
  2. Literalness: a superintelligent machine will make decisions based on the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety[41][42][43] of what humans value. A demand like “maximize human happiness” sounds simple to us because it contains few words, but philosophers and scientists have failed for centuries to explain exactly what this means, and certainly have not translated it into a form sufficiently rigorous for AI programmers to use.

Stamps: None

Tags: definitions, friendly ai

Let’s consider the likely consequences of some utilitarian designs for Friendly AI.

An AI designed to minimize human suffering might simply kill all humans: no humans, no human suffering.[44][45]

Or, consider an AI designed to maximize human pleasure. Rather than build an ambitious utopia that caters to the complex and demanding wants of humanity for billions of years, it could achieve its goal more efficiently by wiring humans into Nozick’s experience machines. Or, it could rewire the ‘liking’ component of the brain’s reward system so that whichever hedonic hotspot paints sensations with a ‘pleasure gloss’[46][47] is wired to maximize pleasure when humans sit in jars. That would be an easier world for the AI to build than one that caters to the complex and nuanced set of world states currently painted with the pleasure gloss by most human brains.

Likewise, an AI motivated to maximize objective desire satisfaction or reported subjective well-being could rewire human neurology so that both ends are realized whenever humans sit in jars. Or it could kill all humans (and animals) and replace them with beings made from scratch to attain objective desire satisfaction or subjective well-being when sitting in jars. Either option might be easier for the AI to achieve than maintaining a utopian society catering to the complexity of human (and animal) desires. Similar problems afflict other utilitarian AI designs.

It’s not just a problem of specifying goals, either. It is hard to predict how goals will change in a self-modifying agent. No current mathematical decision theory can process the decisions of a self-modifying agent.

So, while it may be possible to design a superintelligence that would do what we want, it’s harder than one might initially think.

Stamps: None