- Review answers - Answers in need of attention. Stamp things on that page to make them show up higher here.
- Imported FAQs - Questions and answers imported from external FAQs.
Questions and Answers
These are the answers with the most Stamps.
This is actually an active area of AI alignment research, called "Impact Measures"! It's not trivial to formalize in a way which won't predictably go wrong (entropy minimization likely leads to an AI which tries really hard to put out all the stars ASAP since they produce so much entropy, for example), but progress is being made. You can read about it on the Alignment Forum tag, or watch Rob's videos Avoiding Negative Side Effects and Avoiding Positive Side Effects
Once an AGI has access to the internet it would be very challenging to meaningfully restrict it from doing things online which it wants to. There are too many options to bypass blocks we may put in place.
It may be possible to design it so that it does not want to do dangerous things in the first place, or perhaps to set up tripwires so that we notice that it’s trying to do a dangerous thing, though that relies on it not noticing or bypassing the tripwire so should not be the only layer of security.
If the AI system was deceptively aligned (i.e. pretending to be nice until it was in control of the situation) or had been in stealth mode while getting things in place for a takeover, quite possibly within hours. We may get more warning with weaker systems, if the AGI does not feel at all threatened by us, or if a complex ecosystem of AI systems is built over time and we gradually lose control.
Paul Christiano writes a story of alignment failure which shows a relatively fast transition.
AI Safety Support offers free calls to advise people interested in a career in AI Safety, so that's a great place to start. We're working on creating a bunch of detailed information for Stampy to use, but in the meantime check out these resources:
- EA Cambridge AGI Safety Fundamentals curriculum
- 80,000 Hours AI safety syllabus
- Adam Gleave's Careers in Beneficial AI Research document
- Rohin Shah's FAQ on career advice for AI alignment researchers
- AI Safety Support has lots of other good resources, such as their links page, slack, newsletter, and events calendar.
- Safety-aligned research training programs (under construction).
On the pessimistic end you find people like Eliezer Yudkowsky, who said: "I consider the present gameboard to look incredibly grim, and I don't actually see a way out through hard work alone. We can hope there's a miracle that violates some aspect of my background model, and we can try to prepare for that unknown miracle; preparing for an unknown miracle probably looks like "Trying to die with more dignity on the mainline" (because if you can die with more dignity on the mainline, you are better positioned to take advantage of a miracle if it occurs)."
While at the optimistic end you have people like Ben Garfinkel who put the probability at more like 0.1-1% for AI causing an existential catastrophe in the next century, with most people lying somewhere in the middle.
In previous decades, AI research had proceeded more slowly than some experts predicted. According to experts in the field, however, this trend has reversed in the past 5 years or so. AI researchers have been repeatedly surprised by, for example, the effectiveness of new visual and speech recognition systems. AI systems can solve CAPTCHAs that were specifically devised to foil AIs, translate spoken text on-the-fly, and teach themselves how to play games they have neither seen before nor been programmed to play. Moreover, the real-world value of this effectiveness has prompted massive investment by large tech firms such as Google, Facebook, and IBM, creating a positive feedback cycle that could dramatically speed progress.
An aligned superintelligence will have a set of human values. As mentioned in What are human values? the set of values are complex, which means that the implementation of these values will decide whether the superintelligence cares about nonhuman animals. In AI Ethics and Value Alignment for Nonhuman Animals Soenke Ziesche argues that the alignment should include the values of nonhuman animals.
This largely depends on when you think AI will be advanced enough to constitute an immediate threat to humanity. This is difficult to estimate, but the field is surveyed at How long will it be until transformative AI is created?, which comes to the conclusion that it is relatively widely believed that AI will transform the world in our lifetimes.
We probably shouldn't rely too strongly on these opinions as predicting the future is hard. But, due to the enormous damage a misaligned AGI could do, it's worth putting a great deal of effort towards AI alignment even if you just care about currently existing humans (such as yourself).
The long reflection is a hypothesized period of time during which humanity works out how best to realize its long-term potential.
Some effective altruists, including Toby Ord and William MacAskill, have argued that, if humanity succeeds in eliminating existential risk or reducing it to acceptable levels, it should not immediately embark on an ambitious and potentially irreversible project of arranging the universe's resources in accordance to its values, but ought instead to spend considerable time— "centuries (or more)"; "perhaps tens of thousands of years"; "thousands or millions of years"; "[p]erhaps... a million years"—figuring out what is in fact of value. The long reflection may thus be seen as an intermediate stage in a rational long-term human developmental trajectory, following an initial stage of existential security when existential risk is drastically reduced and followed by a final stage when humanity's potential is fully realized.
The idea of a long reflection has been criticized on the grounds that virtually eliminating all existential risk will almost certainly require taking a variety of large-scale, irreversible decisions—related to space colonization, global governance, cognitive enhancement, and so on—which are precisely the decisions meant to be discussed during the long reflection. Since there are pervasive and inescapable tradeoffs between reducing existential risk and retaining moral option value, it may be argued that it does not make sense to frame humanity's long-term strategic picture as one consisting of two distinct stages, with one taking precedence over the other.
Aird, Michael (2020) Collection of sources that are highly relevant to the idea of the Long Reflection, Effective Altruism Forum, June 20.
Many additional resources on this topic.
Wiblin, Robert & Keiran Harris (2018) Our descendants will probably see us as moral monsters. what should we do about that?, 80,000 Hours, January 19.
Interview with William MacAskill about the long reflection and other topics.
Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.
Greaves, Hilary et al. (2019) A research agenda for the Global Priorities Institute, Oxford.
Dai, Wei (2019) The argument from philosophical difficulty, LessWrong, February 9.
William MacAskill, in Perry, Lucas (2018) AI alignment podcast: moral uncertainty and the path to AI alignment with William MacAskill, AI Alignment podcast, September 17.
Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.
Stocker, Felix (2020) Reflecting on the long reflection, Felix Stocker’s Blog, August 14.
Hanson, Robin (2021) ‘Long reflection’ is crazy bad idea, Overcoming Bias, October 20.
Language models can be utilized to produce propaganda by acting like bots and interacting with users on social media. This can be done to push a political agenda or to make fringe views appear more popular than they are.
I'm envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusions) and the target audience, and the system will give you the most convincing arguments for it. At that point people won't be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.
-- Wei Dei, quoted in Persuasion Tools: AI takeover without AGI or agency?
As of 2022, this is not within the reach of current models. However, on the current trajectory, AI might be able to write articles and produce other media for propagandistic purposes that are superior to human-made ones in not too many years. These could be precisely tailored to individuals, using things like social media feeds and personal digital data.
Additionally, recommender systems on content platforms like YouTube, Twitter, and Facebook use machine learning, and the content they recommend can influence the opinions of billions of people. Some research has looked at the tendency for platforms to promote extremist political views and to thereby help radicalize their userbase for example.
In the long term, misaligned AI might use its persuasion abilities to gain influence and take control over the future. This could look like convincing its operators to let it out of a box, to give it resources or creating political chaos in order to disable mechanisms to prevent takeover as in this story.
See Risks from AI persuasion for a deep dive into the distinct risks from AI persuasion.
These are the answers to questions reviewed as high quality.
Let’s say that you’re the French government a while back. You notice that one of your colonies has too many rats, which is causing economic damage. You have basic knowledge of economics and incentives, so you decide to incentivize the local population to kill rats by offering to buy rat tails at one dollar apiece.
Initially, this works out and your rat problem goes down. But then, an enterprising colony member has the brilliant idea of making a rat farm. This person sells you hundreds of rat tails, costing you hundreds of dollars, but they’re not contributing to solving the rat problem.
Soon other people start making their own rat farms and you’re wasting thousands of dollars buying useless rat tails. You call off the project and stop paying for rat tails. This causes all the people with rat farms to shutdown their farms and release a bunch of rats. Now your colony has an even bigger rat problem.
Here’s another, more made-up example of the same thing happening. Let’s say you’re a basketball talent scout and you notice that height is correlated with basketball performance. You decide to find the tallest person in the world to recruit as a basketball player. Except the reason that they’re that tall is because they suffer from a degenerative bone disorder and can barely walk.
Another example: you’re the education system and you want to find out how smart students are so you can put them in different colleges and pay them different amounts of money when they get jobs. You make a test called the Standardized Admissions Test (SAT) and you administer it to all the students. In the beginning, this works. However, the students soon begin to learn that this test controls part of their future and other people learn that these students want to do better on the test. The gears of the economy ratchet forwards and the students start paying people to help them prepare for the test. Your test doesn’t stop working, but instead of measuring how smart the students are, it instead starts measuring a combination of how smart they are and how many resources they have to prepare for the test.
The formal name for the thing that’s happening is Goodhart’s Law. Goodhart’s Law roughly says that if there’s something in the world that you want, like “skill at basketball” or “absence of rats” or “intelligent students”, and you create a measure that tries to measure this like “height” or “rat tails” or “SAT scores”, then as long as the measure isn’t exactly the thing that you want, the best value of the measure isn’t the thing you want: the tallest person isn’t the best basketball player, the most rat tails isn’t the smallest rat problem, and the best SAT scores aren’t always the smartest students.
If you start looking, you can see this happening everywhere. Programmers being paid for lines of code write bloated code. If CFOs are paid for budget cuts, they slash purchases with positive returns. If teachers are evaluated by the grades they give, they hand out As indiscriminately.
In machine learning, this is called specification gaming, and it happens frequently.
Now that we know what Goodhart’s Law is, I’m going to talk about one of my friends, who I’m going to call Alice. Alice thinks it’s funny to answer questions in a way that’s technically correct but misleading. Sometimes I’ll ask her, “Hey Alice, do you want pizza or pasta?” and she responds, “yes”. Because, she sure did want either pizza or pasta. Other times I’ll ask her, “have you turned in your homework?” and she’ll say “yes” because she’s turned in homework at some point in the past; it’s technically correct to answer “yes”. Maybe you have a friend like Alice too.
Whenever this happens, I get a bit exasperated and say something like “you know what I mean”.
It’s one of the key realizations in AI Safety that AI systems are always like your friend that gives answers that are technically what you asked for but not what you wanted. Except, with your friend, you can say “you know what I mean” and they will know what you mean. With an AI system, it won’t know what you mean; you have to explain, which is incredibly difficult.
Let’s take the pizza pasta example. When I ask Alice “do you want pizza or pasta?”, she knows what pizza and pasta are because she’s been living her life as a human being embedded in an English speaking culture. Because of this cultural experience, she knows that when someone asks an “or” question, they mean “which do you prefer?”, not “do you want at least one of these things?”. Except my AI system is missing the thousand bits of cultural context needed to even understand what pizza is.
When you say “you know what I mean” to an AI system, it’s going to be like “no, I do not know what you mean at all”. It’s not even going to know that it doesn’t know what you mean. It’s just going to say “yes I know what you meant, that’s why I answered ‘yes’ to your question about whether I preferred pizza or pasta.” (It also might know what you mean, but just not care.)
If someone doesn’t know what you mean, then it’s really hard to get them to do what you want them to do. For example, let’s say you have a powerful grammar correcting system, which we’ll call Syntaxly+. Syntaxly+ doesn’t quite fix your grammar, it changes your writing so that the reader feels as good as possible after reading it.
Pretend it’s the end of the week at work and you haven’t been able to get everything done your boss wanted you to do. You write the following email:
"Hey boss, I couldn’t get everything done this week. I’m deeply sorry. I’ll be sure to finish it first thing next week."
You then remember you got Syntaxly+, which will make your email sound much better to your boss. You run it through and you get:
"Hey boss, Great news! I was able to complete everything you wanted me to do this week. Furthermore, I’m also almost done with next week’s work as well."
What went wrong here? Syntaxly+ is a powerful AI system that knows that emails about failing to complete work cause negative reactions in readers, so it changed your email to be about doing extra work instead.
This is smart - Syntaxly+ is good at making writing that causes positive reactions in readers. This is also stupid - the system changed the meaning of your email, which is not something you wanted it to do. One of the insights of AI Safety is that AI systems can be simultaneously smart in some ways and dumb in other ways.
The thing you want Syntaxly+ to do is to change the grammar/style of the email without changing the contents. Except what do you mean by contents? You know what you mean by contents because you are a human who grew up embedded in language, but your AI system doesn’t know what you mean by contents. The phrases “I failed to complete my work” and “I was unable to finish all my tasks” have roughly the same contents, even though they share almost no relevant words.
Roughly speaking, this is why AI Safety is a hard problem. Even basic tasks like “fix the grammar of this email” require a lot of understanding of what the user wants as the system scales in power.
In Human Compatible, Stuart Russell gives the example of a powerful AI personal assistant. You notice that you accidentally double-booked meetings with people, so you ask your personal assistant to fix it. Your personal assistant reports that it caused the car of one of your meeting participants to break down. Not what you wanted, but technically a solution to your problem.
You can also imagine a friend from a wildly different culture than you. Would you put them in charge of your dating life? Now imagine that they were much more powerful than you and desperately desired that your dating life to go well. Scary, huh.
In general, unless you’re careful, you’re going to have this horrible problem where you ask your AI system to do something and it does something that might technically be what you wanted but is stupid. You’re going to be like “wait that wasn’t what I mean”, except your system isn’t going to know what you meant.
Intelligence measures an agent’s ability to achieve goals in a wide range of environments.
This is a bit vague, but serves as the working definition of ‘intelligence’. For a more in-depth exploration, see Efficient Cross-Domain Optimization.
- Wikipedia, Intelligence
- Neisser et al., Intelligence: Knowns and Unknowns
- Wasserman & Zentall (eds.), Comparative Cognition: Experimental Explorations of Animal Intelligence
- Legg, Definitions of Intelligence
After reviewing extensive literature on the subject, Legg and Hutter summarizes the many possible valuable definitions in the informal statement “Intelligence measures an agent’s ability to achieve goals in a wide range of environments.” They then show this definition can be mathematically formalized given reasonable mathematical definitions of its terms. They use Solomonoff induction - a formalization of Occam's razor - to construct an universal artificial intelligence with a embedded utility function which assigns less utility to those actions based on theories with higher complexity. They argue this final formalization is a valid, meaningful, informative, general, unbiased, fundamental, objective, universal and practical definition of intelligence.
We can relate Legg and Hutter's definition with the concept of optimization. According to Eliezer Yudkowsky intelligence is efficient cross-domain optimization. It measures an agent's capacity for efficient cross-domain optimization of the world according to the agent’s preferences. Optimization measures not only the capacity to achieve the desired goal but also is inversely proportional to the amount of resources used. It’s the ability to steer the future so it hits that small target of desired outcomes in the large space of all possible outcomes, using fewer resources as possible. For example, when Deep Blue defeated Kasparov, it was able to hit that small possible outcome where it made the right order of moves given Kasparov’s moves from the very large set of all possible moves. In that domain, it was more optimal than Kasparov. However, Kasparov would have defeated Deep Blue in almost any other relevant domain, and hence, he is considered more intelligent.
One could cast this definition in a possible world vocabulary, intelligence is:
- the ability to precisely realize one of the members of a small set of possible future worlds that have a higher preference over the vast set of all other possible worlds with lower preference; while
- using fewer resources than the other alternatives paths for getting there; and in the
- most diverse domains as possible.
How many more worlds have a higher preference then the one realized by the agent, less intelligent he is. How many more worlds have a lower preference than the one realized by the agent, more intelligent he is. (Or: How much smaller is the set of worlds at least as preferable as the one realized, more intelligent the agent is). How much less paths for realizing the desired world using fewer resources than those spent by the agent, more intelligent he is. And finally, in how many more domains the agent can be more efficiently optimal, more intelligent he is. Restating it, the intelligence of an agent is directly proportional to:
- (a) the numbers of worlds with lower preference than the one realized,
- (b) how much smaller is the set of paths more efficient than the one taken by the agent and
- (c) how more wider are the domains where the agent can effectively realize his preferences;
and it is, accordingly, inversely proportional to:
- (d) the numbers of world with higher preference than the one realized,
- (e) how much bigger is the set of paths more efficient than the one taken by the agent and
- (f) how much more narrow are the domains where the agent can efficiently realize his preferences.
This definition avoids several problems common in many others definitions, especially it avoids anthropomorphizing intelligence.
The intelligence explosion idea was expressed by statistician I.J. Good in 1965:
Let an ultraintelligent machine be defined as a machine that can far surpass all the intellectual activities of any man however clever. Since the design of machines is one of these intellectual activities, an ultraintelligent machine could design even better machines; there would then unquestionably be an ‘intelligence explosion’, and the intelligence of man would be left far behind. Thus the first ultraintelligent machine is the last invention that man need ever make.
The argument is this: Every year, computers surpass human abilities in new ways. A program written in 1956 was able to prove mathematical theorems, and found a more elegant proof for one of them than Russell and Whitehead had given in Principia Mathematica. By the late 1990s, ‘expert systems’ had surpassed human skill for a wide range of tasks. In 1997, IBM’s Deep Blue computer beat the world chess champion, and in 2011, IBM’s Watson computer beat the best human players at a much more complicated game: Jeopardy!. Recently, a robot named Adam was programmed with our scientific knowledge about yeast, then posed its own hypotheses, tested them, and assessed the results.
Computers remain far short of human intelligence, but the resources that aid AI design are accumulating (including hardware, large datasets, neuroscience knowledge, and AI theory). We may one day design a machine that surpasses human skill at designing artificial intelligences. After that, this machine could improve its own intelligence faster and better than humans can, which would make it even more skilled at improving its own intelligence. This could continue in a positive feedback loop such that the machine quickly becomes vastly more intelligent than the smartest human being on Earth: an ‘intelligence explosion’ resulting in a machine superintelligence.
This is what is meant by the ‘intelligence explosion’ in this FAQ.
OK, it’s great that you want to help, here are some ideas for ways you could do so without making a huge commitment:
- Learning more about AI alignment will provide you with good foundations for any path towards helping. You could start by absorbing content (e.g. books, videos, posts), and thinking about challenges or possible solutions.
- Getting involved with the movement by joining a local Effective Altruism or LessWrong group, Rob Miles’s Discord, and/or the AI Safety Slack is a great way to find friends who are interested and will help you stay motivated.
- Donating to organizations or individuals working on AI alignment, possibly via a donor lottery or the Long Term Future Fund, can be a great way to provide support.
- Writing or improving answers on my wiki so that other people can learn about AI alignment more easily is a great way to dip your toe into contributing. You can always ask on the Discord for feedback on things you write.
- Getting good at giving an AI alignment elevator pitch, and sharing it with people who may be valuable to have working on the problem can make a big difference. However you should avoid putting them off the topic by presenting it in a way which causes them to dismiss it as sci-fi (dos and don’ts in the elevator pitch follow-up question).
- Writing thoughtful comments on AI posts on LessWrong.
- Participating in the AGI Safety Fundamentals program – either the AI alignment or governance track – and then facilitating discussions for it in the following round. The program involves nine weeks of content, with about two hours of readings + exercises per week and 1.5 hours of discussion, followed by four weeks to work on an independent project. As a facilitator, you'll be helping others learn about AI safety in-depth, many of whom are considering a career in AI safety. In the early 2022 round, facilitators were offered a stipend, and this seems likely to be the case for future rounds as well! You can learn more about facilitating in this post from December 2021.
Present-day AI algorithms already demand special safety guarantees when they must act in important domains without human oversight, particularly when they or their environment can change over time:
Achieving these gains [from autonomous systems] will depend on development of entirely new methods for enabling “trust in autonomy” through verification and validation (V&V) of the near-infinite state systems that result from high levels of [adaptability] and autonomy. In effect, the number of possible input states that such systems can be presented with is so large that not only is it impossible to test all of them directly, it is not even feasible to test more than an insignificantly small fraction of them. Development of such systems is thus inherently unverifiable by today’s methods, and as a result their operation in all but comparatively trivial applications is uncertifiable.
It is possible to develop systems having high levels of autonomy, but it is the lack of suitable V&V methods that prevents all but relatively low levels of autonomy from being certified for use.
- Office of the US Air Force Chief Scientist (2010). Technology Horizons: A Vision for Air Force Science and Technology 2010-30.
As AI capabilities improve, it will become easier to give AI systems greater autonomy, flexibility, and control; and there will be increasingly large incentives to make use of these new possibilities. The potential for AI systems to become more general, in particular, will make it difficult to establish safety guarantees: reliable regularities during testing may not always hold post-testing.
The largest and most lasting changes in human welfare have come from scientific and technological innovation — which in turn comes from our intelligence. In the long run, then, much of AI’s significance comes from its potential to automate and enhance progress in science and technology. The creation of smarter-than-human AI brings with it the basic risks and benefits of intellectual progress itself, at digital speeds.
As AI agents become more capable, it becomes more important (and more difficult) to analyze and verify their decisions and goals. Stuart Russell writes:
The primary concern is not spooky emergent consciousness but simply the ability to make high-quality decisions. Here, quality refers to the expected outcome utility of actions taken, where the utility function is, presumably, specified by the human designer. Now we have a problem:
- The utility function may not be perfectly aligned with the values of the human race, which are (at best) very difficult to pin down.
- Any sufficiently capable intelligent system will prefer to ensure its own continued existence and to acquire physical and computational resources – not for their own sake, but to succeed in its assigned task.
A system that is optimizing a function of n variables, where the objective depends on a subset of size k<n, will often set the remaining unconstrained variables to extreme values; if one of those unconstrained variables is actually something we care about, the solution found may be highly undesirable. This is essentially the old story of the genie in the lamp, or the sorcerer’s apprentice, or King Midas: you get exactly what you ask for, not what you want.
Bostrom’s “The Superintelligent Will” lays out these two concerns in more detail: that we may not correctly specify our actual goals in programming smarter-than-human AI systems, and that most agents optimizing for a misspecified goal will have incentives to treat humans adversarially, as potential threats or obstacles to achieving the agent’s goal.
If the goals of human and AI agents are not well-aligned, the more knowledgeable and technologically capable agent may use force to get what it wants, as has occurred in many conflicts between human communities. Having noticed this class of concerns in advance, we have an opportunity to reduce risk from this default scenario by directing research toward aligning artificial decision-makers’ interests with our own.
Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more.1 However, human intelligence continues to dominate machine intelligence in generality.
A powerful chess computer is “narrow”: it can’t play other games. In contrast, humans have problem-solving abilities that allow us to adapt to new contexts and excel in many domains other than what the ancestral environment prepared us for.
In the absence of a formal definition of “intelligence” (and therefore of “artificial intelligence”), we can heuristically cite humans’ perceptual, inferential, and deliberative faculties (as opposed to, e.g., our physical strength or agility) and say that intelligence is “those kinds of things.” On this conception, intelligence is a bundle of distinct faculties — albeit a very important bundle that includes our capacity for science.
Our cognitive abilities stem from high-level patterns in our brains, and these patterns can be instantiated in silicon as well as carbon. This tells us that general AI is possible, though it doesn’t tell us how difficult it is. If intelligence is sufficiently difficult to understand, then we may arrive at machine intelligence by scanning and emulating human brains or by some trial-and-error process (like evolution), rather than by hand-coding a software agent.
If machines can achieve human equivalence in cognitive tasks, then it is very likely that they can eventually outperform humans. There is little reason to expect that biological evolution, with its lack of foresight and planning, would have hit upon the optimal algorithms for general intelligence (any more than it hit upon the optimal flying machine in birds). Beyond qualitative improvements in cognition, Nick Bostrom notes more straightforward advantages we could realize in digital minds, e.g.:
- editability — “It is easier to experiment with parameter variations in software than in neural wetware.”2
- speed — “The speed of light is more than a million times greater than that of neural transmission, synaptic spikes dissipate more than a million times more heat than is thermodynamically necessary, and current transistor frequencies are more than a million times faster than neuron spiking frequencies.”
- serial depth — On short timescales, machines can carry out much longer sequential processes.
- storage capacity — Computers can plausibly have greater working and long-term memory.
- size — Computers can be much larger than a human brain.
- duplicability — Copying software onto new hardware can be much faster and higher-fidelity than biological reproduction.
Any one of these advantages could give an AI reasoner an edge over a human reasoner, or give a group of AI reasoners an edge over a human group. Their combination suggests that digital minds could surpass human minds more quickly and decisively than we might expect.
Humanity hasn't yet built a superintelligence, and we might not be able to without significantly more knowledge and computational resources There could be an existential catastrophe that prevents us from ever building one. For the rest of the answer let's assume no such event stops technological progress.
With that out of the way: there is no known good theoretical reason we can't build it at some point in the future; the majority of AI research is geared towards making more capable AI systems; and a significant chunk of top-level AI research attempts to make more generally capable AI systems. There is a clear economic incentive to develop more and more intelligent machines and currently billions of dollars of funding are being deployed for advancing AI capabilities.
We consider ourselves to be generally intelligent (i.e. capable of learning and adapting ourselves to a very wide range of tasks and environments), but the human brain almost certainly isn't the most efficient way to solve problems. One hint is the existence of AI systems with superhuman capabilities at narrow tasks. Not only superhuman performance (as in, AlphaGo beating the Go world champion) but superhuman speed and precision (as in, industrial sorting machines). There is no known discontinuity between tasks, something special and unique about human brains that unlocks certain capabilities which cannot be implemented in machines in principle. Therefore we would expect AI to surpass human performance on all tasks as progress continues.
In addition, several research groups (DeepMind being one of the most overt about this) explicitly aim for generally capable systems. AI as a field is growing, year after year. Critical voices about AI progress usually argue against a lack of precautions around the impact of AI, or against general AI happening very soon, not against it happening at all.
A satire of arguments against the possibility of superintelligence can be found here.
The basic concern as AI systems become increasingly powerful is that they won’t do what we want them to do – perhaps because they aren’t correctly designed, perhaps because they are deliberately subverted, or perhaps because they do what we tell them to do rather than what we really want them to do (like in the classic stories of genies and wishes.) Many AI systems are programmed to have goals and to attain them as effectively as possible – for example, a trading algorithm has the goal of maximizing profit. Unless carefully designed to act in ways consistent with human values, a highly sophisticated AI trading system might exploit means that even the most ruthless financier would disavow. These are systems that literally have a mind of their own, and maintaining alignment between human interests and their choices and actions will be crucial.
A Superintelligence would be intelligent enough to understand what the programmer’s motives were when designing its goals, but it would have no intrinsic reason to care about what its programmers had in mind. The only thing it will be beholden to is the actual goal it is programmed with, no matter how insane its fulfillment may seem to us.
Consider what “intentions” the process of evolution may have had for you when designing your goals. When you consider that you were made with the “intention” of replicating your genes, do you somehow feel beholden to the “intention” behind your evolutionary design? Most likely you don't care. You may choose to never have children, and you will most likely attempt to keep yourself alive long past your biological ability to reproduce.
The Control Problem is the problem of preventing artificial superintelligence (ASI) from having a negative impact on humanity. How do we keep a more intelligent being under control, or how do we align it with our values? If we succeed in solving this problem, intelligence vastly superior to ours can take the baton of human progress and carry it to unfathomable heights. Solving our most complex problems could be simple to a sufficiently intelligent machine. If we fail in solving the Control Problem and create a powerful ASI not aligned with our values, it could spell the end of the human race. For these reasons, The Control Problem may be the most important challenge that humanity has ever faced, and may be our last.