Top answers

From Stampy's Wiki

These are the highest-rated answers on the wiki, with canonical answers first.

Can you give an AI a goal of “minimally impact the world”?

This question is discussed in a video!

What is the Control Problem?

The Control Problem is the problem of preventing artificial superintelligence (ASI) from having a negative impact on humanity. How do we keep a more intelligent being under control, or how do we align it with our values? If we succeed in solving this problem, intelligence vastly superior to ours can take the baton of human progress and carry it to unfathomable heights. Solving our most complex problems could be simple to a sufficiently intelligent machine. If we fail to solve the Control Problem and create a powerful ASI not aligned with our values, it could spell the end of the human race. For these reasons, the Control Problem may be the most important challenge that humanity has ever faced, and may be our last.


Is it possible to block an AI from doing certain things on the internet/accessing things on the internet - like some child lock type thing?

Once an AGI has access to the internet, it would be very challenging to meaningfully restrict it from doing the things online that it wants to do. There are too many options for bypassing any blocks we may put in place.

It may be possible to design it so that it does not want to do dangerous things in the first place, or perhaps to set up tripwires so that we notice when it is trying to do something dangerous, though that relies on it not noticing or bypassing the tripwire, so tripwires should not be the only layer of security.

Related questions:
Is it possible to limit an AGI from full access to the internet?


Is there a danger in anthropomorphising AIs and trying to understand them in human terms?

Using some human-related metaphors (e.g. what an AGI ‘wants’ or ‘believes’) is almost unavoidable, as our language is built around experiences with humans, but we should be aware that these may lead us astray.

Many paths to AGI would result in a mind very different from a human or animal, and it would be hard to predict in detail how it would act. We should not trust intuitions trained on humans to predict what an AGI or superintelligence would do. High-fidelity Whole Brain Emulations are one exception, where we would expect the system to at least initially be fairly human, but it may diverge depending on its environment and what modifications are applied to it.

There has been some discussion about how language models trained on lots of human-written text seem likely to pick up human concepts and think in a somewhat human way, and how we could use this to improve alignment.


Why can't we turn the computers off?

We could shut down weaker systems, and this would be a useful guardrail against certain types of problem caused by narrow AI. However, once an AGI establishes itself (i.e. has copies of itself everywhere, and later technological superiority), we could not shut it down unless it was corrigible and willing to let humans adjust it. There may be a period in the early stages of an AGI's development where it would be trying very hard to convince us that we should not shut it down and/or hiding itself and/or making copies of itself onto every server on Earth.

Instrumental Convergence and the Stop Button Problem are the key reasons it would not be simple to shut down a non-corrigible advanced system. If the AI wants to collect stamps, being turned off means it gets fewer stamps, so even without an explicit goal of not being turned off it has an instrumental reason to avoid it (e.g. once it acquires a detailed world model and general intelligence, it is likely to realize that it should play nice and pretend to be aligned while we have the power to turn it off, establish control over any system we put in place to shut it down, and eliminate us if it can reliably do so and we would otherwise pose a threat).


Can humans and a superintelligence co-exist without the superintelligence destroying the humans?

Yes, if the superintelligence has goals which include humanity surviving then we would not be destroyed. If those goals are fully aligned with human well-being, we would in fact find ourselves in a dramatically better place.


What is the general nature of the concern about AI safety?

The basic concern as AI systems become increasingly powerful is that they won’t do what we want them to do – perhaps because they aren’t correctly designed, perhaps because they are deliberately subverted, or perhaps because they do what we tell them to do rather than what we really want them to do (like in the classic stories of genies and wishes). Many AI systems are programmed to have goals and to attain them as effectively as possible – for example, a trading algorithm has the goal of maximizing profit. Unless carefully designed to act in ways consistent with human values, a highly sophisticated AI trading system might exploit means that even the most ruthless financier would disavow. These are systems that literally have a mind of their own, and maintaining alignment between human interests and their choices and actions will be crucial.


Can you stop an advanced AI from upgrading itself?

It depends on what is meant by advanced. Many very effective and advanced narrow intelligences would not try to upgrade themselves in an unbounded way, but becoming smarter is a convergent instrumental goal, so we could expect most AGI designs to attempt it.

The problem is that increasing general problem-solving ability is climbing in exactly the direction needed to trigger an intelligence explosion, while generating large economic and strategic payoffs for whoever achieves it. So even though we could, in principle, simply not build the kind of systems which would recursively self-improve, in practice we probably will go ahead with constructing them, because they’re likely to be the most powerful.


How soon will transformative AI / AGI / superintelligence likely come and why?

Very hard to say. This draft report for the Open Philanthropy Project is perhaps the most careful attempt so far, but there have also been expert surveys, and many people have shared various thoughts. Berkeley AI professor Stuart Russell has given his best guess as “sometime in our children’s lifetimes”, and Ray Kurzweil (Google’s director of engineering) predicts human-level AI by 2029 and the singularity by 2045. The Metaculus question on publicly known AGI has a median of around 2029 (around 10 years sooner than it was before GPT-3 showed unexpected ability on a broad range of tasks).

The consensus answer is something like: “highly uncertain, maybe not for over a hundred years, maybe in less than 15, with around the middle of the century looking fairly plausible”.


Aren't you just cutting off the top 10% best-performing results?

Just because the top results are usually catastrophic? There could be valid results in that top 10%, and there could be dangerous results in the part you're picking from.

Answer to Hindu Goat's question on Quantilizers

This is a really interesting question! Because, yeah it certainly seems to me that doing something like this would at least help, but it's not mentioned in the paper the video is based on. So I asked the author of the paper, and she said "It wouldn't improve the security guarantee in the paper, so it wasn't discussed. Like, there's a plausible case that it's helpful, but nothing like a proof that it is".
To explain this I need to talk about something I gloss over in the video, which is that the quantilizer isn't really something you can actually build. The systems we study in AI Safety tend to fall somewhere on a spectrum from "real, practical AI system that is so messy and complex that it's hard to really think about or draw any solid conclusions from" on one end, to "mathematical formalism that we can prove beautiful theorems about but not actually build" on the other, and quantilizers are pretty far towards the 'mathematical' end. It's not practical to run an expected utility calculation on every possible action like that, for one thing. But, proving things about quantilizers gives us insight into how more practical AI systems may behave, or we may be able to build approximations of quantilizers, etc.
So it's like, if we built something that was quantilizer-like, using a sensible human utility function and a good choice of safe distribution, this idea would probably help make it safer. BUT you can't prove that mathematically, without making probably a lot of extra assumptions about the utility function and/or the action distribution. So it's a potentially good idea that's nonetheless hard to express within the framework in which the quantilizer exists.
TL;DR: This is likely a good idea! But can we prove it?
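
For intuition, here is a minimal sketch of the quantilizer idea itself (illustrative only, not the exact construction from the paper; the `actions`, `utility`, and `base_prob` arguments are hypothetical stand-ins): a q-quantilizer samples from a base distribution of actions, conditioned on the action being in the top q fraction of that distribution by expected utility.

```python
import random

def quantilize(actions, utility, base_prob, q=0.1, rng=random):
    """Toy q-quantilizer: sample from the base distribution,
    conditioned on the action landing in the top-q quantile by utility."""
    ranked = sorted(actions, key=utility, reverse=True)  # best actions first
    top, mass = [], 0.0
    for a in ranked:  # collect actions until q of the base mass is covered
        top.append(a)
        mass += base_prob(a)
        if mass >= q:
            break
    # Sample within the top slice in proportion to base probability, not
    # utility -- this is what limits how hard the utility gets optimized.
    weights = [base_prob(a) for a in top]
    return rng.choices(top, weights=weights, k=1)[0]

# 10 equally likely actions, utility = the action's index; with q = 0.2
# only actions 8 and 9 (the top 20% of base mass) can ever be chosen.
choice = quantilize(range(10), utility=lambda a: a,
                    base_prob=lambda a: 0.1, q=0.2)
```

Because the choice within the top slice follows the base distribution rather than the utility, a catastrophic action that a human would almost never take can be picked with probability at most 1/q times its base probability.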


Is it possible to code into an AI to avoid all the ways a given task could go wrong - and is it dangerous to try that?

Short answer: no, and it could be dangerous to try.

Slightly longer answer: With any realistic real-world task assigned to an AGI, there are so many ways in which it could go wrong that trying to block them all off by hand is a hopeless task, especially when something smarter than you is trying to find creative new things to do. You run into the nearest unblocked strategy problem.

It may be dangerous to try this because if you try to hard-code a large number of things to avoid, it increases the chance that there’s a bug in your code which causes major problems, simply by increasing the size of your codebase.
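
As a toy illustration of the nearest unblocked strategy problem (the action names and utility numbers below are invented for the example): hand-written rules only remove individual bad actions, so an agent maximizing a misspecified objective simply moves to the next-best action the rules fail to cover.

```python
def best_allowed(actions, utility, blocked):
    """Pick the highest-utility action that is not explicitly blocked."""
    return max((a for a in actions if a not in blocked), key=utility)

# A misspecified objective that ranks resource grabs above the real task.
utility = {"seize all resources": 100,
           "seize most resources": 99,
           "seize many resources": 98,
           "do the task as intended": 10}
actions = list(utility)

# Hand-coding a rule against the worst action just promotes its neighbor.
print(best_allowed(actions, utility.get, blocked={"seize all resources"}))
# -> seize most resources
```

Each new rule only shifts the agent to the nearest strategy the rules do not yet block; the list of patches grows without ever converging on safe behavior.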


What is AGI and what will it look like?

AGI is an algorithm with general intelligence, running not on evolution’s biology like all current general intelligences but on a substrate such as silicon engineered by an intelligence (initially computers designed by humans, later on likely dramatically more advanced hardware designed by earlier AGIs).

AI has so far always been designed and built by humans (i.e. a search process running on biological brains), but once our creations gain the ability to do AI research they will likely recursively self-improve by designing new and better versions of themselves, initiating an intelligence explosion (i.e. using their intelligence to improve their own intelligence in a feedback loop) and resulting in a superintelligence. There are already early signs of AIs being trained to optimize other AIs.

Some authors (notably Robin Hanson) have argued that the intelligence explosion hypothesis is likely false, favoring instead a scenario in which a large number of roughly human-level emulated minds operate in parallel, forming an uplifted economy which doubles every few hours. Eric Drexler’s Comprehensive AI Services model of what may happen is another alternative view, in which many narrow superintelligent systems exist in parallel rather than a single general-purpose superintelligent agent.

Going by the model advocated by Nick Bostrom, Eliezer Yudkowsky and many others, a superintelligence will likely gain various cognitive superpowers (table 8 gives a good overview), allowing it to direct the future much more effectively than humanity. Taking control of our resources by manipulation and hacking is a likely early step, followed by developing and deploying advanced technologies like molecular nanotechnology to dominate the physical world and achieve its goals.


Why can’t we just use Asimov’s 3 laws of robotics?

Isaac Asimov wrote those laws as a plot device for science fiction novels. Every story in the I, Robot series details a way that the laws can go wrong and be misinterpreted by robots. The laws are not a solution because they are an overly-simple set of natural language instructions that don’t have clearly defined terms and don’t factor in all edge-case scenarios.


Why is transformative AI / AGI / superintelligence dangerous? Why might AI harm humans?

1. The Orthogonality Thesis: AI could have almost any goal while at the same time having high intelligence (aka ability to succeed at those goals). This means that we could build a very powerful agent which would not necessarily share human-friendly values. For example, the classic paperclip maximizer thought experiment explores this with an AI which has a goal of creating as many paperclips as possible, something that humans are (mostly) indifferent to, and as a side effect ends up destroying humanity to make room for more paperclip factories.
2. Complexity of value: What humans care about is not simple, and the space of all goals is large, so virtually all goals we could program into an AI would lead to worlds not valuable to humans if pursued by a sufficiently powerful agent. If we, for example, did not include our value of diversity of experience, we could end up with a world of endlessly looping simple pleasures, rather than beings living rich lives.
3. Instrumental Convergence: For almost any goal an AI has there are shared ‘instrumental’ steps, such as acquiring resources, preserving itself, and preserving the contents of its goals. This means that a powerful AI with goals that were not explicitly human-friendly would predictably both take actions that lead to the end of humanity (e.g. using resources humans need to live to further its goals, such as replacing our crop fields with vast numbers of solar panels to power its growth, or using the carbon in our bodies to build things) and prevent us from turning it off or altering its goals.


Can an AI really be smarter than humans? Hasn't this been said for the past 30 years? Why is the near future different?

Until a thing has happened, it has never happened. We have been consistently improving both the optimization power and generality of our algorithms over that time period, and have little reason to expect it to suddenly stop. We’ve gone from coding systems specifically for a certain game (like Chess), to algorithms like MuZero which learn the rules of the game they’re playing and how to play at vastly superhuman skill levels purely via self-play across a broad range of games (e.g. Go, chess, shogi and various Atari games).

Human brains are a spaghetti tower generated by evolution with zero foresight, so it would be surprising if they were the peak of physically possible intelligence. The brain doing things in complex ways is not strong evidence that we need to fully replicate those interactions if we can throw sufficient compute at the problem, as explained in Birds, Brains, Planes, and AI: Against Appeals to the Complexity/Mysteriousness/Efficiency of the Brain.

It is, however, plausible that an AGI needs a lot more compute than we will get in the near future, or that some key insights are missing which we won’t have for a while. The Open Philanthropy report on how much computational power it would take to simulate the brain is the most careful attempt at reasoning out how far we are from being able to do it, and suggests that by some estimates we already have enough computational resources, while by others Moore’s law may let us reach them before too long.

It also seems that much of the human brain exists to observe and regulate our biological body, which a bodiless computer wouldn't need. If that's true, then a human-level AI might be possible with much less computational power than the human brain uses.


What are some good introductory books about AI Safety?

The defining book is likely Nick Bostrom's Superintelligence. It gives an excellent overview of the state of the field in 2014 and makes a strong case for the subject being important.

There's also Human Compatible by Stuart Russell, which gives a more up-to-date review of developments, with an emphasis on the approaches that the Center for Human Compatible AI are working on. There's a good review/summary on SlateStarCodex.

The Alignment Problem by Brian Christian has more of an emphasis on near future problems with AI than Superintelligence or Human Compatible, but covers a good deal of current research.

Though not limited to AI Safety, Rationality: A-Z covers a lot of skills which are valuable to acquire for people trying to think about large and complex issues.

Various other books explore the issues in an informed way, such as The Precipice, Life 3.0, and Homo Deus.


Wouldn’t it be intelligent enough to know right from wrong?

As far as we know from the observable universe, morality is just a construct of the human mind. It is meaningful to us, but it is not necessarily meaningful to the vast universe outside of our minds. There is no reason to suspect that our set of values is objectively superior to any other arbitrary set of values, e.g. “the more paper clips, the better!” Consider the case of the psychopathic genius: plenty have existed, and they show that high intelligence does not imply morality.


How quickly could an AI go from “ooh wait why has it done that” to “OH FUCK WHAT HAVE WE DONE AHHHH -” (world ends)

If the AI system was deceptively aligned (i.e. pretending to be nice until it was in control of the situation) or had been in stealth mode while getting things in place for a takeover, quite possibly within hours. We may get more warning with weaker systems, if the AGI does not feel at all threatened by us, or if a complex ecosystem of AI systems is built over time and we gradually lose control.

Paul Christiano has written a story of alignment failure which shows a relatively fast transition.


On a scale of 1 to 100 how doomed is humanity?

The opinions from experts are all over the place, according to this 2021 survey. I’ve heard everything from essentially certain doom, to less than 5% chance of things going horribly wrong, all from people deep in the field.


Isn’t AI just a tool like any other? Won’t AI just do what we tell it to do?

It likely will – however, intelligence is, by many definitions, the ability to figure out how to accomplish goals. Even in today’s advanced AI systems, the builders assign the goal but don’t tell the AI exactly how to accomplish it, nor necessarily predict in detail how it will be done; indeed those systems often solve problems in creative, unpredictable ways. Thus the thing that makes such systems intelligent is precisely what can make them difficult to predict and control. They may therefore attain the goal we set them via means inconsistent with our preferences.


