Browse FAQ


Answers which have been rated highly are displayed on this page. For answers in need of attention, see Review answers. For a view of the questions and answers imported from external FAQs, see Imported FAQs.

Questions and Answers

Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. But one thing that makes humans special is their general intelligence. Humans can intelligently adapt to radically new problems in the urban jungle or outer space for which evolution could not have prepared them. Humans can solve problems for which their brain hardware and software was never trained. Humans can even examine the processes that produce their own intelligence (cognitive neuroscience), and design new kinds of intelligence never seen before (artificial intelligence).

To possess greater-than-human intelligence, a machine must be able to achieve goals more effectively than humans, across a wider range of environments. This kind of intelligence involves the capacity not just to do science and play chess, but also to manipulate the social environment.

Computer scientist Marcus Hutter has described a formal model called AIXI that he says possesses the greatest general intelligence possible. But to implement it would require more computing power than all the matter in the universe can provide. Several projects try to approximate AIXI while still being computable, for example MC-AIXI.
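For readers who want the formal statement, AIXI's action choice at cycle $k$ can be written compactly. The formula below follows Hutter's standard presentation (horizon $m$, universal Turing machine $U$, programs $q$ of length $\ell(q)$ standing in for possible environments):

```latex
a_k \;:=\; \arg\max_{a_k} \sum_{o_k r_k} \;\cdots\; \max_{a_m} \sum_{o_m r_m}
\bigl[\, r_k + \cdots + r_m \,\bigr]
\sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The innermost sum ranges over every program consistent with the interaction history, weighted by simplicity; that sum is the incomputable part, and approximations such as MC-AIXI replace it with Monte Carlo search over a restricted model class.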

Still, there remains much work to be done before greater-than-human intelligence can be achieved in machines. Greater-than-human intelligence need not be achieved by directly programming a machine to be intelligent. It could also be achieved by whole brain emulation, by biological cognitive enhancement, or by brain-computer interfaces (see below).


I'm interested in working on AI Safety, what should I do?


AI Safety Support offers free calls to advise people interested in a career in AI Safety. We're working on creating a bunch of detailed information for Stampy to use, but in the meantime check out these resources:

80,000 Hours
AISS links page
AI Safety events calendar
Adam Gleave's Careers in Beneficial AI Research document
Rohin Shah's FAQ

Stamps: ^

Tags: careers

Using some human-related metaphors (e.g. what an AGI ‘wants’ or ‘believes’) is almost unavoidable, as our language is built around experiences with humans, but we should be aware that these may lead us astray.

Many paths to AGI would result in a mind very different from a human or animal, and it would be hard to predict in detail how it would act. We should not trust intuitions trained on humans to predict what an AGI or superintelligence would do. High-fidelity whole brain emulations are one exception, where we would expect the system to at least initially be fairly human, though it may diverge depending on its environment and what modifications are applied to it.

There has been some discussion about how language models trained on lots of human-written text seem likely to pick up human concepts and think in a somewhat human way, and how we could use this to improve alignment.

Stamps: Aprillion


What is MIRI’s mission?

What is MIRI’s mission? What is MIRI trying to do? What is MIRI working on?


MIRI's stated mission is to “ensure that the creation of smarter-than-human artificial intelligence has a positive impact.” This is an ambitious goal, but they believe that some early progress is possible, and that the goal’s importance and difficulty make it prudent to begin work at an early date.

Their two main research agendas, “Agent Foundations for Aligning Machine Intelligence with Human Interests” and “Value Alignment for Advanced Machine Learning Systems,” focus on three groups of technical problems:

  • highly reliable agent design — learning how to specify highly autonomous systems that reliably pursue some fixed goal;
  • value specification — supplying autonomous systems with the intended goals; and
  • error tolerance — making such systems robust to programmer error.

That said, MIRI recently published an update stating that they are moving away from the (largely unpublished) research directions they had been pursuing since 2017.

They publish new mathematical results (although their work is non-disclosed by default), host workshops, attend conferences, and fund outside researchers who are interested in investigating these problems. They also host a blog and an online research forum.

Stamps: plex

Tags: miri

abramdemski and Scott Garrabrant's post on decision theory provides a good overview of many aspects of the topic, while Functional Decision Theory: A New Theory of Instrumental Rationality seems to be the most up-to-date source on current thinking.

For a more intuitive dive into one of the core problems, Newcomb's problem and regret of rationality is good, and Newcomblike problems are the norm is useful for seeing how it applies in the real world.
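As a quick illustration of the tension at the heart of Newcomb's problem (using the standard payoffs from the literature and assuming a predictor that is right 99% of the time):

```latex
\mathbb{E}[\text{one-box}] = 0.99 \times \$1{,}000{,}000 = \$990{,}000
\qquad
\mathbb{E}[\text{two-box}] = 0.01 \times \$1{,}000{,}000 + \$1{,}000 = \$11{,}000
```

Yet once the boxes are filled, taking both is always $1,000 better than taking one, so purely causal reasoning recommends two-boxing; reconciling these two lines of argument is what the posts above (and functional decision theory) are about.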

The LessWrong tag for decision theory has lots of additional links for people who want to explore further.

Stamps: plex


Stampy is focused specifically on AI existential safety (both introductory and technical questions), but does not aim to cover general AI questions or other topics which don't interact strongly with the effects of AI on humanity's long-term future.

Stampy is focused on answering common questions people have which are specifically about AI existential safety. More technical questions are also in our scope, though replying to all possible proposals is not feasible, and this is not a great place to submit detailed ideas for evaluation.

We are interested in:

  • Questions which come up often when people are introduced to this collection of ideas and are strongly relevant to the field e.g.
    • "How long will it be until transformative AI arrives?"
    • "Why might advanced AI harm humans?"
  • Technical questions related to the field e.g.
    • "What is Cooperative Inverse Reinforcement Learning?"
    • "What is Logical Induction useful for?"
  • Questions about how to contribute to the field e.g.
    • "Should I get a PhD?"
    • "Where can I find relevant job opportunities?"

More good examples can be found in Category:Canonical_questions.

We do not aim to cover:

  • Aspects of AI Safety or fairness which are not strongly relevant to existential safety e.g.
    • "How should self-driving cars weigh up moral dilemmas"
    • "How can we minimize the risk of privacy problems caused by machine learning algorithms?"
  • Extremely specific and detailed questions whose answers are unlikely to be of value to more than a single person e.g.
    • "What if we did <multiple paragraphs of dense text>? Would that result in safe AI?"

We will generally not delete out-of-scope content, but it will be reviewed as low priority to answer (either "Meh" or "Rejected"), not be marked as a canonical question, and not be served to readers by User:Stampy.

Stamps: plex

Tags: stampy

We could, but we won’t. Each advance in capabilities which brings us closer to an intelligence explosion also brings vast profits for whoever develops it (e.g. smarter digital personal assistants like Siri, more ability to automate cognitive tasks, better recommendation algorithms for Facebook, etc.). The incentives are all wrong. Any actor (nation or corporation) who stops will just get overtaken by more reckless ones, and everyone knows this.

Stamps: plex


Nick Bostrom defines superintelligence as “an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills.” A chess program can outperform humans in chess, but is useless at any other task. Superintelligence will have been achieved when we create a machine that outperforms the human brain across practically any domain.


Ryan Paton's question on Intro to AI Safety

Could there be any caveats to training an AI to "not hurt or kill or cause harm to any living creature"? I suppose you would need to provide a definition for "living creature" and for "harm or kill" that the AI would understand...


You would start to run into the whack-a-mole problem. Basically, whenever you make a hard "don't ever do X" rule, you will absolutely wind up having to make dozens of exceptions as the AI keeps halting on, or working around, said rule. For example (a toy sketch in code follows below):

  • You build a medical research AI and program it to not harm living creatures.
  • The AI halts, since any action it takes will cause harm to at least one single-celled organism.
  • You make an exception for anything under a few hundred cells.
  • The AI creates a new medication that has a side effect of killing gut flora; anyone who takes it dies of malnutrition.
  • You make an exception to the exception for things living inside humans.
  • The AI halts when trying to make a de-worming drug, because it cannot harm things living inside humans.
  • Etc.
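Here is a minimal sketch of that dynamic in code, purely for illustration; every class, threshold, and rule below is invented for this example and does not correspond to any real system:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Organism:
    cell_count: int
    lives_inside_human: bool

@dataclass
class Action:
    description: str
    harmed: List[Organism]   # organisms the action is predicted to harm

def harm_is_excused(organism: Organism) -> bool:
    """The growing pile of patches bolted onto 'never harm a living creature'."""
    if organism.lives_inside_human:   # patch 2: nothing living inside a human may be harmed
        return False                  #          (added after the gut-flora incident)
    if organism.cell_count < 300:     # patch 1: organisms under a few hundred cells don't count
        return True
    return False

def permitted(action: Action) -> bool:
    """The hard rule, minus the exceptions accumulated so far."""
    return all(harm_is_excused(o) for o in action.harmed)

# The de-worming drug from the list above: the parasite lives inside a human,
# so the latest patch forbids the action and the agent halts on the task again.
deworming = Action("synthesize de-worming drug",
                   harmed=[Organism(cell_count=1_000_000, lives_inside_human=True)])
print(permitted(deworming))   # False -> time for patch 3
```

Each patch only relocates the failure: either the filter forbids the next useful task, or the agent finds actions whose harms slip through the gaps, because the rule never captured what we actually cared about.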


Tags: None

Andy Gee's question on Mesa-Optimizers 2

@3:54 you mention providing the whole of Wikipedia as learning data. Wikipedia details several methods for breaking memory containment. If this is provided to an advanced AI, couldn't that AI become aware that it may be constrained within blocks of memory, and thus attempt to bypass those constraints to maximize its reward function?
These vulnerabilities were present in all Intel and AMD CPUs for 20+ years before discovery and have been largely mitigated; however, the "concept" of looking for vulnerabilities in microarchitecture is something an AI can do a lot better than humans can. If you read the Assembly for pre-forking in Intel chips, it's pretty obvious the entire memory space is available while the CPU is predicting what will be required of it next.
Presuming containment of an AI system is important, isn't feeding it massive datasets a considerable risk, not only for intellectual property rights but also for maintaining control of the AI?

Here are some examples of existing vulnerabilities; who knows how many more there are.
https://meltdownattack.com/
https://en.wikipedia.org/wiki/Meltdown_(security_vulnerability)
https://en.wikipedia.org/wiki/Microarchitectural_Data_Sampling


Trying to hide information from an AGI is almost certainly not an avenue towards safety - if the agent is better at reasoning than us, it is likely to derive information relevant to safety considerations that we wouldn't think to hide. It is entirely appropriate, then, to use thought experiments like these where the AGI has such a large depth of information, because our goal should be to design systems that behave safely even in such permissive environments.

Stamps: Damaged, SlimeBunnyBat, Aprillion, plex

Tags: None

How could you distinguish a very stupid person from a smart person whose terminal goal is to look stupid? Occam's razor suggests that the first assumption is more rational, but can you know for sure?


Acting stupid could be entirely part of an actor's psychological fight with another actor—for example, feigning a tell in a game of poker. The answer to your first question is, therefore, that you cannot without gathering more information—but you should definitely plan for both eventualities, particularly if you're an actor they're interacting with. Occam's razor only gives a hint toward what is more likely given a limited set of information.


Tags: None

Richard Collins's question on Quantilizers

Yay he's back. :)

So anyway, my question: would not, over time, the maximiser adjust its output to compensate for the quantilizer?

The problem I see with AGI is that when we have one bad human, there is only ever that one bad human. If we make one bad AGI, the first thing it will do is replicate itself, making thousands of bad AGIs.

I'm a software developer; the moment an AGI can do software development, I am out of a job, because corporations can just copy and paste to get all the developers they need.


If the maximizer were first deciding what it wants to accomplish, and then accomplishing that goal by outputting a ranked list of actions which is then sampled by the quantilizing step, then a smart maximizer would indeed learn to fool the quantilizing step pretty quickly. However, the maximizer here is just a machine for ranking: it doesn't want to fool the quantilizer, and it doesn't want to have an effect on the world; it just wants to output a correctly-ranked list of actions.
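To make the structure concrete, here is a minimal sketch of the quantilizing step itself, assuming the usual setup from the quantilizer proposal (a base distribution over actions plus a utility ranking); the function and parameter names are made up for illustration:

```python
import random

def quantilize(actions, utility, base_weight, q=0.1):
    """Sample an action from the top q fraction (by base-distribution mass)
    of the candidates, ranked by utility.  The ranking step only orders
    candidates; the final choice is random among good-but-typical options."""
    # Rank candidate actions from best to worst according to utility.
    ranked = sorted(actions, key=utility, reverse=True)

    # Walk down the ranking until we've accumulated q of the base
    # distribution's probability mass; those actions form the pool.
    total = sum(base_weight(a) for a in actions)
    pool, mass = [], 0.0
    for a in ranked:
        pool.append(a)
        mass += base_weight(a) / total
        if mass >= q:
            break

    # Sample within the pool in proportion to the base distribution,
    # i.e. "what a human might plausibly do", not "what scores highest".
    weights = [base_weight(a) for a in pool]
    return random.choices(pool, weights=weights, k=1)[0]
```

Note that the ranking step here is literally just `sorted(...)`: it has no notion of wanting anything, which is the point the reply above is making.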


Tags: None

Illesizs's question on Quantilizers

Why don't we cut off the most unlikely solutions from the other end?
We could create a 1%-10% quantilizer for an even more human-like superintelligence.


We had a discussion about this idea in response to a different comment, though we didn't really come to any firm conclusions. You can read it here if you like: https://pastebin.com/FVUNCBJt


Tags: None

Is AGI avoidable? Is there a way to advance in technology and evolve as humanity in general without ever coming to the point where we turn that thing on? A more philosophical question.


While it is physically possible to avoid the creation of AGI, existing (and foreseeable) economic and political incentives make it very, very unlikely that we will change the direction of technological advance away from AGI.

Stamps: plex, Aprillion, ^

Tags: None

Traywor's question on Mesa-Optimizers 2

So if deception is kind of a default behaviour of intelligent agents, why is it so different with humans? Clearly there must be a mechanism inside a human being (one who is not a psychopath) which ensures that they won't deceive, let's say, friends they really care about.


Yes, the mechanism is called reciprocity - https://en.wikipedia.org/wiki/Reciprocity_(evolution)


Tags: None

What if you create a cost function and make the utility function a normal distribution, so the AI finds the cheapest way to collect around 100 stamps?


The outcome depends on what exactly your cost function is. The AI will trade an arbitrarily large amount of the things not specified in the cost function for an arbitrarily small increase in expected utility (this channel, "Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1", 03:02). It cannot know with absolute certainty that it indeed has 100 stamps, so anything it can do to increase its certainty of that will be a potential action it can take, provided it does not involve things specified in the cost function. That assumes it's even possible to well-specify the things you care about in your cost function, a difficulty explored on Computerphile in "Why Asimov's Laws of Robotics Don't Work".

"Just define a cost function" is, on the low end, ineffective, and on the high end, has nothing remotely "just" about it.


Tags: None

I'm way late catching up with this video, so no one will ever see this comment. But, for the record, I think the point, late in the video, should not be that maybe thinking about AI safety will make things worse when we actually build AI. I think it's that we might talk ourselves OUT of making AI, and we'll never know what it might have done for us.

Take the bridge analogy. Suppose the safety analysis causes the bridge to be cancelled (unlikely for a bridge, but possible -- if there's a problem with the rocks where the piers have to go, say -- we simply can't put a 100% safe bridge in this location). So, no bridge, so people keep crossing using a ferry, and a few years later, about the time the bridge would have opened, the ferry sinks and everyone on it dies. Yes, the bridge was a risk, but so was no bridge.

AI is a risk -- the question is, is no AI a risk? Do we need AI to solve problems we won't be able to solve without it, but which will affect our survival as a species? Or is AI a luxury -- nice to have, but not worth any meaningful risk? Personally, I tend toward the latter view, so if Mr. Miles and those like him talk us out of ever trying AI, then that's a shame -- a real Mr. Data would be fun -- but no great loss. But what if there's a way to cheaply fix carbon that an AI could find but we never will? Or to cure cancer, or to enable light-speed travel so we can get off this doomed rock? We'll never know. We'll die on that ferry never knowing the bridge -- risky though it is -- might have saved us.


I would imagine that the solution to the analogy you give would have been more safety research on the ferry, not less on the bridge.

You raise a good point though, that AI safety shouldn't just raise problems, it should also seek solutions to those problems (which the field is doing a decent job at, given the difficulty of the task). The ultimate goal of the field of AI safety is to create an aligned AGI. If the outcome of all of the research is that aligned AGI is impossible, that will be a rather unfortunate turn of events, but still better than not having done any safety research, since we can then decide with much more information whether we need to take the risk in order to prevent a different catastrophe, or whether we can find a different solution that doesn't have a 75% chance of destroying us anyway.

A very good research paper that explores this is "Artificial Intelligence as a Positive and Negative Factor in Global Risk", if you are interested in this topic specifically.


Tags: None

Harsh Deshpande's question on Mesa-Optimizers

Why would reducing the number of people to zero be a bad thing necessarily? I thought the goal was to show undesirable outcomes. How is zero suffering undesirable?


If you're a negative utilitarian, that point is valid, but consider not being one. Also, killing everyone is just one example of a thing most people agree is bad; futures with large amounts of suffering, or other outcomes that even negative utilitarians would dislike, are also possible with a misaligned AI.


Tags: None
