Main Page

Welcome to Stampy's Wiki, a volunteer effort to build an interactive FAQ about Artificial Intelligence existential safety: the field trying to make sure that when we build superintelligent artificial systems, they are aligned with human values and do the kinds of things we would like them to do.

We're still in alpha, so you can't yet talk to Stampy to interact with the FAQ, but if you want to help out we could always use more question answerers and developers! If you're interested in guiding or contributing to the project you're welcome to drop by the Discord and talk with us about it (let plex#1874 or the wiki team know if you want to help and need an invite).

Editathons happen every week on the voice channel in Rob's Discord, if you'd like to be social while writing questions and answers :) Next week's is at 8 pm Tuesday UK time; register your availability for the next one here.

Machines are already smarter than humans are at many specific tasks: performing calculations, playing chess, searching large databanks, detecting underwater mines, and more. But one thing that makes humans special is their general intelligence. Humans can intelligently adapt to radically new problems in the urban jungle or outer space for which evolution could not have prepared them. Humans can solve problems for which their brain hardware and software were never trained. Humans can even examine the processes that produce their own intelligence (cognitive neuroscience), and design new kinds of intelligence never seen before (artificial intelligence).

To possess greater-than-human intelligence, a machine must be able to achieve goals more effectively than humans can, in a wider range of environments than humans can. This kind of intelligence involves the capacity not just to do science and play chess, but also to manipulate the social environment.

Computer scientist Marcus Hutter has described a formal model called AIXI that he says possesses the greatest general intelligence possible. But implementing it would require more computing power than all the matter in the universe can provide. Several projects, for example MC-AIXI, try to build computable approximations of AIXI.

Still, there remains much work to be done before greater-than-human intelligence can be achieved in machines. Greater-than-human intelligence need not be achieved by directly programming a machine to be intelligent. It could also be achieved by whole brain emulation, by biological cognitive enhancement, or by brain-computer interfaces (see below).

I'm interested in working on AI Safety, what should I do?

AI Safety Support offers free calls to advise people interested in a career in AI Safety. We're working on creating a bunch of detailed information for Stampy to use, but in the meantime check out these resources:

80,000 Hours
AISS links page
AI Safety events calendar
Adam Gleave's Careers in Beneficial AI Research document
Rohin Shah's FAQ

Stamps: ^

Tags: careers

Using some human-related metaphors (e.g. what an AGI ‘wants’ or ‘believes’) is almost unavoidable, as our language is built around experiences with humans, but we should be aware that these may lead us astray.

Many paths to AGI would result in a mind very different from a human or animal, and it would be hard to predict in detail how it would act. We should not trust intuitions trained on humans to predict what an AGI or superintelligence would do. High fidelity Whole Brain Emulations are one exception, where we would expect the system to at least initially be fairly human, but it may diverge depending on its environment and what modifications are applied to it.

There has been some discussion about how language models trained on lots of human-written text seem likely to pick up human concepts and think in a somewhat human way, and how we could use this to improve alignment.

Stamps: Aprillion

Yes, the problems of AI alignment and making corporations behave are somewhat related, and solutions in either field could inspire the other. However, corporations take actions in the world on a slow timescale, comparable to the speed of the other groups of humans who try to regulate their power. AIs could act much, much faster, so the control problem might be unsolvable, and our best hope might be to build friendly AI that shares our values rather than to try to outsmart adversarial AIs. See also "Why Not Just - Think of AGI Like a Corporation".

Stamps: Aprillion, Damaged

This is actually an active area of AI alignment research, called "Impact Measures"! It's not trivial to formalize in a way which won't predictably go wrong (entropy minimization, for example, likely leads to an AI which tries really hard to put out all the stars ASAP, since they produce so much entropy), but progress is being made. You can read about it on the Alignment Forum tag, or watch Rob's videos "Avoiding Negative Side Effects" and "Avoiding Positive Side Effects".
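As a rough illustration of why formalizing impact is tricky, here is a toy sketch in Python. It assumes a world state can be written as a dictionary of features and that "impact" is simply the number of features that differ from a do-nothing baseline. The names `impact_penalty`, `penalized_reward`, and the example states are all hypothetical, made up for this sketch, and are not taken from any published impact measure.

```python
from typing import Dict

State = Dict[str, int]  # hypothetical representation: feature name -> value


def impact_penalty(baseline: State, outcome: State) -> int:
    """Count the features whose value differs from the do-nothing baseline."""
    keys = set(baseline) | set(outcome)
    return sum(1 for k in keys if baseline.get(k) != outcome.get(k))


def penalized_reward(task_reward: float, baseline: State, outcome: State,
                     weight: float = 1.0) -> float:
    """Task reward minus a weighted penalty for changing the world."""
    return task_reward - weight * impact_penalty(baseline, outcome)


# What the world looks like if the agent does nothing (assumed example).
baseline = {"vase_intact": 1, "tea_delivered": 0, "door_open": 0}
# Two ways of completing the same task.
careful = {"vase_intact": 1, "tea_delivered": 1, "door_open": 0}
careless = {"vase_intact": 0, "tea_delivered": 1, "door_open": 1}

careful_score = penalized_reward(10.0, baseline, careful)
careless_score = penalized_reward(10.0, baseline, careless)
```

In this toy setup the careful plan scores higher than the careless one, but the penalty also charges the agent for delivering the tea at all, which is the change we asked for. Telling wanted changes apart from unwanted side effects is one of the places where a naive formalization goes wrong.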

Stamps: Aprillion, plex

René's question on Reward Modeling

Very cool, which video goes further into "How do you learn without someone giving you feedback?"

The technical term for that is "Unsupervised Reinforcement Learning", if you want to look into it more deeply than Rob's videos cover. One good lecture-style resource is
Rob has another video on a specific aspect of Unsupervised Reinforcement Learning and how it relates to AI Safety here:

Stamps: plex, Augustus Caesar

If programmed with the wrong motivations, a machine could be malevolent toward humans, and intentionally exterminate our species. More likely, it could be designed with motivations that initially appeared safe (and easy to program) to its designers, but that turn out to be best fulfilled (given sufficient power) by reallocating resources from sustaining human life to other projects. As Yudkowsky writes, “the AI does not hate you, nor does it love you, but you are made out of atoms which it can use for something else.”

Since weak AIs with many different motivations could better achieve their goal by faking benevolence until they are powerful, safety testing to avoid this could be very challenging. Alternatively, competitive pressures, both economic and military, might lead AI designers to try to use other methods to control AIs with undesirable motivations. As those AIs became more sophisticated this could eventually lead to one risk too many.

Even a machine successfully designed with superficially benevolent motivations could easily go awry when it discovers implications of its decision criteria unanticipated by its designers. For example, a superintelligence programmed to maximize human happiness might find it easier to rewire human neurology so that humans are happiest when sitting quietly in jars than to build and maintain a utopian world that caters to the complex and nuanced whims of current human neurology.

A machine superintelligence, if programmed with the right motivations, could potentially solve all the problems that humans are trying to solve but haven’t had the ingenuity or processing speed to solve yet. A superintelligence might cure disabilities and diseases, achieve world peace, give humans vastly longer and healthier lives, eliminate food and energy shortages, boost scientific discovery and space exploration, and so on.

Furthermore, humanity faces several existential risks in the 21st century, including global nuclear war, bioweapons, superviruses, and more. A superintelligent machine would be more capable of solving those problems than humans are.

How does nature handle this for humans and other animals?

Have we considered simply abolishing private property so nobody gets to own the AI that inevitably takes over the world?

did you just google 'the google' oh my god i love you

Actually, discussing goals brings up an interesting question in the ethics of AI design. If we're going to have all of these highly intelligent machines running around, is it ethical to give them goals exclusively corresponding to work given to them by humans? Is slavery still wrong if the slaves like it? If you assume that intelligence necessarily implies a consciousness (and, really, things become a bit arbitrary if you don't), do we have a responsibility to grant AIs individuality?

What do you think?

You know just as well as I do that the guy who collects stamps will not just buy some stamps, he will build The Stamp Collector, and you have just facilitated the end of all humanity :( I would like to ask, on a more serious note, do you have any insights on how this relates to how humans often feel a sense of emptiness after achieving all of their goals. Or, well, I fail to explain it correctly, but there is this idea that humans always need a new goal to feel happy right? Maybe I am completely off, but what I am asking is, yes in an intelligent agent we can have simple, or even really complex goals, but will it ever be able to mimic the way goals are present in humans, a goal that is not so much supposed to be achieved, but more a fuel to make progress, kind of maybe like: a desire?

Recent Changes - What's changed on the wiki recently.
Top answers - Answers with the highest stamp count.
Upcoming questions - The next questions Stampy will ask on Discord, and those he'll give if you ask him for a question.
Imported FAQs - Questions and answers imported from external sites (with permission).