Science fiction author Isaac Asimov told stories about robots programmed with the Three Laws of Robotics: (1) a robot may not injure a human being or, through inaction, allow a human being to come to harm, (2) a robot must obey any orders given to it by human beings, except where such orders would conflict with the First Law, and (3) a robot must protect its own existence as long as such protection does not conflict with the First or Second Law. But Asimov’s stories tended to illustrate how such rules could go wrong.

Still, could we program ‘constraints’ into a superintelligence that would keep it from harming us? Probably not.

One approach would be to implement ‘constraints’ as rules or mechanisms that prevent a machine from taking actions that it would normally take to fulfill its goals: perhaps ‘filters’ that intercept and cancel harmful actions, or ‘censors’ that detect and suppress potentially harmful plans within a superintelligence.

Constraints of this kind, no matter how elaborate, are nearly certain to fail for a simple reason: they pit human design skills against superintelligence. A superintelligence would correctly see these constraints as obstacles to the achievement of its goals, and would do everything in its power to remove or circumvent them. Perhaps it would delete the section of its source code that contains the constraint. If we were to block this by adding another constraint, it could create new machines that don’t have the constraint written into them, or fool us into removing the constraints ourselves. Further constraints may seem impenetrable to humans, but would likely be defeated by a superintelligence. Counting on humans to out-think a superintelligence is not a viable solution.

If constraints on top of goals are not feasible, could we put constraints inside of goals? If a superintelligence had a goal of avoiding harm to humans, it would not be motivated to remove this constraint, avoiding the problem we pointed out above. Unfortunately, the intuitive notion of ‘harm’ is very difficult to specify in a way that doesn’t lead to very bad results when used by a superintelligence. If ‘harm’ is defined in terms of human pain, a superintelligence could rewire humans so that they don’t feel pain. If ‘harm’ is defined in terms of thwarting human desires, it could rewire human desires. And so on.

If, instead of trying to fully specify a term like ‘harm’, we decide to explicitly list all of the actions a superintelligence ought to avoid, we run into a related problem: human value is complex and subtle, and it’s unlikely we can come up with a list of all the things we don’t want a superintelligence to do. This would be like writing a recipe for a cake that reads: “Don’t use avocados. Don’t use a toaster. Don’t use vegetables…” and so on. Such a list can never be long enough.



Except in the case of Whole Brain Emulation, there is no reason to expect a superintelligent machine to have motivations anything like those of humans. Human minds represent a tiny dot in the vast space of all possible mind designs, and very different kinds of minds are unlikely to share the complex motivations unique to humans and other mammals.

Whatever its goals, a superintelligence would tend to commandeer resources that can help it achieve its goals, including the energy and elements on which human life depends. It would not stop because of a concern for humans or other intelligences that is ‘built in’ to all possible mind designs. Rather, it would pursue its particular goal and give no thought to concerns that seem ‘natural’ to that particular species of primate called Homo sapiens.

There are, however, some basic instrumental motivations we can expect a superintelligent machine to display, because they are useful for achieving its goals no matter what those goals are. For example, an AI will ‘want’ to self-improve, to be optimally rational, to retain its original goals, to acquire resources, and to protect itself, because all these things help it achieve the goals with which it was originally programmed.



A Friendly Artificial Intelligence (Friendly AI or FAI) is an artificial intelligence that is ‘friendly’ to humanity — one that has a good rather than bad effect on humanity.

AI researchers continue to make progress with machines that make their own decisions, and there is a growing awareness that we need to design machines to act safely and ethically. This research program goes by many names: ‘machine ethics’, ‘machine morality’, ‘artificial morality’, ‘computational ethics’ and ‘computational metaethics’, ‘friendly AI’, and ‘robo-ethics’ or ‘robot ethics’.

The most immediate concern may be in battlefield robots; the U.S. Department of Defense contracted Ronald Arkin to design a system for ensuring ethical behavior in autonomous battlefield robots. The U.S. Congress has declared that a third of America’s ground systems must be robotic by 2025, and by 2030 the U.S. Air Force plans to have swarms of bird-sized flying robots that operate semi-autonomously for weeks at a time.

But Friendly AI research is not concerned with battlefield robots or machine ethics in general. It is concerned with a problem of a much larger scale: designing AI that would remain safe and friendly after the intelligence explosion.

A machine superintelligence would be enormously powerful. Successful implementation of Friendly AI could mean the difference between a solar system of unprecedented happiness and a solar system in which all available matter has been converted into parts for achieving the superintelligence’s goals.

It must be noted that Friendly AI is a harder project than often supposed. As explored below, commonly suggested solutions for Friendly AI are likely to fail because of two features possessed by any superintelligence:

  1. Superpower: a superintelligent machine will have unprecedented powers to reshape reality, and therefore will achieve its goals with highly efficient methods that confound human expectations and desires.
  2. Literalness: a superintelligent machine will make decisions based on the mechanisms it is designed with, not the hopes its designers had in mind when they programmed those mechanisms. It will act only on precise specifications of rules and values, and will do so in ways that need not respect the complexity and subtlety[41][42][43] of what humans value. A demand like “maximize human happiness” sounds simple to us because it contains few words, but philosophers and scientists have failed for centuries to explain exactly what this means, and certainly have not translated it into a form sufficiently rigorous for AI programmers to use.


The Stamp Points System

If someone posts something good - something that shows insight, knowledge of AI Safety, etc. - give the message a stamp :stamp: of approval! @Stampy keeps track of these, and uses them to decide how much he likes each user. You can ask Stampy (in a PM if you like), "How many stamps am I worth?", and he'll tell you. If something is really very good, especially if it took a lot of work/effort, give it a gold stamp :goldstamp:. These are worth 5 regular stamps!

Note that stamps aren't just 'likes', so please don't give stamps to say "me too" or "that's funny" etc. They're meant to represent knowledge, understanding, good judgement, and contributing to the Discord. You can use :100: or :heavy_check_mark: for things you agree with, :laughing: or :rofl: for funny things etc.

Your stamp points determine your voting power for approving YouTube replies, and in future probably other things, like getting invite links to share etc.

Notes on stamps and stamp points

  • Stamps awarded by people with a lot of stamp points are worth more
  • Awarding people stamps does not reduce your stamp points
  • New users who have 0 stamp points can still award stamps; they just have no effect. But it's still worth doing, because if you get stamp points later, all your previous votes are retroactively updated!
  • Yes, this was kind of tricky to implement! Stampy actually stores how many stamps each user has awarded to every other user, and uses that to build a system of linear scalar equations which is then solved with numpy (see the sketch below)
  • When people post things that are insightful, that show good judgement or good knowledge of AI safety, you give them a :stamp:
  • Each user has stamp points, and also gives a score to every other user they give stamps to. The scores sum to 1, so if I give user A a stamp, my score for them will be 1.0; if I then give user B a stamp, my score for A drops to 0.5 and B's is 0.5; if I give another to B, my score for A goes to 1/3 and B's to 2/3, and so on
  • Score is "what proportion of the stamps I've given have gone to this user"
  • Everyone's stamp points are the sum of (every other user's score for them, times that user's stamp points), so the way to get points is to get stamps from people who have points
  • Rob is the root of the tree: he got one point from Stampy
  • So the idea is that stamp power flows through the network, giving people points for posting things that I thought were good, or for posting things that "people who posted things I thought were good" thought were good, and so on ad infinitum. For posting YouTube comments, Stampy won't send the comment until it has enough stamps of approval, which could come from a small number of high-points users or a larger number of lower-points users
  • Oh also, :goldstamp: is just equivalent to 5 stamps, and stamps given to yourself or to Stampy do nothing

So yeah, everyone ends up with a number that basically represents what Stampy thinks of them, and you can ask him "How many stamps am I worth?" to get that number.

For technical details, see: https://discord.com/channels/677546901339504640/758062805810282526/781208566408413235
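
Here is a minimal sketch of the calculation described in the notes above, using a toy set of users and stamp counts. The names, the stamp data, the treatment of Rob's point from Stampy as a "seed" term, and the direct call to numpy.linalg.solve are illustrative assumptions; the real bot's data structures and edge-case handling may differ.

```python
# Illustrative sketch only -- user names and stamp counts are made up,
# and the real Stampy implementation may differ in its details.
import numpy as np

# stamps[giver][receiver] = how many stamps 'giver' has awarded 'receiver'
# (a gold stamp would simply count as 5 here; stamps given to yourself or
# to Stampy do nothing, so they never enter this table)
stamps = {
    "rob":   {"alice": 2, "bob": 1},
    "alice": {"bob": 1},
    "bob":   {},  # new user: can give stamps, but hasn't given any yet
}

users = list(stamps)
idx = {u: i for i, u in enumerate(users)}
n = len(users)

# S[i, j] = proportion of user i's stamps that have gone to user j,
# i.e. i's "score" for j; each giver's row sums to 1
S = np.zeros((n, n))
for giver, given in stamps.items():
    total = sum(given.values())
    for receiver, count in given.items():
        S[idx[giver], idx[receiver]] = count / total

# Rob is the root of the tree: he gets one point directly "from Stampy"
seed = np.zeros(n)
seed[idx["rob"]] = 1.0

# Each user's points = seed + sum over givers of
# (giver's score for them * giver's points),
# i.e.  points = S.T @ points + seed,
# which rearranges to the linear system  (I - S.T) @ points = seed
points = np.linalg.solve(np.eye(n) - S.T, seed)

for user in users:
    print(f"{user}: {points[idx[user]]:.2f} stamp points")
```

With this toy data, Rob ends up with 1 point (nobody has stamped him, he only has the seed), Alice with about 0.67, and Bob with 1.0, since Bob gets a third of Rob's point directly plus all of Alice's points on top of that.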



Nick Bostrom defined ‘superintelligence’ as:

"an intellect that is much smarter than the best human brains in practically every field, including scientific creativity, general wisdom and social skills."

This definition includes vague terms like ‘much’ and ‘practically’, but it will serve as a working definition for superintelligence in this FAQ. An intelligence explosion would lead to machine superintelligence, and some believe that an intelligence explosion is the most likely path to superintelligence.

See also:

  • Bostrom, How Long Before Superintelligence?
  • Legg, Machine Super Intelligence
