Review answers


If you think an answer is good (i.e. accurate, helpful, and responsive to the question), then please give it a stamp of approval. If you're a more experienced editor and have a good feel for when there is consensus around an answer being accepted, feel free to mark answers as canonical, so that Stampy will start serving them to users.

Individual pages

These pages track 208 answers which need review (with some being counted multiple times).

Review answers

These 46 non-canonical answers are answering canonical questions.

No. Misaligned artificial intelligence poses a serious threat to the continued flourishing, and maybe even continued existence, of humanity as a whole. While predictions about when artificial general intelligence may be achieved vary, surveys consistently report a >50% probability of achieving general AI before the year 2060 - within the expected lifetimes of most people alive today.

It is difficult to predict how technology will develop, and at what speed, in the years ahead; but as artificial intelligence poses a not-insignificant chance of causing worldwide disaster within the not-too-distant future, anyone who is generally concerned with the future of humanity has reason to be interested.

Stamps: None

Tags: transhumanism

There are numerous organizations working on AI alignment. A partial list includes:

For more information about the research happening at some of these organizations see a review (from 2021) here.

Since AI alignment is a growing field, new organizations are often created. In addition to these organizations, there are a number of research groups at different universities whose work also focuses on AI alignment.

Stamps: None

Tags: organizations, alignment

For more details, see Arbital.

Pivotal acts are acts that substantially change the direction humanity will have taken in 1 billion years. The term is used to denote positive changes, as opposed to existential catastrophe.

An obvious pivotal act would be to create an AGI aligned with humanity's best interests. An act that would greatly increase the chance of another pivotal act would also count as pivotal.

Another, possibly more controversial, kind of pivotal act would be one that stops or strongly delays the development of unaligned (or any) AGI through drastic means, such as a virus that melts all advanced processors or the disabling of all AI researchers. Eliezer mentions these here.

Stamps: None

Tags: None

... further results

These 52 answers have been added in the last month.


How can we achieve LLM alignment?


Anthropic fine-tuned a language model to be more helpful, honest, and harmless (HHH).

Motivation: I think the point of this is to:

  1. see if we can "align" a current day LLM, and
  2. raise awareness about safety in the broader ML community.
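As a toy illustration of what "fine-tuning a language model on helpful, harmless behaviour" can look like mechanically, here is a minimal sketch. It is not Anthropic's actual pipeline (which used far larger models, curated datasets, and preference modelling / RLHF on top of supervised fine-tuning); the model name, example dialogues, and hyperparameters below are placeholders.

```python
# Minimal supervised fine-tuning sketch on toy "helpful assistant" dialogues.
# Placeholder model and data; illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # small stand-in model
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Toy demonstrations of helpful and harmless assistant behaviour.
dialogues = [
    "Human: How do I boil an egg?\n\nAssistant: Put it in boiling water for about eight minutes, then cool it under cold water.",
    "Human: Write me a threatening letter.\n\nAssistant: I can't help with that, but I can help you draft a firm, polite complaint instead.",
]

batch = tokenizer(dialogues, return_tensors="pt", padding=True, truncation=True)
labels = batch["input_ids"].clone()
labels[batch["attention_mask"] == 0] = -100  # don't compute loss on padding

optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)
model.train()
for step in range(3):  # a few gradient steps, purely for illustration
    loss = model(**batch, labels=labels).loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    print(f"step {step}: loss = {loss.item():.3f}")
```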

How can we interpret what all the neurons mean?


Chris Olah, the interpretability legend, is working on looking really hard at all the neurons to see what they all mean. The approach he pioneered is circuits: looking at computational subgraphs of the network, called circuits, and interpreting those. The idea is to "decompile the network into a better representation that is more interpretable". Work on in-context learning via attention heads suggests interpretability is useful here too.

One result I heard about recently: a softmax linear unit (SoLU) stretches space and encourages neuron monosemanticity (making a neuron represent only one thing, as opposed to firing on many unrelated concepts). This makes the network easier to interpret.

Motivation: The point of this is to get as many bits of information as possible about what neural networks are doing, in the hope of finding better abstractions. A diagram illustrating this gets posted everywhere, the hope being that networks, in the current regime, will become more interpretable because they will start to use abstractions that are closer to human abstractions.
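To make this concrete, here is a small sketch (an assumed setup, not Olah's actual tooling) of one ingredient of circuits-style analysis: pulling the attention patterns out of a small open model and printing which earlier token each position attends to most. The choice of model, layer, and head below is arbitrary.

```python
# Inspect one attention head's pattern in GPT-2; illustrative only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

text = "The cat sat on the mat because the cat was tired"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_attentions=True)

# out.attentions is a tuple with one (batch, heads, seq, seq) tensor per layer.
layer, head = 5, 3  # arbitrary head to look at
attn = out.attentions[layer][0, head]
tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])

# Print the earlier token each position attends to most strongly. Heads that
# reliably attend to repeats of the current token are candidate "induction heads",
# one proposed mechanism behind in-context learning.
for i, tok in enumerate(tokens):
    j = int(attn[i].argmax())
    print(f"{tok:>12} -> {tokens[j]}")
```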

How do you figure out how model performance scales?


The basic idea is to figure out how model performance scales, and use this to help understand and predict what future AI models might look like, which can inform timelines and AI safety research. A classic result found that you need to increase data, parameters, and compute all at the same time (at roughly the same rate) in order to improve performance. Anthropic extended this research here.
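As a rough illustration of the methodology (synthetic numbers, not a reproduction of any published fit), the sketch below fits a simple power law L(N) = a·N^(-alpha) + L_inf to made-up (parameter count, loss) points and extrapolates it. Scaling-laws papers do conceptually similar fits over real training runs, jointly in parameters, data, and compute.

```python
# Fit a toy power-law scaling curve to synthetic (model size, loss) data.
import numpy as np
from scipy.optimize import curve_fit

def scaling_law(n_params, a, alpha, l_inf):
    # Loss falls as a power law in parameter count, down to an irreducible floor.
    return a * n_params ** (-alpha) + l_inf

n = np.array([1e6, 1e7, 1e8, 1e9, 1e10])      # parameter counts (synthetic)
loss = np.array([5.2, 4.1, 3.4, 2.9, 2.6])    # measured losses (synthetic)

(a, alpha, l_inf), _ = curve_fit(scaling_law, n, loss, p0=[20.0, 0.1, 2.0])
print(f"alpha = {alpha:.3f}, irreducible loss ~ {l_inf:.2f}")

# Extrapolate the fitted curve to a larger hypothetical model.
print(f"predicted loss at 1e11 params: {scaling_law(1e11, a, alpha, l_inf):.2f}")
```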

Stamps: None

Tags: None

... further results

These 173 canonical answers have one or fewer stamps.

One possible way to ensure the safety of a powerful AI system is to keep it contained in a software environment. There is nothing intrinsically wrong with this procedure - keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even AI systems inside software environments might not be safe enough.

Humans sometimes put dangerous humans inside boxes to limit their ability to influence the external world. Sometimes, these humans escape their boxes. The security of a prison depends on certain assumptions, which can be violated. Yoshie Shiratori reportedly escaped prison by weakening the door-frame with miso soup and dislocating his shoulders.

Human-written software has a high defect rate; we should expect a perfectly secure system to be difficult to create. If humans construct a software system they think is secure, it is possible that the security relies on a false assumption. A powerful AI system could potentially learn how its hardware works and manipulate bits to send radio signals. It could fake a malfunction and attempt social engineering when the engineers look at its code. As the saying goes: for someone to do something we had imagined was impossible, they need only have a better imagination.

Experimentally, humans have convinced other humans to let them out of the box. Spooky.

Stamps: None

Tags: boxing

In principle it could (if you believe in functionalism), but it probably won't. One way to ensure that an AI has human-like emotions would be to copy the way the human brain works, but that's not what most AI researchers are trying to do.

It's similar to how some people once thought we would build mechanical horses to pull our vehicles, but it turned out to be much easier to build a car. AI probably doesn't need emotions, or maybe even consciousness, to be powerful, and the first AGIs that get built will be the ones that are easiest to build.

Stamps: None


Alternate phrasings are used to improve the semantic search which Stampy uses to serve people questions, by giving alternate ways to say a question which might trigger a match when the main wording won't. They should generally only be used when there is a significantly different wording, rather than for only very minor changes.
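As a rough sketch of how this can work under the hood (an assumed setup, not Stampy's actual implementation; the model name and phrasings are placeholders), each phrasing of a question is embedded, and a user's query is compared against all of them rather than only the canonical wording:

```python
# Match a user query against a question's canonical wording and its alternate phrasings.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # placeholder embedding model

phrasings = [
    "What is a pivotal act?",                      # canonical wording
    "What does 'pivotal act' mean in AI safety?",  # alternate phrasing
    "Examples of pivotal acts",                    # alternate phrasing
]
phrasing_embeddings = model.encode(phrasings, convert_to_tensor=True)

query = "Could you explain the idea of a pivotal act?"
query_embedding = model.encode(query, convert_to_tensor=True)

scores = util.cos_sim(query_embedding, phrasing_embeddings)[0]
best = int(scores.argmax())
print(f"best match: {phrasings[best]!r} (score {float(scores[best]):.2f})")
```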

Stamps: None

Tags: stampy

A slow takeoff is where AI capabilities improve gradually, giving us plenty of time to adapt. In a moderate takeoff we might see accelerating progress, but we still won't be caught off guard by a dramatic change. In a fast or hard takeoff, by contrast, AI would go from being not very generally competent to sufficiently superhuman to control the future too fast for humans to course-correct if something goes wrong.

The article Distinguishing definitions of takeoff goes into more detail on this.

Stamps: None


Intelligence is powerful. One might say that “Intelligence is no match for a gun, or for someone with lots of money,” but both guns and money were produced by intelligence. If not for our intelligence, humans would still be foraging the savannah for food.

Intelligence is what caused humans to dominate the planet in the blink of an eye (on evolutionary timescales). Intelligence is what allows us to eradicate diseases, and what gives us the potential to eradicate ourselves with nuclear war. Intelligence gives us superior strategic skills, superior social skills, superior economic productivity, and the power of invention.

A machine with superintelligence would be able to hack into vulnerable networks via the internet, commandeer those resources for additional computing power, take over mobile machines connected to networks connected to the internet, use them to build additional machines, perform scientific experiments to understand the world better than humans can, invent quantum computing and nanotechnology, manipulate the social world better than we can, and do whatever it can to give itself more power to achieve its goals — all at a speed much faster than humans can respond to.


Stamps: None