Stub

Canonically answered

Where can I learn about interpretability?

Christoph Molnar's online book and distill.pub are great sources, as is this overview article, which summarizes 70 interpretability papers.

What training programs and courses are available for AGI safety?

  • AGI safety fundamentals (technical and governance) - The canonical AGI safety 101 course: 3.5 hours of reading and 1.5 hours of facilitated discussion per week for 8 weeks.
  • Refine - A 3-month incubator for conceptual AI alignment research in London, hosted by Conjecture.
  • AI safety camp - Actually do some AI research; the focus is more on producing output than on learning.
  • SERI ML Alignment Theory Scholars Program (SERI MATS) - Four weeks developing an understanding of a research agenda at the forefront of AI alignment through online readings and cohort discussions, averaging 10 hours/week. After this initial upskilling period, scholars are paired with an established AI alignment researcher for a two-week ‘research sprint’ to test fit. Assuming all goes well, scholars are accepted into an eight-week intensive scholars program in Berkeley, California.
  • Principles of Intelligent Behavior in Biological and Social Systems (PIBBSS) - Brings together young researchers studying complex and intelligent behavior in natural and social systems.
  • Safety and Control for Artificial General Intelligence - An actual university course on AI safety (UC Berkeley). Covers multiple domains, including cognitive science, utility theory, cybersecurity, human-machine interaction, and political science.

See also this spreadsheet of learning resources.

What is Conjecture's epistemology research agenda?

The alignment problem is very hard to do science on: we are trying to reason about the future, and we only get one shot, meaning that we can't iterate. It therefore seems very useful to have a good understanding of meta-science/epistemology, i.e. of how to reason about which ways of doing alignment research are useful.

Non-canonical answers

How long will it be until transformative AI is created?

Several surveys and opinion polls have been conducted. The most comprehensive was run by the Future of Humanity Institute, which surveyed 550 top experts in AI research. When asked in which year they thought the chance of human-level artificial intelligence would reach 50%, the mean response was 2081 and the median response was 2040.
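The large gap between the mean and the median reflects a long right tail of responses: a minority of respondents giving very late dates pulls the mean up far more than the median. A toy illustration with made-up numbers (not the actual survey data):

```python
import statistics

# Hypothetical forecast years: most cluster around 2040, but a few
# very late outliers drag the mean far past the median.
forecasts = [2035, 2038, 2040, 2040, 2042, 2045, 2050, 2150, 2200, 2300]

print(statistics.median(forecasts))  # 2043.5 -- barely moved by the outliers
print(statistics.mean(forecasts))    # 2094   -- pulled upward by the long tail
```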

What are OpenAI Codex and GitHub Copilot?

OpenAI Codex and GitHub Copilot are AI systems based on GPT-3 that write and edit code. Given input code and comments describing the intended functionality, they generate output that extends the prompt as accurately as possible.
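For illustration, a prompt of the kind these models are given might pair a natural-language comment with a function signature; the body below is a hand-written example of the sort of completion they aim to produce (not actual Codex or Copilot output):

```python
# Prompt: a comment plus a function signature describing the intended behaviour.
# Return the n largest values in `numbers`, in descending order.
def n_largest(numbers, n):
    # The kind of completion the model is expected to generate from here:
    return sorted(numbers, reverse=True)[:n]

print(n_largest([3, 1, 4, 1, 5, 9, 2, 6], 3))  # [9, 6, 5]
```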

How does the field of AI Safety want to accomplish its goal of preventing existential risk?

  • Alignment - e.g. by doing technical research on how to make AI systems robustly pursue the goals their designers intend.
  • Governance - e.g. by establishing best practices, institutions & processes, awareness, regulation, certification, etc.

What does Evan Hubinger think of Deception + Inner Alignment?

Read Evan's research agenda for more information.

It seems likely that deceptive agents are the default, so a key problem in alignment is to figure out how we can avoid deceptive alignment at every point in the training process. This seems to rely on being able to consistently exert optimization pressure against deception, which probably necessitates interpretability tools.

His current plan for achieving this is acceptability verification: define some predicate that precludes deception, and then check the model against this predicate at every point in training.

One idea for this predicate is making sure that the agent is myopic, meaning that the AI only cares about the current timestep, so there is no incentive to deceive, because the benefits of deception happen only in the future. This is operationalized as “return the action that your model of HCH would return, if it received your inputs.”
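To make the structure of acceptability verification concrete, here is a very rough sketch of a training loop that checks a stand-in myopia predicate at every step. Everything here (DummyModel, is_myopic, the batches) is a hypothetical placeholder for illustration, not Hubinger's actual proposal or any real tooling:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DummyModel:
    """Stand-in for a model being trained; real training code would go here."""
    history: List[int] = field(default_factory=list)

    def update(self, batch):
        self.history.append(batch)

def is_myopic(model) -> bool:
    """Placeholder acceptability predicate. In the actual proposal this
    would be an interpretability-based check that the model only cares
    about the current timestep, and so has no incentive to deceive."""
    return True  # toy example: always passes

def train_with_acceptability_verification(model, batches):
    """Train while checking the acceptability predicate at every step,
    halting as soon as it fails (i.e. exerting optimization pressure
    against deception throughout training)."""
    for step, batch in enumerate(batches):
        model.update(batch)
        if not is_myopic(model):
            raise RuntimeError(f"Acceptability predicate failed at step {step}")
    return model

train_with_acceptability_verification(DummyModel(), batches=range(10))
```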