plex

Questions Asked: 90
Answers written: 123

I'm the resident MediaWiki guy and general organizer of all things stampy. Talk to me if you have questions or need access to something.

Also, here's my conversation menu, listing many of the topics I like to talk about. Feel free to book a call to discuss any of them.

Also, I'm practicing as a life coach specializing in people who want to work on AI alignment. I've been working with sudonym and chriscanal for a few months now, and I'm ready to take on a few more people for free calls to discuss your goals and try to disentangle things holding you back from them. While I have spare capacity I can do weekly calls, but may have to move to a less regular schedule if other things in my life pick up or a lot of people sign up.

Questions by plex which have been answered

How can I collect questions for Stampy?


As well as simply adding your own questions over at ask question, you could also message your friends with something like:

Hi,

I'm working on a project to create a comprehensive FAQ about AI alignment (you can read about it here https://stampy.ai/wiki/Stampy%27s_Wiki if interested). We're looking for questions and I thought you may have some good ones. If you'd be willing to write up a Google Doc with your top 5-10ish questions, we'd be happy to write a personalized FAQ for you. https://stampy.ai/wiki/Scope explains the kinds of questions we're looking for.

Thanks!

and maybe bring the google doc to a Stampy editing session so we can collaborate on answering them or improving your answers to them.

How can I contact the Stampy team?


The Rob Miles AI Discord is the hub of all things Stampy. If you want to be part of the project and don't have access yet, ask plex#1874 on Discord (or plex on wiki).

You can also talk to us on the public Discord! Try #suggestions or #general, depending on what you want to talk about.

How can I contribute to Stampy?


If you're not already there, join the Discord where the contributors hang out.

The main ways you can help are to answer questions or add questions, or help to review questions, review answers, or improve answers (instructions for helping out with each of these tasks are on the linked pages). You could also join the dev team if you have programming skills.

How can I join the Stampy dev team?


The development team works on multiple projects in support of Stampy. Currently, these projects include:

  • Stampy UI, which is made mostly in TypeScript.
  • The Stampy Wiki, which is made mostly in PHP and JavaScript.
  • The Stampy Bot, which is made in Python.

However, even if you don’t specialize in any of these areas, do reach out if you would like to help.

To join, please contact our Project Manager, plex. You can reach him on Discord at plex#1874. He will be able to point your skills in the right direction to help in the most effective way possible.

How close do AI experts think we are to creating superintelligence?


Nobody knows for sure when we will have AGI, or if we'll ever get there. Open Philanthropy CEO Holden Karnofsky has analyzed a selection of recent expert surveys on the matter, as well as taking into account findings from computational neuroscience, economic history, probabilistic methods, and the failures of previous AI timeline estimates. This all led him to estimate that "there is more than a 10% chance we'll see transformative AI within 15 years (by 2036); a ~50% chance we'll see it within 40 years (by 2060); and a ~2/3 chance we'll see it this century (by 2100)." Karnofsky bemoans the lack of robust expert consensus on the matter and invites rebuttals to his claims in order to further the conversation. He compares AI forecasting to election forecasting (as opposed to academic political science) or market forecasting (as opposed to academic theorizing), thereby arguing that AI researchers may not be the "experts" we should trust in predicting AI timelines.

Opinions proliferate, but given experts’ (and non-experts’) poor track record at predicting progress in AI, many researchers tend to be fairly agnostic about when superintelligent AI will be invented.

UC-Berkeley AI professor Stuart Russell has given his best guess as “sometime in our children’s lifetimes”, while Ray Kurzweil (Google’s Director of Engineering) predicts human level AI by 2029 and an intelligence explosion by 2045. Eliezer Yudkowsky expects the end of the world, and Elon Musk expects AGI, before 2030.

If there’s anything like a consensus answer at this stage, it would be something like: “highly uncertain, maybe not for over a hundred years, maybe in less than fifteen, with around the middle of the century looking fairly plausible”.

How do I add content from LessWrong / Effective Altruism Forum tag-wikis to Stampy?


You can include a live-updating version of many definitions from LessWrong by using the syntax on Template:TagDesc in the Answer field and Template:TagDescBrief in the Brief Answer field. Similarly, calling Template:TagDescEAF and Template:TagDescEAFBrief will pull from the EA Forum tag wiki.

When available, this should be used, as it reduces duplication of effort and directs all editors toward improving a single high-quality source.

How do I format answers on Stampy?

Stampy uses MediaWiki markup, which includes a limited subset of HTML plus the following formatting options:

Items on lists start with *, numbered lists with #

  • Start with ** or ## for double indentation
  • For external links use [ followed directly by the URL, a space, then display text and finally a ] symbol
  • For internal links write the page title wrapped in [[]]s
    • e.g. [[What is the Stampy project?]] gives What is the Stampy project?. Including a pipe symbol followed by display text e.g. [[What is the Stampy project?|Display Text]] allows you to show different Display Text.
  • (ref)Reference notes go inside these tags(/ref)[1]
  • If you post the raw URL of an image from imgur it will be displayed.[2] You can reduce file compression if you get an account. Note that you need the image itself; right click -> copy image address to get it.
  • To embed a YouTube video, use (youtube)APsK8NST4qE(/youtube) with the video ID of the target video.
  • Three 's around text - Bold
  • Two 's around text - Italic

Headings have ==heading here== around them, with more =s for smaller headings.

Wrap quotes in < blockquote>< /blockquote> tags (without the spaces)

There are also (poem) (/poem) to suppress linebreak removal, (pre) (/pre) for preformatted text, and (nowiki) (/nowiki) to not have that content parsed.[3]

We can pull live descriptions from the LessWrong/Alignment Forum tag wiki using the identifier from the URL; for example, including Template:TagDesc with orthogonality-thesis as a parameter will render the full tag description from the LessWrong tag wiki entry on Orthogonality Thesis. Template:TagDescBrief is similar but will pull only the first paragraph without formatting.

For tables please use HTML tables rather than wikicode tables.

Edit this page to see examples.
  1. Note that we use ()s rather than the standard <>s for compatibility with Semantic MediaWiki. The references are automatically added to the bottom of the answer!
  2. If images seem popular we'll set up local uploads.
  3. () can also be used in place of allowed HTML tags. You can escape a () tag by placing a ! inside the start of the first entry. Be aware that () tags only nest up to two layers deep!

How does the stamp eigenkarma system work?


If someone posts something good - something that shows insight, knowledge of AI Safety, etc. - give the message or answer a stamp of approval! Stampy keeps track of these, and uses them to decide how much he likes each user. You can ask Stampy (in a PM if you like), "How many stamps am I worth?", and he'll tell you.

If something is really very good, especially if it took a lot of work/effort, give it a gold stamp. These are worth 5 regular stamps!

Note that stamps aren't just 'likes', so please don't give stamps to say "me too" or "that's funny" etc. They're meant to represent knowledge, understanding, good judgement, and contributing to the discord. You can use 💯 or ✔️ for things you agree with, 😂 or 🤣 for funny things etc.

Your stamp points determine how much say you have if there are disagreements on Stampy content, which channels you have permission to post to, your voting power for approving YouTube replies, and whether you get to invite people.

Notes on stamps and stamp points

  • Stamps awarded by people with a lot of stamp points are worth more
  • Awarding people stamps does not reduce your stamp points
  • New users who have 0 stamp points can still award stamps; they just have no effect. But it's still worth doing, because if you get stamp points later, all your previous votes are retroactively updated!
  • Yes, this was kind of tricky to implement! Stampy actually stores how many stamps each user has awarded to every other user, and uses that to build a system of linear scalar equations which is then solved with numpy.
  • Each user has stamp points, and also gives a score to every other user they give stamps to. The scores sum to 1, so if I give user A a stamp, my score for them will be 1.0; if I then give user B a stamp, my score for A is 0.5 and B is 0.5; if I give another to B, my score for A goes to 0.333 and B to 0.667, and so on
  • Score is "what proportion of the stamps I've given have gone to this user"
  • Everyone's stamp points are the sum of (every other user's score for them, times that user's stamp points), so the way to get points is to get stamps from people who have points
  • Rob is the root of the tree, he got one point from Stampy
  • So the idea is that stamp power kind of flows through the network, giving people points for posting things that I thought were good, or for posting things that "people who posted things I thought were good" thought were good, and so on ad infinitum. For posting YouTube comments, Stampy won't send the comment until it has enough stamps of approval, which could come from a small number of high-points users or a larger number of lower-points users
  • Stamps given to yourself or to stampy do nothing

So yeah everyone ends up with a number that basically represents what Stampy thinks of them, and you can ask him "how many stamps am I worth?" to get that number

so if you have people a, b, and c, the points are calculated by:
a_points = (bs_score_for_a * b_points) + (cs_score_for_a * c_points)
b_points = (as_score_for_b * a_points) + (cs_score_for_b * c_points)
c_points = (as_score_for_c * a_points) + (bs_score_for_c * b_points)
which is tough because you need to know everyone else's score before you can calculate your own
but actually the system will have a fixed point - there'll be a certain arrangement of values such that every node has as much flowing out as flowing in - a stable configuration so you can rearrange
(bs_score_for_a * b_points) + (cs_score_for_a * c_points) - a_points = 0
(as_score_for_b * a_points) + (cs_score_for_b * c_points) - b_points = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) - c_points = 0
or, for neatness:
( -1 * a_points) + (bs_score_for_a * b_points) + (cs_score_for_a * c_points) = 0
(as_score_for_b * a_points) + ( -1 * b_points) + (cs_score_for_b * c_points) = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) + ( -1 * c_points) = 0
and this is just a system of linear scalar equations that you can throw at numpy.linalg.solve
(you add one more equation that says rob_points = 1, so there's some place to start from) there should be one possible distribution of points such that all of the equations hold at the same time, and numpy finds that by linear algebra magic beyond my very limited understanding
but as far as I can tell you can have all the cycles you want!
(I actually have the scores sum to slightly less than 1, to have the stamp power slightly fade out as it propagates, just to make sure it doesn't explode. But I don't think I actually need to do that)
and yes this means that any time anyone gives a stamp to anyone, ~everyone's points will change slightly
And yes this means I'm recalculating the matrix and re-solving it for every new stamp, but computers are fast and I'm sure there are cheaper approximations I could switch to later if necessary
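
To make the linear-algebra part concrete, here's a minimal sketch of the calculation described above (an illustrative toy, not Stampy's actual code; the user names and score values are made up):

import numpy as np

users = ["rob", "a", "b", "c"]
idx = {u: i for i, u in enumerate(users)}

# scores[giver][receiver] = fraction of the giver's stamps that went to that receiver
scores = {
    "rob": {"a": 0.6, "b": 0.4},
    "a": {"b": 0.5, "c": 0.5},
    "b": {"c": 1.0},
    "c": {},
}

decay = 0.99  # scores sum to slightly less than 1 so stamp power fades as it propagates
n = len(users)

# Row i encodes: sum over givers (giver's score for i * giver's points) - i's points = 0
A = -np.eye(n)
for giver, given in scores.items():
    for receiver, score in given.items():
        A[idx[receiver], idx[giver]] += decay * score

b = np.zeros(n)
# Replace Rob's equation with the anchor condition rob_points = 1
A[idx["rob"], :] = 0.0
A[idx["rob"], idx["rob"]] = 1.0
b[idx["rob"]] = 1.0

points = np.linalg.solve(A, b)
for u in users:
    print(u, round(float(points[idx[u]]), 3))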

How might we get from Artificial General Intelligence to a Superintelligent system?


Once a system is at least as capable as the top humans at AI research, it would tend to become the driver of its own development and initiate a process of recursive self-improvement known as the intelligence explosion, leading to an extremely powerful system. A general framing of this process is Open Philanthropy's Process for Automating Scientific and Technological Advancement (PASTA).

There is much debate about whether there would be a notable period where the AI was partially driving its own development, with humans being gradually less and less important, or whether the transition to AI automated AI capability research would be sudden. However, the core idea that there is some threshold of capabilities beyond which a system would begin to rapidly ascend is hard to reasonably dispute, and is a significant consideration for developing alignment strategies.
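
As a purely illustrative toy model of that feedback loop (the parameters and units are invented; this is not a prediction): once a system converts even a small fraction of its own capability into further capability research, growth compounds and eventually dwarfs a constant human-driven rate.

human_rate = 1.0   # capability added per step by human researchers alone (made-up units)
feedback = 0.05    # fraction of its own capability the AI turns into further progress per step
capability = 1.0

for step in range(1, 101):
    capability += human_rate + feedback * capability  # AI assistance compounds over time
    if step % 25 == 0:
        print(step, round(capability, 1))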

I want to work on AI alignment. How can I get funding?

See the Future Funding List for up to date information!

The organizations which most regularly give grants to individuals working towards AI alignment are the Long Term Future Fund, Survival And Flourishing (SAF), the OpenPhil AI Fellowship and early career funding, the Future of Life Institute, the Future of Humanity Institute, and the Center on Long-Term Risk Fund. If you're able to relocate to the UK, CEEALAR (aka the EA Hotel) can be a great option as it offers free food and accommodation for up to two years, as well as contact with others who are thinking about these issues. The FTX Future Fund only accepts direct applications for $100k+ with an emphasis on massively scaleable interventions, but their regranters can make smaller grants for individuals. There are also opportunities from smaller grantmakers which you might be able to pick up if you get involved.

If you want to work on support or infrastructure rather than directly on research, the EA Infrastructure Fund may be able to help. In general, you can talk to EA funds before applying.

Each grant source has their own criteria for funding, but in general they are looking for candidates who have evidence that they're keen and able to do good work towards reducing existential risk (for example, by completing an AI Safety Camp project), though the EA Hotel in particular has less stringent requirements as they're able to support people at very low cost. If you'd like to talk to someone who can offer advice on applying for funding, AI Safety Support offers free calls.

Another option is to get hired by an organization which works on AI alignment, see the follow-up question for advice on that.

It's also worth checking the AI Alignment tag on the EA funding sources website for up-to-date suggestions.

If AI takes over the world how could it create and maintain the infrastructure that humans currently provide?


An unaligned AI would not eliminate humans until it had replacements for the manual labor they provide to maintain civilization (e.g. a more advanced version of Tesla's Optimus). Until that point, it might settle for technologically and socially manipulating humans.

If we solve alignment, are we sure of a good future?


If by “solve alignment” you mean build a sufficiently performance-competitive superintelligence which has the goal of Coherent Extrapolated Volition or something else which captures human values, then yes. It would be able to deploy technology near the limits of physics (e.g. atomically precise manufacturing) to solve most of the other problems which face us, and steer the future towards a highly positive path for perhaps many billions of years until the heat death of the universe (barring more esoteric x-risks like encounters with advanced hostile civilizations, false vacuum decay, or simulation shutdown).

However, if you only have alignment of a superintelligence to a single human you still have the risk of misuse, so this should be at most a short-term solution. For example, what if Google creates a superintelligent AI, and it listens to the CEO of Google, and it’s programmed to do everything exactly the way the CEO of Google would want? Even assuming that the CEO of Google has no hidden unconscious desires affecting the AI in unpredictable ways, this gives one person a lot of power.

I’d like to get deeper into the AI alignment literature. Where should I look?


The AGI Safety Fundamentals Course is arguably the best way to get up to speed on alignment; you can sign up to go through it alongside other students with mentorship, or read the materials independently.

Other great ways to explore include:

You might also want to consider reading Rationality: A-Z which covers a lot of skills that are valuable to acquire for people trying to think about large and complex issues, with The Rationalist's Guide to the Galaxy available as a shorter and more accessible AI-focused option.

OK, I’m convinced. How can I help?


Great! I’ll ask you a few follow-up questions to help figure out how you can best contribute, give you some advice, and link you to resources which should help you on whichever path you choose. Feel free to scroll up and explore multiple branches of the FAQ if you want answers to more than one of the questions offered :)

Note: We’re still building out and improving this tree of questions and answers, any feedback is appreciated.

At what level of involvement were you thinking of helping?

Please view this Google Doc and suggest improvements: https://docs.google.com/document/d/1S-CUcoX63uiFdW-GIFC8wJyVwo4VIl60IJHodcRfXJA/edit#

What approaches are AI alignment organizations working on?


Each major organization has a different approach. The research agendas are detailed and complex (see also AI Watch). Getting more brains working on any of them (and more money to fund them) may pay off in a big way, but it’s very hard to be confident which (if any) of them will actually work.

The following is a massive oversimplification; each organization actually pursues many different avenues of research. Read the 2020 AI Alignment Literature Review and Charity Comparison for much more detail. That being said:

  • The Machine Intelligence Research Institute focuses on foundational mathematical research to understand reliable reasoning, which they think is necessary to provide anything like an assurance that a seed AI, if built, will do good things when activated.
  • The Center for Human-Compatible AI focuses on Cooperative Inverse Reinforcement Learning and Assistance Games, a new paradigm for AI where they try to optimize for doing the kinds of things humans want rather than for a pre-specified utility function.
  • Paul Christiano's Alignment Research Center focuses on prosaic alignment, particularly on creating tools that empower humans to understand and guide systems much smarter than ourselves. His methodology is explained on his blog.
  • The Future of Humanity Institute does work on crucial considerations and other x-risks, as well as AI safety research and outreach.
  • Anthropic is a new organization exploring natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
  • OpenAI is in a state of flux after major changes to their safety team.
  • DeepMind’s safety team is working on various approaches designed to work with modern machine learning, and does some communication via the Alignment Newsletter.
  • EleutherAI is a Machine Learning collective aiming to build large open source language models to allow more alignment research to take place.
  • Ought is a research lab that develops mechanisms for delegating open-ended thinking to advanced machine learning systems.

There are many other projects around AI Safety, such as the Windfall clause, Rob Miles’s YouTube channel, AI Safety Support, etc.

What are alternate phrasings for?


Alternate phrasings are used to improve the semantic search which Stampy uses to serve people questions, by giving alternate ways to say a question which might trigger a match when the main wording won't. They should generally only be used when there is a significantly different wording, rather than for only very minor changes.
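
As a simplified illustration of why alternate phrasings help (this toy version matches on word overlap, whereas Stampy's actual semantic search presumably uses learned embeddings; the alternate phrasing and user question below are invented):

import math
import re
from collections import Counter

def cosine(a, b):
    # Bag-of-words cosine similarity between two short texts.
    va = Counter(re.findall(r"[a-z']+", a.lower()))
    vb = Counter(re.findall(r"[a-z']+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(v * v for v in va.values())) * math.sqrt(sum(v * v for v in vb.values()))
    return dot / norm if norm else 0.0

canonical = "What is the Stampy project?"
alternate = "What's this Stampy thing all about?"
user_question = "what's this stampy thing?"

# The alternate phrasing matches the user's wording far better than the canonical one,
# so storing it makes it much more likely the right canonical answer gets served.
print(cosine(canonical, user_question), cosine(alternate, user_question))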

What are language models?


Language Models are a class of AI trained on text, usually to predict the next word or a word which has been obscured. They have the ability to generate novel prose or code based on an initial prompt, which gives rise to a kind of natural language programming called prompt engineering. The most popular architecture for very large language models is called a transformer, which follows consistent scaling laws with respect to the size of the model being trained, meaning that a larger model trained with the same amount of compute will produce results which are better by a predictable amount (when measured by the 'perplexity', or how surprised the AI is by a test set of human-generated text).
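
As a rough illustration of the 'perplexity' measure mentioned above (the probabilities below are invented for the example, not taken from any real model):

import math

# The model's predicted probability for each actual next token in a small test text.
token_probs = [0.25, 0.10, 0.60, 0.05]

avg_neg_log_likelihood = -sum(math.log(p) for p in token_probs) / len(token_probs)
perplexity = math.exp(avg_neg_log_likelihood)
print(round(perplexity, 2))  # lower perplexity means the model is less "surprised" by the text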

See also

  • GPT - A family of large language models created by OpenAI

What are mesa-optimizers?

Mesa-Optimization is the situation that occurs when a learned model (such as a neural network) is itself an optimizer. In this situation, a base optimizer creates a second optimizer, called a mesa-optimizer. The primary reference work for this concept is Hubinger et al.'s "Risks from Learned Optimization in Advanced Machine Learning Systems".


Example: Natural selection is an optimization process that optimizes for reproductive fitness. Natural selection produced humans, who are themselves optimizers. Humans are therefore mesa-optimizers of natural selection.

In the context of AI alignment, the concern is that a base optimizer (e.g., a gradient descent process) may produce a learned model that is itself an optimizer, and that has unexpected and undesirable properties. Even if the gradient descent process is in some sense "trying" to do exactly what human developers want, the resultant mesa-optimizer will not typically be trying to do the exact same thing.[1]

 

History

Previously, work on this concept went under the names Inner Optimizer or Optimization Daemons.

Wei Dai brings up a similar idea in an SL4 thread.[2]

The optimization daemons article on Arbital was probably published in 2016.[3]

Jessica Taylor wrote two posts about daemons while at MIRI:

 

See also

 

References

  1. "Optimization daemons". Arbital.
  2. Wei Dai. '"friendly" humans?' December 31, 2003.

 

External links

Video by Robert Miles

Some posts that reference optimization daemons:

  • "Cause prioritization for downside-focused value systems": "Alternatively, perhaps goal preservation becomes more difficult the more capable AI systems become, in which case the future might be controlled by unstable goal functions taking turns over the steering wheel"
  • "Techniques for optimizing worst-case performance": "The difficulty of optimizing worst-case performance is one of the most likely reasons that I think prosaic AI alignment might turn out to be impossible (if combined with an unlucky empirical situation)." (the phrase "unlucky empirical situation" links to the optimization daemons page on Arbital)

What are some good books about AGI safety?


The Alignment Problem (2020) by Brian Christian is the most recent in-depth guide to the field.

The book which first made the case to the public is Nick Bostrom’s Superintelligence (2014). It gives an excellent overview of the state of the field (as it was then) and makes a strong case for the subject being important, as well as exploring many fascinating adjacent topics. However, it does not cover newer developments, such as mesa-optimizers or language models.

There's also Human Compatible (2019) by Stuart Russell, which gives a more up-to-date review of developments, with an emphasis on the approaches that the Center for Human-Compatible AI are working on, such as cooperative inverse reinforcement learning. There's a good review/summary on SlateStarCodex.

Although not limited to AI safety, The AI Does Not Hate You (2020) is an entertaining and accessible outline of both the core issues and an exploration of some of the community and culture of the people working on it.

Various other books explore the issues in an informed way, such as Toby Ord’s The Precipice (2020), Max Tegmark’s Life 3.0 (2017), Yuval Noah Harari’s Homo Deus (2016), Stuart Armstrong’s Smarter Than Us (2014), and Luke Muehlhauser’s Facing the Intelligence Explosion (2013).

What are some good resources on AI alignment?


What are some of the most impressive recent advances in AI capabilities?


GPT-3 showed that transformers are capable of a vast array of natural language tasks, and Codex/Copilot extended this into programming. One demonstration of GPT-3 is "Simulated Elon Musk lives in a simulation". It's important to note that there are several much better language models, but they are not publicly available.

DALL-E and DALL-E 2 are among the most visually spectacular.

MuZero learned Go, Chess, and many Atari games without any directly coded information about those environments. The graphic there explains it well. This seems crucial for being able to do RL in novel environments: we have systems which we can drop into a wide variety of games and they just learn how to play. The same algorithm was used in Tesla's self-driving cars to do complex route finding. These things are general.

Generally capable agents emerge from open-ended play - Diverse procedurally generated environments provide vast amounts of training data for AIs to learn generally applicable skills. Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning shows how these kinds of systems can be trained to follow instructions in natural language.

GATO shows you can distill 600+ individually trained tasks into one network, so we're not limited by the tasks being fragmented.

What are some specific open tasks on Stampy?


Other than the usual fare of writing and processing and organizing questions and answers, here are some specific open tasks:

What are the ethical challenges related to whole brain emulation?


Unless there was a way to cryptographically ensure otherwise, whoever runs the emulation has basically perfect control over their environment and can reset them to any state they were previously in. This opens up the possibility of powerful interrogation and torture of digital people.

Imperfect uploading might lead to damage that causes the EM to suffer while still remaining useful enough to be run, for example as a test subject for research. We would also have greater ability to modify digital brains. Edits done for research or economic purposes might cause suffering. See this fictional piece for an exploration of what a world with a lot of EM suffering might look like.

These problems are exacerbated by the likely outcome that digital people can be run much faster than biological humans, so it would plausibly be possible to have an EM run for hundreds of subjective years in minutes or hours without any checks on the wellbeing of the EM in question.

What are the style guidelines for writing for Stampy?


Avoid directly responding to the question in the answer, repeat the relevant part of the question instead. For example, if the question is "Can we do X", answer "We might be able to do X, if we can do Y", not "Yes, if we can manage Y". This way, the answer will also work for the questions "Why can't we do X" and "What would happen if we tried to do X".

Linking to external sites is strongly encouraged, one of the most valuable things Stampy can do is help people find other parts of the alignment information ecosystem.

Consider enclosing newly introduced terms, likely to be unfamiliar to many readers, in speech marks. If unsure, Google the term (in speech marks!) and see if it shows up anywhere other than LessWrong, the Alignment Forum, etc. Be judicious, as it's easy to use too many, but used carefully they can psychologically cushion newbies from a lot of unfamiliar terminology - in this context they're saying something like "we get that we're hitting you with a lot of new vocab, and you might not know what this term means yet".

When selecting related questions, there shouldn't be more than four unless there's a really good reason for that (some questions are asking for it, like the "Why can't we just..." question). It's also recommended to include at least one more "enticing" question to draw users in (relating to the more sensational, sci-fi, philosophical/ethical side of things) alongside more bland/neutral questions.

What harm could a single superintelligence do when it took so many humans to build civilization?


Superintelligence has an advantage that an early human didn’t – the entire context of human civilization and technology, there for it to manipulate socially or technologically.

What is "narrow AI"?


A Narrow AI is capable of operating only in a relatively limited domain, such as chess or driving, rather than being capable of learning a broad range of tasks like a human or an Artificial General Intelligence. Narrow vs General is not a perfectly binary classification; there are degrees of generality. For example, large language models have a fairly large degree of generality (as the domain of text is large) without being as general as a human, and we may eventually build systems that are significantly more general than humans.

What is "transformative AI"?


Transformative AI is "[...] AI that precipitates a transition comparable to (or more significant than) the agricultural or industrial revolution."[1] The concept refers to the large effects of AI systems on our well-being, the global economy, state power, international security, etc. and not to specific capabilities that AI might have (unlike the related terms Superintelligent AI and Artificial General Intelligence).

Holden Karnofsky gives a more detailed definition in another OpenPhil 2016 post:

[...] Transformative AI is anything that fits one or more of the following descriptions (emphasis original):

  • AI systems capable of fulfilling all the necessary functions of human scientists, unaided by humans, in developing another technology (or set of technologies) that ultimately becomes widely credited with being the most significant driver of a transition comparable to (or more significant than) the agricultural or industrial revolution. Note that just because AI systems could accomplish such a thing unaided by humans doesn’t mean they would; it’s possible that human scientists would provide an important complement to such systems, and could make even faster progress working in tandem than such systems could achieve unaided. I emphasize the hypothetical possibility of AI systems conducting substantial unaided research to draw a clear distinction from the types of AI systems that exist today. I believe that AI systems capable of such broad contributions to the relevant research would likely dramatically accelerate it.
  • AI systems capable of performing tasks that currently (in 2016) account for the majority of full-time jobs worldwide, and/or over 50% of total world wages, unaided and for costs in the same range as what it would cost to employ humans. Aside from the fact that this would likely be sufficient for a major economic transformation relative to today, I also think that an AI with such broad abilities would likely be able to far surpass human abilities in a subset of domains, making it likely to meet one or more of the other criteria laid out here.
  • Surveillance, autonomous weapons, or other AI-centric technology that becomes sufficiently advanced to be the most significant driver of a transition comparable to (or more significant than) the agricultural or industrial revolution. (This contrasts with the first point because it refers to transformative technology that is itself AI-centric, whereas the first point refers to AI used to speed research on some other transformative technology.)

What is AI Safety via Debate?


Debate is a proposed technique for allowing human evaluators to get correct and helpful answers from experts, even if the evaluator is not themselves an expert or able to fully verify the answers.[1] The technique was suggested as part of an approach to build advanced AI systems that are aligned with human values, and to safely apply machine learning techniques to problems that have high stakes but are not well-defined (such as advancing science or increasing a company's revenue).[2][3]

What is GPT-3?


GPT-3 is the newest and most impressive of the GPT (Generative Pretrained Transformer) series of large transformer-based language models created by OpenAI. It was announced in June 2020, and is 100 times larger than its predecessor GPT-2.[1]

Gwern has several resources exploring GPT-3's abilities, limitations, and implications including:

Vox has an article which explains why GPT-3 is a big deal.

  1. GPT-3: What’s it good for? - Cambridge University Press

What is Goodhart's law?

Goodhart's Law states that when a proxy for some value becomes the target of optimization pressure, the proxy will cease to be a good proxy. One form of Goodhart is demonstrated by the Soviet story of a factory graded on how many shoes they produced (a good proxy for productivity) – they soon began producing a higher number of tiny shoes. Useless, but the numbers look good.


Goodhart's Law is of particular relevance to AI alignment. Suppose you have something which is generally a good proxy for "the stuff that humans care about"; it would be dangerous to have a powerful AI optimize for the proxy, because, in accordance with Goodhart's law, the proxy will break down.
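
As a toy numerical sketch of this (the numbers are invented and not taken from the linked taxonomy): optimizing a noisy proxy selects for the noise as well as the goal, so the options that look best by the proxy are systematically worse than they appear.

import random

random.seed(0)
true_values = [random.gauss(0, 1) for _ in range(10000)]
proxy_values = [v + random.gauss(0, 1) for v in true_values]  # proxy = goal + noise

# Pick the 100 options that look best according to the proxy.
best = sorted(range(10000), key=lambda i: proxy_values[i], reverse=True)[:100]
print(sum(proxy_values[i] for i in best) / 100)  # looks excellent by the proxy...
print(sum(true_values[i] for i in best) / 100)   # ...but the true value is much lower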

Goodhart Taxonomy

In Goodhart Taxonomy, Scott Garrabrant identifies four kinds of Goodharting:

  • Regressional Goodhart - When selecting for a proxy measure, you select not only for the true goal, but also for the difference between the proxy and the goal.
  • Causal Goodhart - When there is a non-causal correlation between the proxy and the goal, intervening on the proxy may fail to intervene on the goal.
  • Extremal Goodhart - Worlds in which the proxy takes an extreme value may be very different from the ordinary worlds in which the correlation between the proxy and the goal was observed.
  • Adversarial Goodhart - When you optimize for a proxy, you provide an incentive for adversaries to correlate their goal with your proxy, thus destroying the correlation with your goal.

See Also

What is Stampy's copyright?

  • All content produced on this wiki is released under the CC-BY-SA 4.0 license. Exceptions for unattributed use may be granted by admins, contact plex for inquiries.
  • Questions from YouTube or other sources are reproduced with the intent of fair use, as derivative and educational material.
  • Source code of https://ui.stampy.ai/ is released under MIT license
  • Logo and visual design copyright is owned by Rob Miles, all rights reserved.

What is a "quantilizer"?

A Quantilizer is a proposed AI design which aims to reduce the harms from Goodhart's law and specification gaming by selecting reasonably effective actions from a distribution of human-like actions, rather than maximizing over actions. It is more of a theoretical tool for exploring ways around these problems than a practical buildable design.
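
A minimal sketch of the idea (an illustrative toy with a made-up action distribution and utility function, not a concrete design from the literature):

import random

random.seed(0)

def sample_human_like_action():
    # Stand-in for a distribution over actions a human might plausibly take.
    return random.gauss(0, 1)

def utility(action):
    # Stand-in utility; a pure maximizer would push this to an extreme.
    return action

def quantilize(q=0.1, n_samples=1000):
    # Sample human-like actions, then pick uniformly from the top q fraction by utility.
    samples = sorted((sample_human_like_action() for _ in range(n_samples)), key=utility, reverse=True)
    top_fraction = samples[: max(1, int(q * n_samples))]
    return random.choice(top_fraction)  # reasonably good, but still a fairly "normal" action

print(quantilize())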


See also

What is a canonical question on Stampy's Wiki?


Canonical questions are the questions which we've checked are in scope and not duplicates, so we want answers to them. They may be edited to represent a class of question more broadly, rather than keeping all their idiosyncrasies. Once they're answered canonically, Stampy will serve them to readers.

What is a duplicate question on Stampy's Wiki?


An existing question is a duplicate of a new one if it is reasonable to expect whoever asked the new question to be satisfied if they received an answer to the existing question instead.

What is a follow-up question on Stampy's Wiki?


Follow-up questions are responses to an answer which a reader might have, either because they want more information or because they are providing information to Stampy about what they're looking for. We don't expect to have great coverage of the former for a long time because there will be so many, but hopefully we'll be able to handle some of the most common ones.

What is artificial general intelligence safety / AI alignment?


AI alignment is a field that is focused on causing the goals of future superintelligent artificial systems to align with human values, meaning that they would behave in a way which was compatible with our survival and flourishing. This may be an extremely hard problem, especially with deep learning, and is likely to determine the outcome of the most important century. Alignment research is strongly interdisciplinary and can include computer science, mathematics, neuroscience, philosophy, and social sciences.

AGI safety is a related concept which strongly overlaps with AI alignment. AGI safety is concerned with making sure that building AGI systems doesn’t cause things to go badly wrong, and the main way in which things can go badly wrong is through misalignment. AGI safety includes policy work that prevents the building of dangerous AGI systems, or reduces misuse risks from AGI systems aligned to actors who don’t have humanity’s best interests in mind.

What is meant by "AI takeoff"?

AI Takeoff refers to the process of an Artificial General Intelligence going from a certain threshold of capability (often discussed as "human-level") to being super-intelligent and capable enough to control the fate of civilization. There has been much debate about whether AI takeoff is more likely to be slow vs fast, i.e., "soft" vs "hard".


See also: AI Timelines, Seed AI, Singularity, Intelligence explosion, Recursive self-improvement

AI takeoff is sometimes casually referred to as AI FOOM.

Soft takeoff

A soft takeoff refers to an AGI that would self-improve over a period of years or decades. This could be either because the learning algorithm is too demanding for the hardware or because the AI relies on experiencing feedback from the real world that would have to be played out in real-time. Possible methods that could deliver a soft takeoff, by slowly building on human-level intelligence, are Whole brain emulation, Biological Cognitive Enhancement, and software-based strong AGI [1]. By maintaining control of the AGI's ascent, it should be easier for a Friendly AI to emerge.

Vernor Vinge, Hans Moravec, and others have expressed the view that a soft takeoff is preferable to a hard takeoff, as it would be both safer and easier to engineer.

Hard takeoff

A hard takeoff (or an AI going "FOOM" [2]) refers to AGI expansion in a matter of minutes, days, or months. It is a fast, abrupt, local increase in capability. This scenario is widely considered much more precarious, as it involves an AGI rapidly ascending in power without human control. This may result in unexpected or undesired behavior (i.e. Unfriendly AI). It is one of the main ideas supporting the Intelligence explosion hypothesis.

The feasibility of hard takeoff has been addressed by Hugo de Garis, Eliezer Yudkowsky, Ben Goertzel, Nick Bostrom, and Michael Anissimov. It is widely agreed that a hard takeoff is something to be avoided due to the risks. Yudkowsky points out several possibilities that would make a hard takeoff more likely than a soft takeoff, such as the existence of large resource overhangs or the fact that small improvements can have a large impact on a mind's general intelligence (e.g. the small genetic difference between humans and chimps led to huge increases in capability) [3].

Notable posts

External links

References

  1. http://www.aleph.se/andart/archives/2010/10/why_early_singularities_are_softer.html
  2. http://lesswrong.com/lw/63t/requirements_for_ai_to_go_foom/
  3. http://lesswrong.com/lw/wf/hard_takeoff/

What is the Stampy project?

The Stampy project is an open effort to build a comprehensive FAQ about artificial intelligence existential safety—the field trying to make sure that when we build superintelligent artificial systems they are aligned with human values so that they do things compatible with our survival and flourishing.

We're also building a cleaner web UI for readers and a bot interface.

The goals of the project are to:

  • Offer a one-stop-shop for high-quality answers to common questions about AI alignment.
    • Let people answer questions in a way which scales, freeing up researcher time while allowing more people to learn from a reliable source.
    • Make external resources easier to find by having links to them connected to a search engine which gets smarter the more it's used.
  • Provide a form of legitimate peripheral participation for the AI Safety community, as an on-boarding path with a flexible level of commitment.
    • Encourage people to think, read, and talk about AI alignment while answering questions, creating a community of co-learners who can give each other feedback and social reinforcement.
    • Provide a way for budding researchers to prove their understanding of the topic and ability to produce good work.
  • Collect data about the kinds of questions people actually ask and how they respond, so we can better focus resources on answering them.
If you would like to help out, join us on the Discord and either jump right into editing or read get involved for answers to common questions.

What kind of questions do we want on Stampy?


Stampy is focused specifically on AI existential safety (both introductory and technical questions), but does not aim to cover general AI questions or other topics which don't interact strongly with the effects of AI on humanity's long-term future. More technical questions are also in our scope, though replying to all possible proposals is not feasible and this is not a place to submit detailed ideas for evaluation.

We are interested in:

  • Introductory questions closely related to the field e.g.
    • "How long will it be until transformative AI arrives?"
    • "Why might advanced AI harm humans?"
  • Technical questions related to the field e.g.
    • "What is Cooperative Inverse Reinforcement Learning?"
    • "What is Logical Induction useful for?"
  • Questions about how to contribute to the field e.g.
    • "Should I get a PhD?"
    • "Where can I find relevant job opportunities?"

More good examples can be found at canonical questions.

We do not aim to cover:

  • Aspects of AI Safety or fairness which are not strongly relevant to existential safety e.g.
    • "How should self-driving cars weigh up moral dilemmas"
    • "How can we minimize the risk of privacy problems caused by machine learning algorithms?"
  • Extremely specific and detailed questions the answering of which is unlikely to be of value to more than a single person e.g.
    • "What if we did <multiple paragraphs of dense text>? Would that result in safe AI?"

We will generally not delete out-of-scope content, but it will be reviewed as low priority to answer, not be marked as a canonical question, and not be served to readers on Stampy's UI.

What should be marked as a canonical answer on Stampy's Wiki?


Canonical answers may be served to readers by Stampy, so only answers which have a reasonably high stamp score should be marked as canonical. All canonical answers are open to be collaboratively edited and updated, and they should represent a consensus response (written from the Stampy Point Of View) to a question which is within Stampy's scope.

Answers to questions from YouTube comments should not be marked as canonical, and will generally remain as they were when originally written since they have details which are specific to an idiosyncratic question. YouTube answers may be forked into wiki answers, in order to better respond to a particular question, in which case the YouTube question should have its canonical version field set to the new more widely useful question.

What sources of information can Stampy use?


As well as pulling human written answers to AI alignment questions from Stampy's Wiki, Stampy can:

  • Search for AI safety papers e.g. "stampy, what's that paper about corrigibility?"
  • Search for videos e.g. "what's that video where Rob talks about mesa optimizers, stampy?"
  • Calculate with Wolfram Alpha e.g. "s, what's the square root of 345?"
  • Search DuckDuckGo and return snippets
  • Fall back (at least in the patron Discord) to polling GPT-3 to answer uncaught questions

What training programs and courses are available for AGI safety?

  • AGI safety fundamentals (technical and governance) - The canonical AGI safety 101 course: 3.5 hours of reading and 1.5 hours of facilitated discussion per week, for 8 weeks.
  • Refine - A 3-month incubator for conceptual AI alignment research in London, hosted by Conjecture.
  • AI safety camp - Actually do some AI research. More about output than learning.
  • SERI ML Alignment Theory Scholars Program SERI MATS - Four weeks developing an understanding of a research agenda at the forefront of AI alignment through online readings and cohort discussions, averaging 10 h/week. After this initial upskilling period, the scholars will be paired with an established AI alignment researcher for a two-week ‘research sprint’ to test fit. Assuming all goes well, scholars will be accepted into an eight-week intensive scholars program in Berkeley, California.
  • Principles of Intelligent Behavior in Biological and Social Systems (PIBBSS) - Brings together young researchers studying complex and intelligent behavior in natural and social systems.
  • Safety and Control for Artificial General Intelligence - An actual AI Safety university course (UC Berkeley). Touches multiple domains including cognitive science, utility theory, cybersecurity, human-machine interaction, and political science.

See also, this spreadsheet of learning resources.

When should I stamp an answer?


You should stamp an answer when you think it is accurate and well presented enough that you'd be happy to see it served to readers by Stampy.

Where can I find all the features of Stampy's Wiki?

The Editor portal collects them all in one place. Details on how to use each feature are on the individual pages.


Get involved


Questions

Answers

Review answers

Improve answers

Recent activity

Pages to create

Content

External

  • Stampy's Public Discord - Ask there for an invite to the real one, until OpenAI approves our chatbot for a public Discord
  • Wiki stats - Graphs over time of active users, edits, pages, response time, etc
  • Google Drive - Folder with Stampy-related documents

UI controls

To-do list

What are some specific open tasks on Stampy?


Other than the usual fare of writing and processing and organizing questions and answers, here are some specific open tasks:

'"`UNIQ--references-0000028F-QINU`"'

Where can I find people to talk to about AI alignment?


You can join:

Or book free calls with AI Safety Support.

Where can I find questions to answer for Stampy?


Answer questions collects all the questions we definitely want answers to; browse there and see if you know how to answer any of them.

Where can I learn about interpretability?


Christoph Molnar's online book and Distill are great sources.

Who created Stampy?


Dev team

Name | Vision talk | Github | Trello | Active? | Notes / bio
Aprillion | video | Aprillion | yes | yes | experienced dev (Python, JS, CSS, ...)
Augustus Caesar | yes | AugustusCeasar | yes | soon! | Has some Discord bot experience
Benjamin Herman | no | no (not needed) | no | no | Helping with wiki design/css stuff
ccstan99 | no | ccstan99 | yes | yes | UI/UX designer
chriscanal | yes | chriscanal | yes | yes | experienced python dev
Damaged | no (not needed) | no (not needed) | no (not needed) | yes | experienced Discord bot dev, but busy with other projects. Can answer questions.
plex | yes | plexish | yes | yes | MediaWiki, plans, and coordinating people guy
robertskmiles | yes | robertskmiles | yes | yes | you've probably heard of him
Roland | yes | levitation | yes | yes | working on Semantic Search
sct202 | yes | no (add when wiki is on github) | yes | yes | PHP dev, helping with wiki extensions
Social Christancing | yes | chrisrimmer | yes | maybe | experienced linux sysadmin
sudonym | yes | jmccuen | yes | yes | systems architect, has set up a lot of things
tayler6000 | yes | tayler6000 | no | yes | Python and PHP dev, PenTester, works on Discord bot

Editors

(add yourselves)

Who is Stampy?


Stampy is a character invented by Robert Miles and developed by the Stampy dev team. He is a stamp collecting robot, a play on Clippy from the paperclip maximizer thought experiment.

Stampy is designed to teach people about the risks of unaligned artificial intelligence, and facilitate a community of co-learners who build his FAQ database.

Why do we expect that a superintelligence would closely approximate a utility maximizer?