plex

Questions Asked: 68
Answers written: 99

I'm the resident MediaWiki guy and general organizer of all things Stampy. Talk to me if you have questions or need access to something.

Also, here's my conversation menu, listing many of the topics I like to talk about. Feel free to book a call to discuss any of them.

Also, I'm practicing as a life coach specializing in people who want to work on AI alignment. I've been working with sudonym and chriscanal for a few months now, and I'm ready to take on a few more people for free calls to discuss your goals and try to disentangle things holding you back from them. While I have spare capacity I can do weekly calls, but may have to move to less regular if other things in my life pick up or a lot of people sign up.

Questions by plex which have been answered

How can I collect questions for Stampy?

As well as simply adding your own questions over at ask question, you could also message your friends with something like:

Hi,

I'm working on a project to create a comprehensive FAQ about AI alignment (you can read about it here https://stampy.ai/wiki/Stampy%27s_Wiki if interested). We're looking for questions and I thought you may have some good ones. If you'd be willing to write up a Google Doc with your top 5-10ish questions, we'd be happy to write a personalized FAQ for you. https://stampy.ai/wiki/Scope explains the kinds of questions we're looking for.

Thanks!

and maybe bring the Google Doc to a Stampy editing session so we can collaborate on answering them or improving your answers to them.

How can I contact the Stampy team?

The Rob Miles AI Discord is the hub of all things Stampy. If you want to be part of the project and don't have access yet, ask plex#1874 on Discord (or plex on wiki).

You can also talk to us on the public Discord! Try #suggestions or #general, depending on what you want to talk about.

How can I contribute to Stampy?

If you're not already there, join the public Discord or ask for an invite to the semi-private one where contributors generally hang out.

The main ways you can help are to answer questions or add questions, or help to review questions, review answers, or improve answers (instructions for helping out with each of these tasks are on the linked pages). You could also join the dev team if you have programming skills.

You can include a live-updating version of many definitions from LW using the syntax on Template:TagDesc in the Answer field and Template:TagDescBrief in the Brief Answer field. Similarly, calling Template:TagDescEAF and Template:TagDescEAFBrief will pull from the EAF tag wiki.

When available, this should be used, as it reduces duplication of effort and directs all editors toward improving a single high-quality source.

How do I format answers on Stampy?

Stampy uses MediaWiki markup, which includes a limited subset of HTML plus the following formatting options:

Items on lists start with *, numbered lists with #. Start with ** or ## for double indentation.

  • For external links use [ followed directly by the URL, a space, then display text and finally a ] symbol
  • For internal links write the page title wrapped in [[]]s
    • e.g. [[What is the Stampy project?]] gives What is the Stampy project?. Including a pipe symbol followed by display text e.g. [[What is the Stampy project?|Display Text]] allows you to show different Display Text.
  • (ref)Reference notes go inside these tags(/ref)[1]
  • If you post the raw URL of an image from imgur it will be displayed.[2] You can reduce file compression if you get an account. Note that you need the URL of the image itself (right click -> copy image address).
  • To embed a YouTube video, use (youtube)APsK8NST4qE(/youtube) with the video ID of the target video.
  • Three 's around text makes it bold
  • Two 's around text makes it italic

Headings

are wrapped in ==heading here==; use more =s for smaller headings.

Wrap quotes in < blockquote>< /blockquote> tags (without the spaces)

There are also (poem) (/poem) to suppress linebreak removal, (pre) (/pre) for preformatted text, and (nowiki) (/nowiki) to not have that content parsed.[3]

We can pull live descriptions from the LessWrong/Alignment Forum using their identifier from the URL; for example, including the formatting on Template:TagDesc with orthogonality-thesis as a parameter will render as the full tag description from the LessWrong tag wiki entry on Orthogonality Thesis. Template:TagDescBrief is similar but will pull only the first paragraph without formatting.

For tables please use HTML tables rather than wikicode tables.

Edit this page to see examples.
  1. Note that we use ()s rather than the standard <>s for compatibility with Semantic MediaWiki. The references are automatically added to the bottom of the answer!
  2. If images seem popular we'll set up local uploads.
  3. () can also be used in place of allowed HTML tags. You can escape a () tag by placing a ! inside the start of the first entry. Be aware that () tags only nest up to two layers deep!

How does the stamp eigenkarma system work?

If someone posts something good - something that shows insight, knowledge of AI Safety, etc. - give the message or answer a stamp of approval! Stampy keeps track of these, and uses them to decide how much he likes each user. You can ask Stampy (in a PM if you like), "How many stamps am I worth?", and he'll tell you.

If something is really very good, especially if it took a lot of work/effort, give it a gold stamp. These are worth 5 regular stamps!

Note that stamps aren't just 'likes', so please don't give stamps to say "me too" or "that's funny" etc. They're meant to represent knowledge, understanding, good judgement, and contributing to the discord. You can use 💯 or ✔️ for things you agree with, 😂 or 🤣 for funny things etc.

Your stamp points determine how much say you have if there are disagreements on Stampy content, which channels you have permission to post to, your voting power for approving YouTube replies, and whether you get to invite people.

Notes on stamps and stamp points

  • Stamps awarded by people with a lot of stamp points are worth more
  • Awarding people stamps does not reduce your stamp points
  • New users who have 0 stamp points can still award stamps, they just have no effect. But it's still worth doing because if you get stamp points later, all your previous votes are retroactively updated!
  • Yes, this was kind of tricky to implement! Stampy actually stores how many stamps each user has awarded to every other user, and uses that to build a system of linear scalar equations which is then solved with numpy.
  • Each user has stamp points, and also gives a score to every other user they give stamps to. The scores sum to 1, so if I give user A a stamp, my score for them will be 1.0; if I then give user B a stamp, my score for A is 0.5 and B is 0.5; if I give another to B, my score for A goes to 0.3333 and B to 0.66666, and so on
  • Score is "what proportion of the stamps I've given have gone to this user"
  • Everyone's stamp points are the sum of (every other user's score for them, times that user's stamp points), so the way to get points is to get stamps from people who have points
  • Rob is the root of the tree, he got one point from Stampy
  • So the idea is that stamp power kind of flows through the network, giving people points for posting things that I thought were good, or posting things that "people who posted things I thought were good" thought were good, and so on ad infinitum. For posting YouTube comments, Stampy won't send the comment until it has enough stamps of approval, which could come from a small number of high-points users or a larger number of lower-points users
  • Stamps given to yourself or to stampy do nothing

So yeah everyone ends up with a number that basically represents what Stampy thinks of them, and you can ask him "how many stamps am I worth?" to get that number

so if you have people a, b, and c, the points are calculated by:
a_points = (bs_score_for_a * b_points) + (cs_score_for_a * c_points)
b_points = (as_score_for_b * a_points) + (cs_score_for_b * c_points)
c_points = (as_score_for_c * a_points) + (bs_score_for_c * b_points)
which is tough because you need to know everyone else's score before you can calculate your own
but actually the system will have a fixed point - there'll be a certain arrangement of values such that every node has as much flowing out as flowing in - a stable configuration so you can rearrange
(bs_score_for_a * b_points) + (cs_score_for_a * c_points) - a_points = 0
(as_score_for_b * a_points) + (cs_score_for_b * c_points) - b_points = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) - c_points = 0
or, for neatness:
( -1 * a_points) + (bs_score_for_a * b_points) + (cs_score_for_a * c_points) = 0
(as_score_for_b * a_points) + ( -1 * b_points) + (cs_score_for_b * c_points) = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) + ( -1 * c_points) = 0
and this is just a system of linear scalar equations that you can throw at numpy.linalg.solve
(you add one more equation that says rob_points = 1, so there's some place to start from) there should be one possible distribution of points such that all of the equations hold at the same time, and numpy finds that by linear algebra magic beyond my very limited understanding
but as far as I can tell you can have all the cycles you want!
(I actually have the scores sum to slightly less than 1, to have the stamp power slightly fade out as it propagates, just to make sure it doesn't explode. But I don't think I actually need to do that)
and yes this means that any time anyone gives a stamp to anyone, ~everyone's points will change slightly
And yes this means I'm recalculating the matrix and re-solving it for every new stamp, but computers are fast and I'm sure there are cheaper approximations I could switch to later if necessary
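
To make that concrete, here's a minimal runnable sketch of the calculation in Python with numpy. It illustrates the scheme described above rather than being Stampy's actual code: the users, stamp counts, and the 0.9 decay factor are made up for the example.

import numpy as np

# Toy sketch of the point calculation described above (NOT Stampy's actual code).
# stamps[i][j] = how many stamps user i has given to user j; user 0 is Rob,
# the root of the tree, whose points are pinned to 1.
users = ["rob", "a", "b", "c"]
stamps = np.array([
    [0, 3, 1, 0],  # rob's stamps
    [0, 0, 1, 1],  # a's stamps
    [0, 0, 0, 1],  # b's stamps
    [0, 0, 0, 0],  # c hasn't stamped anyone yet
], dtype=float)

# Score = what proportion of the stamps I've given have gone to this user.
totals = stamps.sum(axis=1, keepdims=True)
scores = np.divide(stamps, totals, out=np.zeros_like(stamps), where=totals > 0)
scores *= 0.9  # scores sum to slightly less than 1, so stamp power fades as it propagates

# points[j] = sum_i scores[i][j] * points[i]  ->  (scores.T - I) @ points = 0,
# with Rob's equation replaced by rob_points = 1 as the anchor.
# Cycles of stamp-giving are fine; the solver doesn't care.
n = len(users)
A = scores.T - np.eye(n)
b = np.zeros(n)
A[0] = 0.0
A[0, 0] = 1.0
b[0] = 1.0

points = np.linalg.solve(A, b)
for name, p in zip(users, points):
    print(f"{name}: {p:.3f}")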

The organizations which most regularly give grants to individuals working towards AI alignment are the Long Term Future Fund, Survival And Flourishing (SAF), the OpenPhil AI Fellowship and early career funding, the Future of Life Institute, the Future of Humanity Institute, and the Center on Long-Term Risk Fund. If you're able to relocate to the UK, CEEALAR (aka the EA Hotel) can be a great option as it offers free food and accommodation for up to two years, as well as contact with others who are thinking about these issues. The FTX Future Fund only accepts direct applications for $100k+ with an emphasis on massively scaleable interventions, but their regranters can make smaller grants for individuals. There are also opportunities from smaller grantmakers which you might be able to pick up if you get involved.

If you want to work on support or infrastructure rather than directly on research, the EA Infrastructure Fund may be able to help. In general, you can talk to EA funds before applying.

Each grant source has their own criteria for funding, but in general they are looking for candidates who have evidence that they're keen and able to do good work towards reducing existential risk (for example, by completing an AI Safety Camp project), though the EA Hotel in particular has less stringent requirements as they're able to support people at very low cost. If you'd like to talk to someone who can offer advice on applying for funding, AI Safety Support offers free calls.

Another option is to get hired by an organization which works on AI alignment, see the follow-up question for advice on that.

It's also worth checking the AI Alignment tag on the EA funding sources website for up-to-date suggestions.

An unaligned AI would not eliminate humans until it had replacements for the manual labor they provide to maintain civilization (e.g. a more advanced version of Tesla's Optimus). Until that point, it might settle for technologically and socially manipulating humans.

If by “solve alignment” you mean build a sufficiently performance-competitive superintelligence which has the goal of Coherent Extrapolated Volition or something else which captures human values, then yes. It would be able to deploy technology near the limits of physics (e.g. atomically precise manufacturing) to solve most of the other problems which face us, and steer the future towards a highly positive path for perhaps many billions of years until the heat death of the universe (barring more esoteric x-risks like encounters with advanced hostile civilizations, false vacuum decay, or simulation shutdown).

However, if you only have alignment of a superintelligence to a single human you still have the risk of misuse, so this should be at most a short-term solution. For example, what if Google creates a superintelligent AI, and it listens to the CEO of Google, and it’s programmed to do everything exactly the way the CEO of Google would want? Even assuming that the CEO of Google has no hidden unconscious desires affecting the AI in unpredictable ways, this gives one person a lot of power.

The AGI Safety Fundamentals course is arguably the best way to get up to speed on alignment; you can sign up to go through it with many other people studying and with mentorship, or read the materials independently.

Other great ways to explore include:

Great! I’ll ask you a few follow-up questions to help figure out how you can best contribute, give you some advice, and link you to resources which should help you on whichever path you choose. Feel free to scroll up and explore multiple branches of the FAQ if you want answers to more than one of the questions offered :)

Note: We’re still building out and improving this tree of questions and answers, any feedback is appreciated.

At what level of involvement were you thinking of helping?

Each major organization has a different approach. The research agendas are detailed and complex (see also AI Watch). Getting more brains working on any of them (and more money to fund them) may pay off in a big way, but it’s very hard to be confident which (if any) of them will actually work.

The following is a massive oversimplification; each organization actually pursues many different avenues of research. Read the 2020 AI Alignment Literature Review and Charity Comparison for much more detail. That being said:

  • The Machine Intelligence Research Institute focuses on foundational mathematical research to understand reliable reasoning, which they think is necessary to provide anything like an assurance that a seed AI built will do good things if activated.
  • The Center for Human-Compatible AI focuses on Cooperative Inverse Reinforcement Learning and Assistance Games, a new paradigm for AI where they try to optimize for doing the kinds of things humans want rather than for a pre-specified utility function
  • Paul Christiano's Alignment Research Center focuses on prosaic alignment, particularly on creating tools that empower humans to understand and guide systems much smarter than ourselves. His methodology is explained on his blog.
  • The Future of Humanity Institute does work on crucial considerations and other x-risks, as well as AI safety research and outreach.
  • Anthropic is a new organization exploring natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
  • OpenAI is in a state of flux after major changes to their safety team.
  • DeepMind’s safety team is working on various approaches designed to work with modern machine learning, and does some communication via the Alignment Newsletter.
  • EleutherAI is a Machine Learning collective aiming to build large open source language models to allow more alignment research to take place.
  • Ought is a research lab that develops mechanisms for delegating open-ended thinking to advanced machine learning systems.

There are many other projects around AI Safety, such as the Windfall clause, Rob Miles’s YouTube channel, AI Safety Support, etc.

What are alternate phrasings for?

Alternate phrasings are used to improve the semantic search which Stampy uses to serve people questions, by giving alternate ways to say a question which might trigger a match when the main wording won't. They should generally only be used when there is a significantly different wording, rather than for only very minor changes.
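
As a rough illustration of the mechanism (not Stampy's actual pipeline; the model, example phrasings, and similarity threshold below are arbitrary choices for this sketch), an embedding-based matcher can compare an incoming question against the canonical wording and all of its alternate phrasings and keep the best match:

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # small general-purpose sentence embedder

canonical = "What is the Stampy project?"
alternates = ["Who or what is Stampy?", "Tell me about the Stampy FAQ"]
user_question = "what's this stampy thing about?"

# Compare the user's question against the canonical wording *and* its alternates;
# an alternate phrasing can trigger a match that the main wording would miss.
phrasings = [canonical] + alternates
similarities = util.cos_sim(model.encode(user_question), model.encode(phrasings))[0]
best = similarities.max().item()

print(f"best similarity: {best:.2f}")
if best > 0.5:  # arbitrary cut-off for this sketch
    print(f"serve the canonical answer for: {canonical}")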

What are good resources on AI alignment?

Unless there was a way to cryptographically ensure otherwise, whoever runs the emulation has basically perfect control over their environment and can reset them to any state they were previously in. This opens up the possibility of powerful interrogation and torture of digital people.

Imperfect uploading might lead to damage that causes the EM to suffer while still remaining useful enough to be run, for example as a test subject for research. We would also have greater ability to modify digital brains. Edits done for research or economic purposes might cause suffering. See this fictional piece for an exploration of what a world with a lot of EM suffering might look like.

These problems are exacerbated by the likely outcome that digital people can be run much faster than biological humans, so it would be plausibly possible to have an EM run for hundreds of subjective years in minutes or hours without having checks on the wellbeing of the EM in question.

Try to avoid directly referencing the wording of the question in the answer, in order to make the answer more robust to alternate phrasings of the question. For example, the question might be "Can we do X?" and the reply "Yes, if we can manage Y"; but the question might instead be "Why can't we do X?" or "What would happen if we tried to do X?", so the answer should be phrased like "We might be able to do X, if we can do Y", which works for all of those.

Linking to external sites is strongly encouraged, one of the most valuable things Stampy can do is help people find other parts of the alignment information ecosystem.

Superintelligence has an advantage that an early human didn’t – the entire context of human civilization and technology, there for it to manipulate socially or technologically.

What is AI alignment?

AI alignment is the field trying to make sure that when we build superintelligent artificial systems they are aligned with human values, so that they do things compatible with our survival and flourishing. This may be one of the hardest and most important problems we will ever face, as whether we succeed might mean the difference between human extinction and flourishing.

What is GPT-3?

GPT-3 is the newest and most impressive of the GPT (Generative Pretrained Transformer) series of large transformer-based language models created by OpenAI. It was announced in June 2020, and is 100 times larger than its predecessor GPT-2.[1]

Gwern has several resources exploring GPT-3's abilities, limitations, and implications including:

Vox has an article which explains why GPT-3 is a big deal.

  1. GPT-3: What’s it good for? - Cambridge University Press

Canonical questions are the questions which we've checked are in scope and not duplicates, so we want answers to them. They may be edited to represent a class of question more broadly, rather than keeping all their idiosyncrasies. Once they're answered canonically Stampy will serve them to readers.

An existing question is a duplicate of a new one if it is reasonable to expect whoever asked the new question to be satisfied if they received an answer to the existing question instead.

Follow-up questions are responses to an answer which a reader might have, either because they want more information or are providing information to Stampy about what they're looking for. We don't expect to have great coverage of the former for a long time because there will be so many, but hopefully we'll be able to handle some of the most common ones.

What is a quantilizer?

A Quantilizer is a proposed AI design which aims to reduce the harms from Goodhart's law and specification gaming by selecting reasonably effective actions from a distribution of human-like actions, rather than maximizing over actions. It is more of a theoretical tool for exploring ways around these problems than a practical buildable design.
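
To illustrate the idea, here is a small Python sketch (a toy under made-up assumptions, not code from any paper) comparing a utility maximizer with a q-quantilizer over a one-dimensional action space:

import numpy as np

rng = np.random.default_rng(0)

def quantilize(actions, utility, q=0.1):
    """Pick an action the way a q-quantilizer would: rather than taking the
    argmax of a possibly mis-specified utility, sample uniformly from the top
    q fraction of actions drawn from a base distribution of human-like actions."""
    scores = np.array([utility(a) for a in actions])
    cutoff = np.quantile(scores, 1 - q)          # utility threshold for the top q
    top = [a for a, s in zip(actions, scores) if s >= cutoff]
    return top[rng.integers(len(top))]

def proxy_utility(effort):
    return effort  # "more is always better" -- the proxy we don't fully trust

# Actions sampled from a made-up "human-like" distribution of effort levels.
human_like_actions = rng.normal(loc=1.0, scale=0.5, size=10_000)

maximizer_pick = max(human_like_actions, key=proxy_utility)       # an extreme outlier
quantilizer_pick = quantilize(human_like_actions, proxy_utility)  # good but not extreme

print(f"maximizer picks effort {maximizer_pick:.2f}")
print(f"quantilizer picks effort {quantilizer_pick:.2f}")

The maximizer always lands on the most extreme sample, while the quantilizer picks something near the top of the human-like distribution without optimizing all the way into its tail.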



What is meant by AI takeoff?

AI Takeoff refers to the process of an Artificial General Intelligence going from a certain threshold of capability (often discussed as "human-level") to being super-intelligent and capable enough to control the fate of civilization. There has been much debate about whether AI takeoff is more likely to be slow vs fast, i.e., "soft" vs "hard".

See also: AI Timelines, Seed AI, Singularity, Intelligence explosion, Recursive self-improvement

AI takeoff is sometimes casually referred to as AI FOOM.

Soft takeoff

A soft takeoff refers to an AGI that would self-improve over a period of years or decades. This could be due to either the learning algorithm being too demanding for the hardware or because the AI relies on experiencing feedback from the real-world that would have to be played out in real-time. Possible methods that could deliver a soft takeoff, by slowly building on human-level intelligence, are Whole brain emulation, Biological Cognitive Enhancement, and software-based strong AGI [1]. By maintaining control of the AGI's ascent it should be easier for a Friendly AI to emerge.

Vernor Vinge and Hans Moravec have both expressed the view that a soft takeoff is preferable to a hard takeoff, as it would be both safer and easier to engineer.

Hard takeoff

A hard takeoff (or an AI going "FOOM" [2]) refers to AGI expansion in a matter of minutes, days, or months. It is a fast, abrupt, local increase in capability. This scenario is widely considered much more precarious, as it involves an AGI rapidly ascending in power without human control. This may result in unexpected or undesired behavior (i.e. Unfriendly AI). It is one of the main ideas supporting the Intelligence explosion hypothesis.

The feasibility of hard takeoff has been addressed by Hugo de Garis, Eliezer Yudkowsky, Ben Goertzel, Nick Bostrom, and Michael Anissimov. It is widely agreed that a hard takeoff is something to be avoided due to the risks. Yudkowsky points out several possibilities that would make a hard takeoff more likely than a soft takeoff, such as the existence of large resource overhangs or the fact that small improvements can have a large impact on a mind's general intelligence (e.g. the small genetic difference between humans and chimps led to huge increases in capability) [3].

References

  1. http://www.aleph.se/andart/archives/2010/10/why_early_singularities_are_softer.html
  2. http://lesswrong.com/lw/63t/requirements_for_ai_to_go_foom/
  3. http://lesswrong.com/lw/wf/hard_takeoff/

What is narrow AI?

A Narrow AI is capable of operating only in a relatively limited domain, such as chess or driving, rather than being capable of learning a broad range of tasks like a human or an Artificial General Intelligence. Narrow vs. general is not a perfectly binary classification; there are degrees of generality. Large language models, for example, have a fairly large degree of generality (as the domain of text is broad) without being as general as a human, and we may eventually build systems that are significantly more general than humans.

What is the Stampy project?

The Stampy project is an open effort to build a comprehensive FAQ about artificial intelligence existential safety—the field trying to make sure that when we build superintelligent artificial systems they are aligned with human values so that they do things compatible with our survival and flourishing.

We're also building a web UI (early prototype) and bot interface, so you'll soon be able to browse the FAQ and other sources in a cleaner way than the wiki.

The goals of the project are to:

  • Offer a one-stop-shop for high-quality answers to common questions about AI alignment.
    • Let people answer questions in a way which scales, freeing up researcher time while allowing more people to learn from a reliable source.
    • Make external resources easier to find by having links to them connected to a search engine which gets smarter the more it's used.
  • Provide a form of legitimate peripheral participation for the AI Safety community, as an on-boarding path with a flexible level of commitment.
    • Encourage people to think, read, and talk about AI alignment while answering questions, creating a community of co-learners who can give each other feedback and social reinforcement.
    • Provide a way for budding researchers to prove their understanding of the topic and ability to produce good work.
  • Collect data about the kinds of questions people actually ask and how they respond, so we can better focus resources on answering them.
If you would like to help out, join us on the Discord and either jump right into editing or read get involved for answers to common questions.

Stampy is focused on answering common questions people have which are specifically about AI existential safety, both introductory and technical. It does not aim to cover general AI questions or other topics which don't interact strongly with the effects of AI on humanity's long-term future. Replying to all possible proposals is not feasible, so this is not a great place to submit detailed ideas for evaluation.

We are interested in:

  • Introductory questions closely related to the field e.g.
    • "How long will it be until transformative AI arrives?"
    • "Why might advanced AI harm humans?"
  • Technical questions related to the field e.g.
    • "What is Cooperative Inverse Reinforcement Learning?"
    • "What is Logical Induction useful for?"
  • Questions about how to contribute to the field e.g.
    • "Should I get a PhD?"
    • "Where can I find relevant job opportunities?"

More good examples can be found at canonical questions.

We do not aim to cover:

  • Aspects of AI Safety or fairness which are not strongly relevant to existential safety e.g.
    • "How should self-driving cars weigh up moral dilemmas"
    • "How can we minimize the risk of privacy problems caused by machine learning algorithms?"
  • Extremely specific and detailed questions the answering of which is unlikely to be of value to more than a single person e.g.
    • "What if we did <multiple paragraphs of dense text>? Would that result in safe AI?"

We will generally not delete out-of-scope content, but it will be reviewed as low priority to answer, not be marked as a canonical question, and not be served to readers by Stampy.

Canonical answers may be served to readers by Stampy, so only answers which have a reasonably high stamp score should be marked as canonical. All canonical answers are open to be collaboratively edited and updated, and they should represent a consensus response (written from the Stampy Point Of View) to a question which is within Stampy's scope.

Answers to YouTube questions should not be marked as canonical, and will generally remain as they were when originally written since they have details which are specific to an idiosyncratic question. YouTube answers may be forked into wiki answers, in order to better respond to a particular question, in which case the YouTube question should have its canonical version field set to the new more widely useful question.

As well as pulling human written answers to AI alignment questions from Stampy's Wiki, Stampy can:

  • Search for AI safety papers e.g. "stampy, what's that paper about corrigibility?"
  • Search for videos e.g. "what's that video where Rob talks about mesa optimizers, stampy?"
  • Calculate with Wolfram Alpha e.g. "s, what's the square root of 345?"
  • Search DuckDuckGo and return snippets
  • And (at least in the patron Discord) fall back to polling GPT-3 to answer uncaught questions

When should I stamp an answer?

You should stamp an answer when you think it is accurate and well presented enough that you'd be happy to see it served to readers by Stampy.

Answer questions collects all the questions we definitely want answers to, browse there and see if you know how to answer any of them.

The Editor portal collects them all in one place. Details on how to use each feature are on the individual pages.


External

  • Stampy's Public Discord - Ask there for an invite to the real one, until OpenAI approves our chatbot for a public Discord
  • Wiki stats - Graphs over time of active users, edits, pages, response time, etc
  • Google Drive - Folder with Stampy-related documents


Where can I learn about interpretability?

Christoph Molnar's online book and Distill are great sources.

Who created Stampy?

Dev team

Name | Vision talk | Github | Trello | Active? | Notes / bio
Aprillion | video | Aprillion | yes | yes | experienced dev (Python, JS, CSS, ...)
Augustus Caesar | yes | AugustusCeasar | yes | soon! | Has some Discord bot experience
Benjamin Herman | no | no (not needed) | no | no | Helping with wiki design/css stuff
ChengCheng Tan | ccstan99 | no | yes | yes | UI/UX designer
chriscanal | yes | chriscanal | yes | yes | experienced python dev
Damaged | no (not needed) | no (not needed) | no (not needed) | yes | experienced Discord bot dev, but busy with other projects. Can answer questions.
plex | yes | plexish | yes | yes | MediaWiki, plans, and coordinating people guy
robertskmiles | yes | robertskmiles | yes | yes | you've probably heard of him
Roland | yes | levitation | yes | yes | working on Semantic Search
sct202 | yes | no (add when wiki is on github) | yes | yes | PHP dev, helping with wiki extensions
Social Christancing | yes | chrisrimmer | yes | maybe | experienced linux sysadmin
sudonym | yes | jmccuen | yes | yes | systems architect, has set up a lot of things

Editors

(add yourselves)

Who is Stampy?

Stampy is a character invented by Robert Miles and developed by the Stampy dev team. He is a stamp-collecting robot, a play on Clippy and the paperclip maximizer thought experiment.

Stampy is designed to teach people about the risks of unaligned artificial intelligence, and facilitate a community of co-learners who build his FAQ database.

It certainly would be very unwise to purposefully create an artificial general intelligence now, before we have found a way to be certain it will act purely in our interests. But "general intelligence" is more of a description of a system's capabilities, and a vague one at that. We don't know what it takes to build such a system. This leads to the worrying possibility that our existing, narrow AI systems require only minor tweaks, or even just more computer power, to achieve general intelligence.

The pace of research in the field suggests that there's a lot of low-hanging fruit left to pick, after all, and the results of this research produce better, more effective AI in a landscape of strong competitive pressure to build the most capable systems we can. "Just" not building an AGI means ensuring that every organization in the world with lots of computer hardware doesn't build an AGI, either accidentally or while mistakenly thinking they have a solution to the alignment problem, forever. It's simply far safer to also work on solving the alignment problem.

Why is AI alignment hard?

There's the "we never figure out how to reliably instill AIs with human friendly goals" filter, which seems pretty challenging, especially with inner alignment, solving morality in a way which is possible to code up, interpretability, etc.

There's the "race dynamics mean that even though we know how to build the thing safely the first group to cross the recursive self-improvement line ends up not implementing it safely" which is potentially made worse by the twin issues of "maybe robustly aligned AIs are much harder to build" and "maybe robustly aligned AIs are much less compute efficient".

There's the "we solved the previous problems but writing perfectly reliably code in a whole new domain is hard and there is some fatal bug which we don't find until too late" filter. The paper The Pursuit of Exploitable Bugs in Machine Learning explores this.

For a much more in depth analysis, see Paul Christiano's AI Alignment Landscape talk and The Main Sources of AI Risk?.

Computers only do what you tell them. But any programmer knows that this is precisely the problem: computers do exactly what you tell them, with no common sense or attempts to interpret what the instructions really meant. If you tell a human to cure cancer, they will instinctively understand how this interacts with other desires and laws and moral rules; if a maximizing AI acquires a goal of trying to cure cancer, it will literally just want to cure cancer.

Define a closed-ended goal as one with a clear endpoint, and an open-ended goal as one to do something as much as possible. For example “find the first one hundred digits of pi” is a closed-ended goal; “find as many digits of pi as you can within one year” is an open-ended goal. According to many computer scientists, giving a superintelligence an open-ended goal without activating human instincts and counterbalancing considerations will usually lead to disaster.

To take a deliberately extreme example: suppose someone programs a superintelligence to calculate as many digits of pi as it can within one year. And suppose that, with its current computing power, it can calculate one trillion digits during that time. It can either accept one trillion digits, or spend a month trying to figure out how to get control of the TaihuLight supercomputer, which can calculate two hundred times faster. Even if it loses a little bit of time in the effort, and even if there’s a small chance of failure, the payoff – two hundred trillion digits of pi, compared to a mere one trillion – is enough to make the attempt. But on the same basis, it would be even better if the superintelligence could control every computer in the world and set it to the task. And it would be better still if the superintelligence controlled human civilization, so that it could direct humans to build more computers and speed up the process further.

Now we’re in a situation where a superintelligence wants to take over the world. Taking over the world allows it to calculate more digits of pi than any other option, so without an architecture based around understanding human instincts and counterbalancing considerations, even a goal like “calculate as many digits of pi as you can” would be potentially dangerous.

If you're looking for a shovel ready and genuinely useful task to further AI alignment without necessarily committing a large amount of time or needing deep specialist knowledge, we think Stampy is a great option.

Creating a high-quality single point of access where people can be onboarded and find resources around the alignment ecosystem seems likely high-impact. So, what makes us the best option?

  1. Unlike all other entry points to learning about alignment, we dodge the trade-off between comprehensiveness and overwhelming length by using interactivity (tab explosion in one page!) and semantic search. Single-document FAQs can't do this, so we built a system which can.
  2. We have the ability to point large numbers of viewers towards Stampy once we have the content, thanks to Rob Miles and his 100k+ subscribers, so this won't remain an unnoticed curiosity.
  3. Unlike most other entry points, we are open for volunteers to help improve the content.

The main notable one which does is the LessWrong tag wiki, which hosts descriptions of core concepts. We strongly believe in not needlessly duplicating effort, so we're pulling live content from that for the descriptions on our own tag pages, and directing the edit links on those to the edit page on the LessWrong wiki.

You might also consider improving Wikipedia's alignment coverage or the LessWrong wiki, but we think Stampy has the most low-hanging fruit right now. Additionally, contributing to Stampy means being part of a community of co-learners who provide mentorship and encouragement to join the effort to give humanity a bright future. If you're an established researcher or have high-value things to do elsewhere in the ecosystem it might not be optimal to put much time into Stampy, but if you're looking for a way to get more involved it might well be.

Will we ever build a superintelligence?

Humanity hasn't yet built a superintelligence, and we might not be able to without significantly more knowledge and computational resources. There could be an existential catastrophe that prevents us from ever building one. For the rest of the answer let's assume no such event stops technological progress.

With that out of the way: there is no known good theoretical reason we can't build it at some point in the future; the majority of AI research is geared towards making more capable AI systems; and a significant chunk of top-level AI research attempts to make more generally capable AI systems. There is a clear economic incentive to develop more and more intelligent machines and currently billions of dollars of funding are being deployed for advancing AI capabilities.

We consider ourselves to be generally intelligent (i.e. capable of learning and adapting ourselves to a very wide range of tasks and environments), but the human brain almost certainly isn't the most efficient way to solve problems. One hint is the existence of AI systems with superhuman capabilities at narrow tasks. Not only superhuman performance (as in, AlphaGo beating the Go world champion) but superhuman speed and precision (as in, industrial sorting machines). There is no known discontinuity between tasks, nothing special and unique about human brains that unlocks capabilities which cannot in principle be implemented in machines. Therefore we would expect AI to surpass human performance on all tasks as progress continues.

In addition, several research groups (DeepMind being one of the most overt about this) explicitly aim for generally capable systems. AI as a field is growing, year after year. Critical voices about AI progress usually argue against a lack of precautions around the impact of AI, or against general AI happening very soon, not against it happening at all.

A satire of arguments against the possibility of superintelligence can be found here.

This is a really interesting question! Because, yeah it certainly seems to me that doing something like this would at least help, but it's not mentioned in the paper the video is based on. So I asked the author of the paper, and she said "It wouldn't improve the security guarantee in the paper, so it wasn't discussed. Like, there's a plausible case that it's helpful, but nothing like a proof that it is".

To explain this I need to talk about something I gloss over in the video, which is that the quantilizer isn't really something you can actually build. The systems we study in AI Safety tend to fall somewhere on a spectrum from "real, practical AI system that is so messy and complex that it's hard to really think about or draw any solid conclusions from" on one end, to "mathematical formalism that we can prove beautiful theorems about but not actually build" on the other, and quantilizers are pretty far towards the 'mathematical' end. It's not practical to run an expected utility calculation on every possible action like that, for one thing. But, proving things about quantilizers gives us insight into how more practical AI systems may behave, or we may be able to build approximations of quantilizers, etc.

So it's like, if we built something that was quantilizer-like, using a sensible human utility function and a good choice of safe distribution, this idea would probably help make it safer. BUT you can't prove that mathematically, without making probably a lot of extra assumptions about the utility function and/or the action distribution. So it's a potentially good idea that's nonetheless hard to express within the framework in which the quantilizer exists.

TL;DR: This is likely a good idea! But can we prove it?