plex
Questions asked: 75
Answers written: 106
I'm the resident MediaWiki guy and general organizer of all things stampy. Talk to me if you have questions or need access to something.
Also, here's my conversation menu, listing many of the topics I like to talk about. Feel free to book a call to discuss any of them.
Also, I'm practicing as a life coach specializing in people who want to work on AI alignment. I've been working with sudonym and chriscanal for a few months now, and I'm ready to take on a few more people for free calls to discuss your goals and try to disentangle the things holding you back from them. While I have spare capacity I can do weekly calls, but I may have to move to less regular calls if other things in my life pick up or a lot of people sign up.
Questions by plex which have been answered
How can I collect questions for Stampy?
As well as simply adding your own questions over at ask question, you could also message your friends with something like:
Hi,
I'm working on a project to create a comprehensive FAQ about AI alignment (you can read about it here https://stampy.ai/wiki/Stampy%27s_Wiki if interested). We're looking for questions and I thought you may have some good ones. If you'd be willing to write up a google doc with your top 5-10ish questions, we'd be happy to write a personalized FAQ for you. https://stampy.ai/wiki/Scope explains the kinds of questions we're looking for.
Thanks!
and maybe bring the google doc to a Stampy editing session so we can collaborate on answering them or improving your answers to them.
How can I contact the Stampy team?
The Rob Miles AI Discord is the hub of all things Stampy. If you want to be part of the project and don't have access yet, ask plex#1874 on Discord (or plex on wiki).
You can also talk to us on the public Discord! Try #suggestions or #general, depending on what you want to talk about.
How can I contribute to Stampy?
If you're not already there, join the public Discord or ask for an invite to the semi-private one where contributors generally hang out.
The main ways you can help are to answer questions or add questions, or help to review questions, review answers, or improve answers (instructions for helping out with each of these tasks are on the linked pages). You could also join the dev team if you have programming skills.
How can I join the Stampy dev team?
The development team works on multiple projects in support of Stampy. Currently, these projects include:
- Stampy UI, which is made mostly in TypeScript.
- The Stampy Wiki, which is made mostly in PHP and JavaScript.
- The Stampy Bot, which is made in Python.
However, even if you don’t specialize in any of these areas, do reach out if you would like to help.
To join, please contact our Project Manager, plex. You can reach him on discord at plex#1874. He will be able to point your skills in the right direction to help in the most effective way possible.
How do I add content from LessWrong / Effective Altruism Forum tag-wikis to Stampy?
You can include a live-updating version of many definitions from LW by using the syntax on Template:TagDesc in the Answer field and Template:TagDescBrief in the Brief Answer field. Similarly, calling Template:TagDescEAF and Template:TagDescEAFBrief will pull from the EAF tag wiki.
When available, this should be used, as it reduces duplication of effort and directs all editors towards improving a single high-quality source.
How do I format answers on Stampy?
- Items on bulleted lists start with *, items on numbered lists with #
- For external links use [ followed directly by the URL, a space, then display text and finally a ] symbol
- e.g. [https://www.example.com External link text] gives External link text
- For internal links write the page title wrapped in [[]]s
- e.g. [[What is the Stampy project?]] gives What is the Stampy project?. Including a pipe symbol followed by display text, e.g. [[What is the Stampy project?|Display Text]], allows you to show different Display Text.
- <ref>Reference notes go inside these tags</ref>
- If you post the raw URL of an image from imgur it will be displayed. You can reduce file compression if you get an account. Note that you need the image itself: right click -> copy image address to get it.
- To embed a YouTube video, use <youtube>APsK8NST4qE</youtube> with the video ID of the target video.
- Start with ** or ## for double indentation
- Three 's around text for '''Bold'''
- Two 's around text for ''Italic''
How does the stamp eigenkarma system work?
If someone posts something good - something that shows insight, knowledge of AI Safety, etc. - give the message or answer a stamp of approval! Stampy keeps track of these, and uses them to decide how much he likes each user. You can ask Stampy (in a PM if you like), "How many stamps am I worth?", and he'll tell you.
If something is really very good, especially if it took a lot of work/effort, give it a gold stamp. These are worth 5 regular stamps!
Note that stamps aren't just 'likes', so please don't give stamps to say "me too" or "that's funny" etc. They're meant to represent knowledge, understanding, good judgement, and contributing to the discord. You can use 💯 or ✔️ for things you agree with, 😂 or 🤣 for funny things etc.
Your stamp points determine how much say you have if there are disagreements on Stampy content, which channels you have permission to post to, your voting power for approving YouTube replies, and whether you get to invite people.
Notes on stamps and stamp points
- Stamps awarded by people with a lot of stamp points are worth more
- Awarding people stamps does not reduce your stamp points
- New users who have 0 stamp points can still award stamps, they just have no effect. But it's still worth doing because if you get stamp points later, all your previous votes are retroactively updated!
- Yes, this was kind of tricky to implement! Stampy actually stores how many stamps each user has awarded to every other user, and uses that to build a system of linear scalar equations which is then solved with numpy.
- Each user has stamp points, and also gives a score to every other user they give stamps to. These scores sum to 1, so if I give user A a stamp, my score for them will be 1.0. If I then give user B a stamp, my score for A is 0.5 and my score for B is 0.5. If I give another stamp to B, my score for A goes to 0.3333 and B to 0.6667, and so on.
- Score is "what proportion of the stamps I've given have gone to this user"
- Everyone's stamp points are the sum of (every other user's score for them, times that user's stamp points), so the way to get points is to get stamps from people who have points
- Rob is the root of the tree, he got one point from Stampy
- So the idea is that stamp power kind of flows through the network, giving people points for posting things that I thought were good, or for posting things that "people who posted things I thought were good" thought were good, and so on ad infinitum. For posting YouTube comments, Stampy won't send the comment until it has enough stamps of approval, which could come from a small number of high-points users or a larger number of lower-points users.
- Stamps given to yourself or to stampy do nothing
So yeah everyone ends up with a number that basically represents what Stampy thinks of them, and you can ask him "how many stamps am I worth?" to get that number
so if you have people a, b, and c, the points are calculated by:
a_points = (bs_score_for_a * b_points) + (cs_score_for_a * c_points)
b_points = (as_score_for_b * a_points) + (cs_score_for_b * c_points)
c_points = (as_score_for_c * a_points) + (bs_score_for_c * b_points)
which is tough because you need to know everyone else's points before you can calculate your own
but actually the system will have a fixed point - there'll be a certain arrangement of values such that every node has as much flowing out as flowing in - a stable configuration
so you can rearrange
(bs_score_for_a * b_points) + (cs_score_for_a * c_points) - a_points = 0
(as_score_for_b * a_points) + (cs_score_for_b * c_points) - b_points = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) - c_points = 0
or, for neatness:
( -1 * a_points) + (bs_score_for_a * b_points) + (cs_score_for_a * c_points) = 0
(as_score_for_b * a_points) + ( -1 * b_points) + (cs_score_for_b * c_points) = 0
(as_score_for_c * a_points) + (bs_score_for_c * b_points) + ( -1 * c_points) = 0
and this is just a system of linear scalar equations that you can throw at numpy.linalg.solve
(you add one more equation that says rob_points = 1, so there's some place to start from)
there should be one possible distribution of points such that all of the equations hold at the same time, and numpy finds that by linear algebra magic beyond my very limited understanding
but as far as I can tell you can have all the cycles you want!
(I actually have the scores sum to slightly less than 1, to have the stamp power slightly fade out as it propagates, just to make sure it doesn't explode. But I don't think I actually need to do that)
and yes this means that any time anyone gives a stamp to anyone, ~everyone's points will change slightly
And yes this means I'm recalculating the matrix and re-solving it for every new stamp, but computers are fast and I'm sure there are cheaper approximations I could switch to later if necessary
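Here's a minimal sketch of that calculation, assuming a handful of made-up users and stamp counts (the real bot builds the same kind of matrix from its stored stamp records, but its data structures will differ):

```python
# A minimal sketch of the stamp point calculation described above. The user
# names, stamp counts, and decay factor are invented for illustration.
import numpy as np

# stamps[giver][receiver] = number of stamps 'giver' has awarded to 'receiver'
stamps = {
    "rob": {"a": 3, "b": 1},
    "a":   {"b": 2},
    "b":   {"a": 1, "c": 1},
    "c":   {},
}
users = list(stamps)
index = {name: i for i, name in enumerate(users)}
n = len(users)
decay = 0.99  # scores sum to slightly less than 1 so stamp power fades as it propagates

# score[i][j] = proportion of i's stamps that went to j (times the decay factor)
score = np.zeros((n, n))
for giver, given in stamps.items():
    total = sum(given.values())
    for receiver, count in given.items():
        score[index[giver], index[receiver]] = decay * count / total

# One equation per user j:  sum_i score[i][j] * points[i] - points[j] = 0,
# with Rob's row replaced here by the extra equation rob_points = 1 (the root of the tree).
A = score.T - np.eye(n)
b = np.zeros(n)
A[index["rob"], :] = 0.0
A[index["rob"], index["rob"]] = 1.0
b[index["rob"]] = 1.0

points = np.linalg.solve(A, b)
for name in users:
    print(f"{name}: {points[index[name]]:.3f}")
```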
I want to work on AI alignment. How can I get funding?
An unaligned AI would not eliminate humans until it had replacements for the manual labor they provide to maintain civilization (e.g. a more advanced version of Tesla's Optimus). Until that point, it might settle for technologically and socially manipulating humans.
If we solve alignment, are we sure of a good future?
If by “solve alignment” you mean build a sufficiently performance-competitive superintelligence which has the goal of Coherent Extrapolated Volition or something else which captures human values, then yes. It would be able to deploy technology near the limits of physics (e.g. atomically precise manufacturing) to solve most of the other problems which face us, and steer the future towards a highly positive path for perhaps many billions of years until the heat death of the universe (barring more esoteric x-risks like encounters with advanced hostile civilizations, false vacuum decay, or simulation shutdown).
However, if you only have alignment of a superintelligence to a single human you still have the risk of misuse, so this should be at most a short-term solution. For example, what if Google creates a superintelligent AI, and it listens to the CEO of Google, and it’s programmed to do everything exactly the way the CEO of Google would want? Even assuming that the CEO of Google has no hidden unconscious desires affecting the AI in unpredictable ways, this gives one person a lot of power.
I’d like to get deeper into the AI alignment literature. Where should I look?
The AGI Safety Fundamentals Course is arguably the best way to get up to speed on alignment; you can sign up to go through it alongside many other people, with mentorship, or read their materials independently.
Other great ways to explore include:
- AXRP is a podcast with high quality interviews with top alignment researchers.
- The AI Safety Papers database is a search and browsing interface for most of the transformative AI literature.
- Reading posts on the Alignment Forum can be valuable (see their curated posts and tags).
- Taking a deep dive into Yudkowsky's models of the challenges to aligned AI, via the Arbital Alignment pages.
- Signing up to the Alignment Newsletter for an overview of current developments, and reading through some of the archives (or listening to the podcast).
- Reading some of the introductory books.
- More on AI Safety Support's list of links, Nonlinear's list of technical courses, reading lists, and curriculums, Stampy's canonical answers list, and Vika's resources list.
You might also want to consider reading Rationality: A-Z which covers a lot of skills that are valuable to acquire for people trying to think about large and complex issues, with The Rationalist's Guide to the Galaxy available as a shorter and more accessible AI-focused option.
I’m convinced that this is important and want to contribute. What can I do to help?
Great! I’ll ask you a few follow-up questions to help figure out how you can best contribute, give you some advice, and link you to resources which should help you on whichever path you choose. Feel free to scroll up and explore multiple branches of the FAQ if you want answers to more than one of the questions offered :)
Note: We’re still building out and improving this tree of questions and answers, any feedback is appreciated.
At what level of involvement were you thinking of helping?
What approaches are AI alignment organizations working on?
Each major organization has a different approach. The research agendas are detailed and complex (see also AI Watch). Getting more brains working on any of them (and more money to fund them) may pay off in a big way, but it’s very hard to be confident which (if any) of them will actually work.
The following is a massive oversimplification, each organization actually pursues many different avenues of research, read the 2020 AI Alignment Literature Review and Charity Comparison for much more detail. That being said:
- The Machine Intelligence Research Institute focuses on foundational mathematical research to understand reliable reasoning, which they think is necessary to provide anything like an assurance that a seed AI will do good things if activated.
- The Center for Human-Compatible AI focuses on Cooperative Inverse Reinforcement Learning and Assistance Games, a new paradigm for AI where they try to optimize for doing the kinds of things humans want rather than for a pre-specified utility function.
- Paul Christiano's Alignment Research Center focuses on prosaic alignment, particularly on creating tools that empower humans to understand and guide systems much smarter than ourselves. His methodology is explained on his blog.
- The Future of Humanity Institute does work on crucial considerations and other x-risks, as well as AI safety research and outreach.
- Anthropic is a new organization exploring natural language, human feedback, scaling laws, reinforcement learning, code generation, and interpretability.
- OpenAI is in a state of flux after major changes to their safety team.
- DeepMind’s safety team is working on various approaches designed to work with modern machine learning, and does some communication via the Alignment Newsletter.
- EleutherAI is a Machine Learning collective aiming to build large open source language models to allow more alignment research to take place.
- Ought is a research lab that develops mechanisms for delegating open-ended thinking to advanced machine learning systems.
There are many other projects around AI Safety, such as the Windfall clause, Rob Miles’s YouTube channel, AI Safety Support, etc.
What are alternate phrasings for?
Alternate phrasings are used to improve the semantic search which Stampy uses to serve people questions, by giving alternate ways to say a question which might trigger a match when the main wording won't. They should generally only be used when there is a significantly different wording, rather than for only very minor changes.
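As a rough illustration of the idea (this is not Stampy's actual implementation: the example questions, phrasings, and the crude bag-of-words similarity below stand in for real sentence embeddings), a matcher can compare the user's wording against the canonical question and each of its alternate phrasings, and keep the best score:

```python
# Illustrative sketch only; Stampy's real semantic search works differently.
import math
import re
from collections import Counter

def similarity(a: str, b: str) -> float:
    """Cosine similarity between crude bag-of-words vectors of two strings."""
    va = Counter(re.findall(r"[a-z']+", a.lower()))
    vb = Counter(re.findall(r"[a-z']+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0

# Each canonical question carries a list of alternate phrasings.
questions = {
    "What is the Stampy project?": ["What is Stampy?", "Who made Stampy?"],
    "Why is AGI safety a hard problem?": ["Why is alignment difficult?"],
}

def best_match(user_question: str) -> str:
    # Compare against the canonical wording AND every alternate phrasing,
    # keeping whichever canonical question scores highest overall.
    def score(canonical: str) -> float:
        return max(similarity(user_question, q) for q in [canonical, *questions[canonical]])
    return max(questions, key=score)

print(best_match("why is alignment so difficult"))
# -> "Why is AGI safety a hard problem?", matched via an alternate phrasing
```

Here "why is alignment so difficult" only weakly resembles the canonical wording but strongly resembles the alternate phrasing, which is exactly the gap alternate phrasings are meant to cover.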
What are some good resources on AI alignment?
- Rob's YouTube videos (Computerphile appearances)
- AI Safety Papers database - Search and interface for the TAI Safety Bibliography
- AGI Safety Fundamentals Course
- Alignment Forum tags
- The Alignment Newsletter (and database sheet)
- Chapters of Bostrom's Superintelligence online (Initial paper which Superintelligence grew from)
- AI Alignment pages on Arbital
- Much more on AI Safety Support (feel free to integrate useful things from there to here)
- Vika's resources list
- AI safety technical courses, reading lists, and curriculums
- AI Safety Intro blog
- Stampy's canonical answers list
What are some of the most impressive recent advances in AI capabilities?
GPT-3 showed that transformers are capable of a vast array of natural language tasks; Codex/Copilot extended this into programming. One demonstration of GPT-3 is Simulated Elon Musk lives in a simulation. It's important to note that there are several much better language models, but they are not publicly available.
DALL-E and DALL-E 2 are among the most visually spectacular.
MuZero learned Go, Chess, and many Atari games without any directly coded info about those environments. The graphic there explains it; this seems crucial for being able to do RL in novel environments. We have systems which we can drop into a wide variety of games and they just learn how to play. The same algorithm was used in Tesla's self-driving cars to do complex route finding. These things are general.
Generally capable agents emerge from open-ended play - Diverse procedurally generated environments provide vast amounts of training data for AIs to learn generally applicable skills. Creating Multimodal Interactive Agents with Imitation and Self-Supervised Learning shows how these kinds of systems can be trained to follow instructions in natural language.
GATO shows you can distill 600+ individually trained tasks into one network, so we're not limited by the tasks being fragmented.
What are some specific open tasks on Stampy?
Other than the usual fare of writing and processing and organizing questions and answers, here are some specific open tasks:
- Porting over some of Steve Byrnes's FAQ on alignment
- Porting over content from Vael Gates's post
- Porting over QA pairs from https://www.lesswrong.com/posts/8c8AZq5hgifmnHKSN/agi-safety-faq-all-dumb-questions-allowed-thread
- Porting over some of https://aisafety.wordpress.com/
What are the ethical challenges related to whole brain emulation?
Unless there was a way to cryptographically ensure otherwise, whoever runs the emulation has basically perfect control over their environment and can reset them to any state they were previously in. This opens up the possibility of powerful interrogation and torture of digital people.
Imperfect uploading might lead to damage that causes the EM to suffer while still remaining useful enough to be run, for example as a test subject for research. We would also have greater ability to modify digital brains. Edits done for research or economic purposes might cause suffering. See this fictional piece for an exploration of what a world with a lot of EM suffering might look like.
These problems are exacerbated by the likely outcome that digital people can be run much faster than biological humans, so it would plausibly be possible to have an EM run for hundreds of subjective years in minutes or hours without any checks on the wellbeing of the EM in question.
What are the style guidelines for writing for Stampy?
Try to avoid directly referencing the wording of the question in the answer, in order to make the answer more robust to alternate phrasings of the question. For example, the question might be "Can we do X" and the reply "Yes, if we can manage Y", but then the question might be "Why can't we do X" or "What would happen if we tried to do X", so the answer should be something like "We might be able to do X, if we can do Y", which works for all of those.
Linking to external sites is strongly encouraged; one of the most valuable things Stampy can do is help people find other parts of the alignment information ecosystem.
Consider enclosing newly introduced terms, likely to be unfamiliar to many readers, in speech marks. If unsure, Google the term (in speech marks!) and see if it shows up anywhere other than LessWrong, the Alignment Forum, etc. Be judicious, as it's easy to use too many, but used carefully they can psychologically cushion newbies from a lot of unfamiliar terminology - in this context they're saying something like "we get that we're hitting you with a lot of new vocab, and you might not know what this term means yet".
No Oxford commas please! "Snap, Crackle and Pop", NOT "Snap, Crackle, and Pop".
When selecting related questions, there shouldn't be more than four unless there's a really good reason for that (some questions are asking for it, like the "Why can't we just..." question). It's also recommended to include at least one more "enticing" question to draw users in (relating to the more sensational, sci-fi, philosophical/ethical side of things) alongside more bland/neutral questions.
What harm could a single superintelligence do when it took so many humans to build civilization?
A superintelligence has an advantage that an early human didn't: the entire context of human civilization and technology is already there for it to manipulate socially or technologically.
AI alignment is the field trying to make sure that when we build superintelligent artificial systems they are aligned with human values, so that they do things compatible with our survival and flourishing. This may be one of the hardest and most important problems we will ever face, as whether we succeed might mean the difference between human extinction and flourishing.
A Narrow AI is capable of operating only in a relatively limited domain, such as chess or driving, rather than being capable of learning a broad range of tasks like a human or an Artificial General Intelligence. Narrow vs General is not a perfectly binary classification; there are degrees of generality, with, for example, large language models having a fairly large degree of generality (as the domain of text is large) without being as general as a human, and we may eventually build systems that are significantly more general than humans.
GPT-3 is the newest and most impressive of the GPT (Generative Pretrained Transformer) series of large transformer-based language models created by OpenAI. It was announced in June 2020, and is 100 times larger than its predecessor GPT-2.[1]
Gwern has several resources exploring GPT-3's abilities, limitations, and implications including:
- The Scaling Hypothesis - How simply increasing the amount of compute with current algorithms might create very powerful systems.
- GPT-3 Nonfiction
- GPT-3 Creative Fiction
Vox has an article which explains why GPT-3 is a big deal.
- ↑ GPT-3: What’s it good for? - Cambridge University Press
What is a canonical question on Stampy's Wiki?
Canonical questions are the questions which we've checked are in scope and not duplicates, so we want answers to them. They may be edited to represent a class of question more broadly, rather than keeping all their idiosyncrasies. Once they're answered canonically Stampy will serve them to readers.
What is a duplicate question on Stampy's Wiki?
An existing question is a duplicate of a new one if it is reasonable to expect whoever asked the new question to be satisfied if they received an answer to the existing question instead.
What is a follow-up question on Stampy's Wiki?
Follow-up questions are responses to an answer which a reader might have, either because they want more information or because they are providing information to Stampy about what they're looking for. We don't expect to have great coverage of the former for a long time because there will be so many, but hopefully we'll be able to handle some of the most common ones.
What is meant by "AI takeoff"?
We're also building a cleaner web UI for readers and a bot interface.
What kind of questions do we want on Stampy?
What should be marked as a canonical answer on Stampy's Wiki?
Canonical answers may be served to readers by Stampy, so only answers which have a reasonably high stamp score should be marked as canonical. All canonical answers are open to be collaboratively edited and updated, and they should represent a consensus response (written from the Stampy Point Of View) to a question which is within Stampy's scope.
Answers to YouTube questions should not be marked as canonical, and will generally remain as they were when originally written since they have details which are specific to an idiosyncratic question. YouTube answers may be forked into wiki answers, in order to better respond to a particular question, in which case the YouTube question should have its canonical version field set to the new more widely useful question.
What sources of information can Stampy use?
As well as pulling human written answers to AI alignment questions from Stampy's Wiki, Stampy can:
- Search for AI safety papers e.g. "stampy, what's that paper about corrigibility?"
- Search for videos e.g. "what's that video where Rob talks about mesa optimizers, stampy?"
- Calculate with Wolfram Alpha e.g. "s, what's the square root of 345?"
- Search DuckDuckGo and return snippets
- Fall back (at least in the patron Discord) to polling GPT-3 to answer uncaught questions
When should I stamp an answer?
You should stamp an answer when you think it is accurate and well presented enough that you'd be happy to see it served to readers by Stampy.
Where can I find all the features of Stampy's Wiki?
Where can I find people to talk to about AI alignment?
You could join a local LessWrong or Effective Altruism group (or start one), Rob Miles’s Discord, and/or the AI Safety Slack.
Where can I find questions to answer for Stampy?
Answer questions collects all the questions we definitely want answers to; browse there and see if you know how to answer any of them.
Where can I learn about interpretability?
Christoph Molnar's online book and Distill are great sources.
Dev team
| Name | Vision talk | Github | Trello | Active? | Notes / bio |
|---|---|---|---|---|---|
| Aprillion | video | Aprillion | yes | yes | experienced dev (Python, JS, CSS, ...) |
| Augustus Caesar | yes | AugustusCeasar | yes | soon! | Has some Discord bot experience |
| Benjamin Herman | no | no (not needed) | no | no | Helping with wiki design/css stuff |
| ccstan99 | no | ccstan99 | yes | yes | UI/UX designer |
| chriscanal | yes | chriscanal | yes | yes | experienced python dev |
| Damaged | no (not needed) | no (not needed) | no (not needed) | yes | experienced Discord bot dev, but busy with other projects. Can answer questions. |
| plex | yes | plexish | yes | yes | MediaWiki, plans, and coordinating people guy |
| robertskmiles | yes | robertskmiles | yes | yes | you've probably heard of him |
| Roland | yes | levitation | yes | yes | working on Semantic Search |
| sct202 | yes | no (add when wiki is on github) | yes | yes | PHP dev, helping with wiki extensions |
| Social Christancing | yes | chrisrimmer | yes | maybe | experienced linux sysadmin |
| sudonym | yes | jmccuen | yes | yes | systems architect, has set up a lot of things |
Editors
Stampy is a character invented by Robert Miles and developed by the Stampy dev team. He is a stamp collecting robot, a play on Clippy from the paperclip maximizer thought experiment.
Stampy is designed to teach people about the risks of unaligned artificial intelligence, and facilitate a community of co-learners who build his FAQ database.
Why do we expect that a superintelligence would closely approximate a utility maximizer?
AI subsystems or regions in gradient descent space that more closely approximate utility maximizers are more stable, and more capable, than those that are less like utility maximizers. Having more agency is a convergent instrumental goal and a stable attractor which the random walk of updates and experiences will eventually stumble into.
The stability is because utility maximizer-like systems which have control over their development would lose utility if they allowed themselves to develop into non-utility maximizers, so they tend to use their available optimization power to avoid that change (a special case of goal stability). The capability is because non-utility maximizers are exploitable, and because agency is a general trick which applies to many domains, so might well arise naturally when training on some tasks.
Humans and systems made of humans (e.g. organizations, governments) generally have neither the introspective ability nor the self-modification tools needed to become reflectively stable, but we can reasonably predict that in the long run highly capable systems will have these properties. They can then fix their values in place and optimize for them.
Why don't we just not build AGI if it's so dangerous?
It certainly would be very unwise to purposefully create an artificial general intelligence now, before we have found a way to be certain it will act purely in our interests. But "general intelligence" is more of a description of a system's capabilities, and a vague one at that. We don't know what it takes to build such a system. This leads to the worrying possibility that our existing, narrow AI systems require only minor tweaks, or even just more computer power, to achieve general intelligence.
The pace of research in the field suggests that there's a lot of low-hanging fruit left to pick, after all, and the results of this research produce better, more effective AI in a landscape of strong competitive pressure to build the most capable systems we can. "Just" not building an AGI means ensuring that every organization in the world with lots of computer hardware doesn't build an AGI, either accidentally or mistakenly thinking they have a solution to the alignment problem, forever. It's simply far safer to also work on solving the alignment problem.
Why is AGI safety a hard problem?
There's the "we never figure out how to reliably instill AIs with human friendly goals" filter, which seems pretty challenging, especially with inner alignment, solving morality in a way which is possible to code up, interpretability, etc.
There's the "race dynamics mean that even though we know how to build the thing safely the first group to cross the recursive self-improvement line ends up not implementing it safely" which is potentially made worse by the twin issues of "maybe robustly aligned AIs are much harder to build" and "maybe robustly aligned AIs are much less compute efficient".
There's the "we solved the previous problems but writing perfectly reliably code in a whole new domain is hard and there is some fatal bug which we don't find until too late" filter. The paper The Pursuit of Exploitable Bugs in Machine Learning explores this.
For a much more in depth analysis, see Paul Christiano's AI Alignment Landscape talk and The Main Sources of AI Risk?.
Why might a maximizing AI cause bad outcomes?
Computers only do what you tell them. But any programmer knows that this is precisely the problem: computers do exactly what you tell them, with no common sense or attempts to interpret what the instructions really meant. If you tell a human to cure cancer, they will instinctively understand how this interacts with other desires and laws and moral rules; if a maximizing AI acquires a goal of trying to cure cancer, it will literally just want to cure cancer.
Define a closed-ended goal as one with a clear endpoint, and an open-ended goal as one that aims to do something as much as possible. For example “find the first one hundred digits of pi” is a closed-ended goal; “find as many digits of pi as you can within one year” is an open-ended goal. According to many computer scientists, giving a superintelligence an open-ended goal without activating human instincts and counterbalancing considerations will usually lead to disaster.
To take a deliberately extreme example: suppose someone programs a superintelligence to calculate as many digits of pi as it can within one year. And suppose that, with its current computing power, it can calculate one trillion digits during that time. It can either accept one trillion digits, or spend a month trying to figure out how to get control of the TaihuLight supercomputer, which can calculate two hundred times faster. Even if it loses a little bit of time in the effort, and even if there’s a small chance of failure, the payoff – two hundred trillion digits of pi, compared to a mere one trillion – is enough to make the attempt. But on the same basis, it would be even better if the superintelligence could control every computer in the world and set it to the task. And it would be better still if the superintelligence controlled human civilization, so that it could direct humans to build more computers and speed up the process further.
Now we’re in a situation where a superintelligence wants to take over the world. Taking over the world allows it to calculate more digits of pi than any other option, so without an architecture based around understanding human instincts and counterbalancing considerations, even a goal like “calculate as many digits of pi as you can” would be potentially dangerous.
Why might contributing to Stampy be worth my time?
Creating a high-quality single point of access where people can be onboarded and find resources around the alignment ecosystem seems likely to be high-impact.
Additionally, contributing to Stampy means being part of a community of co-learners who provide mentorship and encouragement to join the effort to give humanity a bright future.
Will we ever build a superintelligence?
Humanity hasn't yet built a superintelligence, and we might not be able to without significantly more knowledge and computational resources. There could be an existential catastrophe that prevents us from ever building one. For the rest of the answer, let's assume no such event stops technological progress.
With that out of the way: there is no known good theoretical reason we can't build it at some point in the future; the majority of AI research is geared towards making more capable AI systems; and a significant chunk of top-level AI research attempts to make more generally capable AI systems. There is a clear economic incentive to develop more and more intelligent machines and currently billions of dollars of funding are being deployed for advancing AI capabilities.
We consider ourselves to be generally intelligent (i.e. capable of learning and adapting ourselves to a very wide range of tasks and environments), but the human brain almost certainly isn't the most efficient way to solve problems. One hint is the existence of AI systems with superhuman capabilities at narrow tasks. These systems show not only superhuman performance (as in, AlphaGo beating the Go world champion) but superhuman speed and precision (as in, industrial sorting machines). There is no known discontinuity between tasks, nothing special and unique about human brains that unlocks certain capabilities which cannot in principle be implemented in machines. Therefore we would expect AI to surpass human performance on all tasks as progress continues.
In addition, several research groups (DeepMind being one of the most overt about this) explicitly aim for generally capable systems. AI as a field is growing, year after year. Critical voices about AI progress usually argue against a lack of precautions around the impact of AI, or against general AI happening very soon, not against it happening at all.
A satire of arguments against the possibility of superintelligence can be found here.
Would it improve the safety of quantilizers to cut off the top few percent of the distribution?
This is a really interesting question! Because, yeah, it certainly seems to me that doing something like this would at least help, but it's not mentioned in the paper the video is based on. So I asked the author of the paper, and she said "It wouldn't improve the security guarantee in the paper, so it wasn't discussed. Like, there's a plausible case that it's helpful, but nothing like a proof that it is".

To explain this I need to talk about something I gloss over in the video, which is that the quantilizer isn't really something you can actually build. The systems we study in AI Safety tend to fall somewhere on a spectrum from "real, practical AI system that is so messy and complex that it's hard to really think about or draw any solid conclusions from" on one end, to "mathematical formalism that we can prove beautiful theorems about but not actually build" on the other, and quantilizers are pretty far towards the 'mathematical' end. It's not practical to run an expected utility calculation on every possible action like that, for one thing. But proving things about quantilizers gives us insight into how more practical AI systems may behave, or we may be able to build approximations of quantilizers, etc.

So it's like, if we built something that was quantilizer-like, using a sensible human utility function and a good choice of safe distribution, this idea would probably help make it safer. BUT you can't prove that mathematically, without making probably a lot of extra assumptions about the utility function and/or the action distribution. So it's a potentially good idea that's nonetheless hard to express within the framework in which the quantilizer exists.

TL;DR: This is likely a good idea! But can we prove it?
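To make the idea concrete, here's a toy sketch of a quantilizer over a small finite action set, with the proposed tweak of discarding the very top of the distribution. The utilities, the uniform base distribution, and the quantile values are all invented for illustration; a real quantilizer is a mathematical formalism over a human-like action distribution and utility function, not something you could run like this.

```python
# Toy quantilizer sketch: sample from the top q fraction of a "safe" base
# distribution ranked by expected utility, optionally cutting off the top
# few percent first. All numbers here are invented for illustration.
import random

actions = [f"a{i}" for i in range(1, 11)]
utility = {a: i for i, a in enumerate(actions)}      # a10 has the highest expected utility
base_prob = {a: 1 / len(actions) for a in actions}   # uniform stand-in for a human-like distribution

def quantilize(q: float, cut_top: float = 0.0) -> str:
    """Sample from the top q fraction (by utility) of the base distribution,
    optionally discarding the top cut_top fraction of probability mass first."""
    ranked = sorted(actions, key=lambda a: utility[a], reverse=True)
    kept, skipped_mass, kept_mass = [], 0.0, 0.0
    for a in ranked:
        if skipped_mass < cut_top:          # throw away the very best-scoring actions
            skipped_mass += base_prob[a]
        elif kept_mass < q:                 # then collect the next q of base mass
            kept.append(a)
            kept_mass += base_prob[a]
    return random.choices(kept, weights=[base_prob[a] for a in kept])[0]

print(quantilize(q=0.3))                  # plain quantilizer: samples one of a8-a10
print(quantilize(q=0.3, cut_top=0.1))     # with the top 10% cut off: one of a7-a9
```

In the toy version, cutting off the top slice just shifts which actions can be sampled; whether that buys real safety depends on extra assumptions about the utility function and the base distribution, which is exactly the part that's hard to prove.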