If you think an answer is good (i.e. accurate, helpful, and replies well to the question) then please give it a stamp of approval. If you're a more experienced editor and have a good feel for when there is consensus around an answer being accepted, feel free to mark answers as canonical, so that Stampy will start serving them to users.
- Non-canonical answers to canonical questions
- Recent answers
- Canonical answers with low stamps
- Potentially canonical answers
These pages track 252 answers which need review (with some being counted multiple times).
These 50 non-canonical answers are answering canonical questions.
Metaphilosophy is the philosophy of philosophy.
Bostrom has described the AI Safety problem as "philosophy with a deadline". Metaphilosophy could be used to steer an AGI towards e.g. our coherent extrapolated volition.
There is currently no clear win condition that most/all researchers agree on. Many researchers have their own paradigm and view the problem from a different angle.
In an ideal world, we would simultaneously solve
Here are some of the sub-fields of AI Safety research. We need to solve the challenges in many of these fields to win.
- Inner alignment
- Value learning
- Agent foundations, including:
See also Concrete Problems in AI Safety.
In order of smallest commitment to largest:
- Link your friends to Stampy or Rob's videos
- Contribute to an existing AI Safety community
- Join or start a local AI Safety group at a university
- Get good at giving an elevator pitch
- Develop a deep understanding of AI Safety so you can become a competent advocate and have good conversations with people who are new to or skeptical of the subject
Whilst this is possible, AI technologies seem to be progressing much faster than cognitive enhancement.
The future of the world will be dominated by these systems. We control the world because we're the most capable and coordinated entities on the planet.
Yes you can! You can check out AI Safety Support's resources for examples of the format this could take.
If you get good reviews and actually help these researchers, you might eventually get funded by external organisations.
A good list can be found on the Alignment Forum's tag list.
There is no broad consensus as to what constitutes suffering. For example, humans and animals can suffer and bacteria probably cannot, but the delimitation between entities that do and do not suffer is not clear.
Most researchers (but not all) believe that current AIs cannot suffer, but this might not hold as the capabilities of the AIs increase, especially if they are created for instance through whole brain emulation.
A related question is what Nick Bostrom calls mindcrime where an AI could fully simulate a large number of moral entities such as humans and make them suffer. Bostrom suggests this could be a moral catastrophe and precautions should be taken to avoid this situation.
There is a Cambrian explosion of approaches to solving alignment. Click through to the follow-up questions to explore the research directions of groups and individuals in the field.
Though we have tried to be exhaustive, there are certainly many people working on technical AI alignment not included in this overview. While these groups might produce great research, we either 1) didn't know enough about it to summarize or 2) weren’t aware that it was aimed at reducing x-risk from AI.
Below is a list of some of these, though we probably missed some here. Please feel free to add comments and I will add others.
- Future of Life Institute (FLI) (Though they seem to mostly give out grants)
- A number of academics:
- Principles of Intelligent Behavior in Biological and Social Systems (PIBSS)
- Alignment of Complex Systems Research Group
There are two parts to that answer.
Firstly: By working on the right things. Every generation since the dawn of humanity has had its Einstein-level geniuses. And yet, most of them were forgotten by history because they just didn't run into an important problem to solve.
Secondly: There are a number of useful resources for getting more productive on the internet. Some leads you might find useful:
- 80,000 Hours published an article with an extensive list of evidence-backed strategies for becoming better at any job. Start at the top, and work your way down until you find something that makes sense for you to implement.
- For general problem-solving, the toolbox taught by CFAR (Center for Applied Rationality) has proven useful to many members of the alignment community. There are two sequences on LessWrong written as self-study guides for the CFAR tools: Hammertime, Training Regime.
- Keep in mind that no productivity advice whatsoever works for everyone. Something might be useful for 50% of the population, or even 99%, and still leave you worse off if you try to implement it. Experiment, iterate, and above all: Trust your own judgment.
There are three major approaches to normative ethics (and some approaches to unify two or all of them): Virtue ethics, deontological ethics, and consequentialist ethics.
Virtue ethicists believe that at the core, leading an ethical life means cultivating virtues. In other words: What counts is less what one does moment-to-moment than that one makes an effort to become the kind of person who habitually acts appropriately in all kinds of different situations. A prominent example of virtue ethics is stoicism.
Deontological ethicists believe that an ethical life is all about following certain behavioral rules, regardless of the consequences. Prominent examples include the ten commandments in Christianity, Kant's "categorical imperative" in philosophy, or Asimov's Three Laws of Robotics in science fiction.
Consequentialist ethicists believe that neither one's character nor the rules one lives by are what makes actions good or bad. Instead, consequentialists believe that only the consequences of an action count, both direct and indirect ones. A prominent example of consequentialist ethics is utilitarianism: The notion that those actions are the most moral that lead to the greatest good for the greatest number of individuals.
The short answer to the question which one of these might be the easiest to encode into an AI is: "We don't know." However, machine learning agents optimize for consequences, not virtues or hard-coded rules. As all the likely roads towards AGI involve machine learning, consequentialism may be the ethical theory to stick closest to.
The key concern with regard to AGI is that if it passes human-level intelligence, it would likely become uncontrollable and we would essentially hand our dominant position on the planet over to it. Whether the first human-level AI is deployed by terrorists, a government, or a major research organization does not make any difference for that fact. While the latter two might have more interest in deploying aligned AGI than terrorists, they won't be able to do that unless we solve the alignment problem.
As far as narrow AI is concerned: The danger of misuse by bad actors is indeed a problem. As the capabilities of narrow AI systems grow while we get closer to AGI, this problem will only grow more and more severe over the next years and decades.
However, leading experts expect that we are more than 50% likely to reach human-level AI by the end of this century. On the forecasting platform Metaculus, the current (September 2022) median forecast is as early as 2043.
Accordingly, we have no time to lose for solving the alignment problem, with or without the danger of terrorists using narrow AI systems.
Perhaps an AI which is aligned with the values of humanity as a whole may decide that it's worth "uploading" someone contrary to their will if it would better serve humanity to have us all uploaded than not to, but it seems likely that most people do not wish to be "uploaded" (see Nozick's experience machine) and so an aligned superintelligence would not do this.
A misaligned superintelligence would have no qualms about destroying us entirely in order to use the matter of our bodies, and it most likely would not bother to "upload" us since it is misaligned.
It's hard to know what politicians are concerned with behind closed doors, as they will only publicly talk about things which serve their political interests, and talking about AIs destroying the world is a good way to sound crazy and to lose a lot of votes, so they won't do that.
I did search "politicians concerned about ai risk uk" on my browser and got a message saying "Not many great matches came back for your search politicians concerned about ai risk uk", which shows that if politicians are worried about this, they don't tend to talk about it.
It's very hard to know the answer to this in advance because a superintelligent AI is capable of doing anything that it understands how to do. That sounds obvious, but there is a hidden nuance: it knows how to do things that we don't know how to do, and so we cannot predict in advance what it will do because it will do some things that we didn't even know were possible. Almost by definition, that makes it impossible to reliably predict.
We can say that it will not be able to do things which are actually not possible to do, and it seems likely that certain rules are always followed (energy must be conserved, global entropy must always increase etc.) but the problem is that we have been wrong about the laws of physics before (see Newton's laws vs relativity and quantum mechanics) and so we're only *mostly* certain that these things are true. If there's any discrepancy between how we think physics works and how physics actually works (and most likely, such a discrepancy does exist somewhere), we can expect a superintelligent AI to be able to exploit that discrepancy in order to serve its goals in lots of weird ways we couldn't possibly have predicted.
We may be able to say with 99% certainty that a superintelligent AI will not be able to violate conservation of energy, but we're not 100% sure, and we can't be 100% sure of anything here.
Alignment failure, at its core, is any time an AI's output deviates from what we intended. We have already witnessed alignment failure in simple AIs. Mostly, these amount to correlation being mistaken for causation. A good example was an AI built by YouTube to recognize animals being forced to fight for sport. The videos given to the AI were always set in some kind of arena, so the AI drew the simplest conclusion and matched videos where there were similar arenas, such as robot combat tournaments.
A lot of narrow AI alignment advances will improve capabilities too, making it easier for human users to work with the AI tools. Those are going to be adopted almost instantly. For example, interpretability might be considered a desirable property by all AI researchers.
However, with our current research methods for search in highly dimensional spaces, it seems exponentially more likely to find a capable AGI than to find a capable and aligned AGI. So even if capabilities research adopts all new advances in AI alignment as soon as they come along, it is likely capability research will happen faster than alignment research. We need to find ways to incentivize the corporations who would profit from more capability research to also focus on alignment.
By its very nature, a superintelligence could perform tasks we can't even imagine. If properly aligned, we could ask it to cure cancer, lower the crime rate in the world to zero, or end world hunger. A superintelligence would be able to innovate beyond our current understanding. It could even be given a task to do something as general as keeping humans safe. That superintelligence would try and find threats to us and handle them appropriately. As a last resort, if the superintelligence feels outmatched, it could develop an even more powerful superintelligence to help.
All these things would be possible and more. However, we would likely want to ensure the superintelligence is interpretable; otherwise, we could never truly trust the system is aligned with our interests.
Being able to prove alignment of a potential AGI to any objective expressed in human language seems to be an important stepping stone towards AGI alignment, even if it would be such a controversial objective as hedonium maximizing, which most people find undesirable.
Since hedonium maximization might be easier to model mathematically than more complex (and more desirable) objectives, it might be easier to optimize for. And development of optimization techniques with provable alignment might generalize to help us optimize for desirable objectives too.
Emulated minds have the same behavior as conventional minds. This being so, they can do anything the human mind they are emulating can. Provided the mind that is being emulated is capable of learning how to do alignment research, the emulation would be, too. It should be noted that we do not currently have the technology to emulate human minds.
Let's call the AI in the box the prisoner and the AI outside the guard. You could imagine such AIs boxed like nesting dolls but there would always be a highest level where the guard AI is boxed by humans. This being the case, it is not clear what such a design buys you, as all the issues of containing an AI will still apply at this top level.
A pivotal act is an unspecified action that drastically improves the strategic situation such that successful alignment becomes significantly more probable.
So, let’s first ignore superintelligence and consider the case of AIs that are merely perfect replacements for all human labour.
Many economists dismiss the claim that automation can cause general unemployment, and will often mention the “Lump of labour” fallacy, which is the idea that there is a finite amount of jobs in the world, which automation will slowly winnow away. They note that though automation has caused unemployment in particular sectors, increased efficiency has freed up resources which can be used to employ more humans in those areas where machines cannot yet replace their labour. And historically, this has more than made up for the jobs that were eliminated.
However, AI is different than other sorts of automation in that it is of general applicability. If we consider these AIs perfect replacements for human labour then standard labour models predict human wages will decline until they are competitive with the cost of running an AI. If this cost is below subsistence, then this would cause unemployment. However, in scenarios where some humans still have capital, they may prefer human workers for signalling or other reasons even if AIs are better and cheaper.
Now that we’ve talked about perfect replacements for human labour, we can talk about superintelligence. A superintelligence would quickly acquire material power, and we think superior material power to that held by any human or collection of humans. At that point, thinking in terms of employment is likely beside the point. The post-superintelligence world will reflect the preferences of the AI/AIs. If it prefers humans exist, we will. If it prefers we have jobs, we will.
Governance - e.g. By establishing best practises, institutions & processes, awareness, regulation, certification, etc?
Perhaps. There is a chance that directly lobbying politicians could help, but there's also a chance that actions end up being net-negative. It would be great if we could slow down AI, but doing so might simply mean that a nation less concerned about safety produces AI first. We could ask them to pass regulations or standards related to AGI, but passing ineffective regulation might interfere with passing more effective regulation later down the track as people may consider the issue dealt with. Or the requirements of complying with bureaucracy might prove to be a distraction from safe AI.
If you are concerned about this issue, you should probably try learning as much about this issue as possible and also spend a lot of time brainstorming downside risks and seeing what risks other people have identified.
Working out milestone tasks that we expect to be achieved before we reach AGI can be difficult. Some tasks, like "continuous learning" intuitively seem like they will need to be solved before someone builds AGI. Continuous learning is learning bit by bit, as you get more data. Current ML systems usually don't do this, instead learning everything at once from a big dataset. Because humans can do continuous learning, it seems like it might be required for AGI. However, you have to be careful with reasoning like this, because it is possible the first generally capable artificial intelligence will work quite differently to a human. It's possible the first AGI will be designed to avoid needing "continuous learning", maybe by being designed to do a big retraining process every day. This might still allow it to be as capable as humans at almost every task, but without solving the "continuous learning" problem.
Because of arguments like the above, it's not always clear whether a given task is "required" for AGI.
Some potential big milestone tasks might be:
- ARC challenge (tests the ability to generate the "simplest explanation" for patterns)
- Human level sample efficiency at various tasks (EfficientZero already does Atari games)
This Metaculus question has four very specific milestones that it considers to be requirements for "weak AGI".
There are multiple programmes you can apply to if you want to try becoming a researcher. If accepted to these programs, you will get funding and mentorship. Some examples of these programs are: SERI summer research fellowship, CERI summer research fellowship, SERI ML Alignment Theory Program, and more. A lot of these programs run during specific times of the year (specifically during the summer).
Other examples of things you can do are: join the next iteration of the AGI Safety Fundamentals programme (https://www.eacambridge.org/technical-alignment-curriculum); if you're thinking of a career as a researcher working on AI safety questions, get 1-1 career advice from 80,000 Hours (https://80000hours.org/speak-with-us); or apply to attend an EAGx or EAG conference (https://www.eaglobal.org/events/), where you can meet in person with researchers working on these questions and ask them directly for advice.
Some of these resources might be helpful: https://www.aisafetysupport.org/resources/lots-of-links
It's not completely clear exactly what 'merging' with AI would imply, but it doesn't seem like a way to get around the alignment problem. If the AI system is aligned, and wants to do what humans want, then having direct access to human brains could provide a lot of information about human values and goals very quickly and efficiently, and thus be helpful for better alignment. However, a smart AI system could also get almost all of this information without a brain-computer interface, through conversation, observation, etc., though much more slowly. On the other hand, if the system is not aligned, and doesn't fundamentally want humans to get what we want, then extra information about how human minds work doesn't help and only makes the problem worse. Allowing a misaligned AGI direct access to your brain hardware is a bad idea for obvious reasons.
Preventing an AI from escaping by using a more powerful AI gets points for creative thinking, but unfortunately we would need to have already aligned the first AI. Even if the second AI's only terminal goal were to prevent the first AI from escaping, it would also have an instrumental goal of converting the rest of the universe into computer chips so that it would have more processing power to figure out how to best contain the first AGI.
It might be possible to try to bind a stronger AI with a weaker AI, but this is unlikely to work as the stronger AI would have an advantage due to being stronger. Further, there is a chance that the two AIs end up working out a deal where the first AI decides to stay in the box and the second AI does whatever the first AI would have done if it were able to escape.
One of the main questions about simulation theory is why would a society invest a large quantity of resources to create it. One possible answer is an environment to train/test AI, or run it safely isolated from an outside reality.
It's a fun question but probably not one worth thinking about too much. It is impossible to get information about this kind of question from observations and experiments.
I think an AI inner aligned to maximize a utility function such as "happiness minus suffering" is likely to do something like this.
"Inner aligned" means the AI is trying to do the thing we trained it to do, whether or not this is what we actually want.
"Aligned to what" is the outer alignment problem, which is where the failure in this example is. There is a lot of debate on what utility functions are safe or desirable to maximize, and whether human values can even be described by a utility function.
Autonomous weapons, especially a nuclear arsenal, being used by an AI is a concern, but this seems downstream of the central problem of giving an unaligned AI any capabilities to impact the world.
Triggering nuclear war is only one of many ways a power-seeking AI might choose to take control, and it seems an unlikely one, as resources the AI would want to control (or the AI itself) would likely be destroyed in the process.
This depends on what the superintelligence in question wants to happen. If AIs want humans to continue being employable, they’ll act to ensure humans remain employable by setting up roles that only biological humans can fill, artificially perpetuating the need for employing humans.
Optimistic views might hold that it is possible to coordinate between all AI creators to align their AIs only with a central agreed-upon definition of "human values," which could be determined by traditional human political organizations. Succeeding at this coordination would prevent (or at least, reduce) the weaponization of AIs toward competition between these values.
More pessimistic views hold that this coordination is unlikely to succeed, and that just as today different definitions of "human values" compete with one another (through e.g. political conflicts), AIs will likely be constructed by actors with different values and will compete with one another on the same grounds. The exception is that this competition might end if one group gains enough advantage to carry out a Pivotal Act that can "lock in" their set of values as the winner.
We could imagine a good instance of this might look like a U.N.-sanctioned project constructing the first super-intelligent AI, successfully aligned with the human values roughly defined as "global peace and development". This AI might then perform countermeasures to reduce the influence of bad AIs by e.g. regulating further AI development, or seizing compute power from agencies developing bad AIs.
Bad outcomes might look similar to the above, but with AIs developed by extremists or terrorists taking over. Worse still would be a careless development group accidentally producing a misaligned AI, where we don't end up with "bad human values" (like one of the more oppressive human moralities), we just end up with "non-human values" (like where only paperclips matter).
A common concern is that if a friendly AI doesn't carry this out, then an opposition AI is likely to do so. Hence, there is a relatively common view that safe AI not only must be developed, but must be deployed to prevent possibly hostile AIs from arising. There are also arguments against the "Pivotal Act" mentality, which promote political regulation as a better path toward friendly AI than leaving the responsibility to the first firm to finish.
Questions are: (i) contributed by online users or Stampy editors via the Stampy Wiki; or (ii) scraped from online content (various AI-alignment-related FAQs as well as the comments sections of certain AI Alignment YouTube videos).
The scraped content is currently a secondary concern, but this crude process of aggregation will eventually be streamlined into a reliable source of human-editable questions and answers.
Questions are reviewed by Stampy editors, who decide if: (i) they're duplicates of existing questions (the criterion being that the answer to the existing question would be fully satisfactory to the asker of the new question); (ii) they're sufficiently within the scope of the Stampy project.
We are working on using semantic search to suggest possible duplicates.
If they meet these two criteria, questions are added to a list of canonical questions.
A rating system allows editors to assign quality levels "Meh"/"Unreviewed"/"Approved"/"Good"/"Excellent" to order the questions on Answer questions, so that the most important questions can be worked on first.
Answers to canonical questions can be contributed via the Stampy Wiki by online users or by Stampy editors directly, at which point the question is added to a list of "answered canonical questions".
Editors can attempt to improve a contributed answer, and/or can "stamp" it to indicate their approval, adding to its "stamp score".
Once the answer to a canonical question gets a sufficiently high stamp score it gets added to a list of canonical answers (to canonical questions).
These canonical question/answer pairs are then ready to be served to the user interface. In order for them to become visible there, though, they must be associated with existing canonical question/answer pairs in one of two ways: RELATED or FOLLOWUP. Any editor can improve these relationships, either based on tags or their own understanding of what a reader might want to know. Questions should aim to have 2-5 related + followups generally, although exceptions can be made.
If Question B is RELATED to Question A, it will slide in below Question A on the UI page when Question A is clicked on, provided that it is not already present on the page.
If Question B is FOLLOWUP to Question A, it will always slide in below Question A when Question A is clicked on, even if it is already present on the UI page.
A and B being RELATED questions can be thought of as a kind of conceptual adjacency. If a user is interested to know the answer to A, they'll probably be interested in the answer to B too, and vice versa. Reading these in either order should make roughly the same amount of sense to the average user.
Question B being FOLLOWUP to Question A can be thought of in terms of progressive knowledge: the answer to B will only really make sense to the average user if they have read the answer to A first. This is also used for letting Stampy ask clarifying questions to direct readers to the right part of his knowledge graph.
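The RELATED and FOLLOWUP display rules described above can be sketched in a few lines of Python. This is only an illustrative sketch, not Stampy's actual UI code: the function names `should_slide_in` and `questions_to_slide_in`, and the representation of links as `(question, relation)` pairs, are assumptions made for this example.

```python
def should_slide_in(relation: str, already_on_page: bool) -> bool:
    """Decide whether a linked question slides in when its parent is clicked.

    Hypothetical helper: FOLLOWUP questions always slide in, while RELATED
    questions slide in only if they are not already on the page.
    """
    if relation == "FOLLOWUP":
        return True
    if relation == "RELATED":
        return not already_on_page
    raise ValueError(f"unknown relation: {relation!r}")


def questions_to_slide_in(
    links: list[tuple[str, str]], page: list[str]
) -> list[str]:
    """Return the linked questions that would slide in below the clicked one.

    `links` holds (question, relation) pairs for the clicked question and
    `page` holds the questions currently visible (both assumed structures).
    """
    return [
        question
        for question, relation in links
        if should_slide_in(relation, question in page)
    ]


# Example: B and C are already visible; only the FOLLOWUP is shown again.
visible = ["A", "B", "C"]
linked_to_a = [("B", "RELATED"), ("C", "FOLLOWUP"), ("D", "RELATED")]
print(questions_to_slide_in(linked_to_a, visible))
```

The sketch deliberately leaves open how a FOLLOWUP that is already on the page is displayed (moved or shown a second time), since the text above does not specify it.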
If you click on "Edit answer", then "[Show Advanced Options]", you'll be given the option to submit a brief version of your answer (this field will be automatically filled if the full answer exceeds 2000 characters).
Link to https://www.cold-takes.com/ai-could-defeat-all-of-us-combined/ at the end
There are debates about how discontinuous an intelligence explosion would be, with Paul Christiano expecting to see the world being transformed by progressively less weak AGIs over some number of years, while Eliezer Yudkowsky expects a rapid jump in capabilities once generality is achieved and the self-improvement process is able to sustain itself.
Codex / GitHub Copilot are AIs that use GPT-3 to write and edit code. When given some input code and comments describing the intended function, they will write output that extends the prompt as accurately as possible.
"The real concern" isn't a particularly meaningful concept here. Deep learning has proven to be a very powerful technology, with far reaching implications across a number of aspects of human existence. There are significant benefits to be found if we manage the technology properly, but that management means addressing a broad range of concerns, one of which is the alignment problem.
Whole Brain Emulation (WBE) or ‘mind uploading’ is a computer emulation of all the cells and connections in a human brain. So even if the underlying principles of general intelligence prove difficult to discover, we might still emulate an entire human brain and make it run at a million times its normal speed (computer circuits communicate much faster than neurons do). Such a WBE could do more thinking in one second than a normal human can in eleven days. So this would not lead immediately to smarter-than-human intelligence, but it would lead to faster-than-human intelligence. A WBE could be backed up (leading to a kind of immortality), and it could be copied so that hundreds or millions of WBEs could work on separate problems in parallel. If WBEs are created, they may therefore be able to solve scientific problems far more rapidly than ordinary humans, accelerating further technological progress.
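The speed-up arithmetic can be checked with a short calculation. This is a minimal sketch: the function name `subjective_seconds` and the use of a 365.25-day year are assumptions made for this example. At a factor of one million, one wall-clock second corresponds to roughly 11.6 subjective days, whereas about 31.7 subjective years per second would correspond to a factor of one billion.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_YEAR = 365.25 * SECONDS_PER_DAY  # assumed average year length


def subjective_seconds(wall_clock_seconds: float, speedup: float) -> float:
    """Subjective thinking time for an emulation running `speedup` times faster."""
    return wall_clock_seconds * speedup


# One wall-clock second at a million-fold speed-up, expressed in days.
million_x_days = subjective_seconds(1, 1e6) / SECONDS_PER_DAY

# One wall-clock second at a billion-fold speed-up, expressed in years.
billion_x_years = subjective_seconds(1, 1e9) / SECONDS_PER_YEAR

print(f"million-fold: {million_x_days:.1f} subjective days per second")
print(f"billion-fold: {billion_x_years:.1f} subjective years per second")
```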
As long as AI does not exceed human capabilities, we could do that.
But there is no reason why AI capabilities would stop at the human level. Systems more intelligent than us could think of several ways to outsmart us, so our best bet is to have them as closely aligned to our values as possible.
The problem is that the actions can be harmful in a very non-obvious, indirect way. It's not at all obvious which actions should be stopped.
For example, when the system comes up with a very clever way to acquire resources, the safety of this action depends on what it intends to use these resources for.
Such supervision may buy us some safety, if we find a way to make the system's intentions very transparent.
Verified accounts are given to people who have clearly demonstrated understanding of AI Safety outside of this project, such as by being employed and vouched for by a major AI Safety organization or by producing high-impact research. Verified accounts may freely mark answers as canonical or not, regardless of how many Stamps the person has, to determine whether those answers are used by Stampy.
This depends on how we program it. It can definitely be autonomous: even now, we have autonomous vehicles, flight control systems and many more.
Even though it's possible to build such systems, it may be better if they actively ask humans for supervision, for example in cases where they are uncertain what to do.
Nobody knows for sure when we will have ASI or if it is even possible. Predictions on AI timelines are notoriously variable, but recent surveys about the arrival of human-level AGI have median dates between 2040 and 2050, although the median for (optimistic) AGI researchers and futurists is in the early 2030s (source). What will happen if/when we are able to build human-level AGI is a point of major contention among experts. One survey asked (mostly) experts to estimate the likelihood that it would take less than 2 or 30 years for a human-level AI to improve to greatly surpass all humans in most professions. Median answers were 10% for "within 2 years" and 75% for "within 30 years".
We know little about the limits of intelligence and whether increasing it will follow the law of accelerating or diminishing returns. Of particular interest to the control problem is the fast or hard takeoff scenario. It has been argued that the increase from a relatively harmless level of intelligence to a dangerous vastly superhuman level might be possible in a matter of seconds, minutes or hours: too fast for human controllers to stop it before they know what's happening. Moving from human to superhuman level might be as simple as adding computational resources, and depending on the implementation the AI might be able to quickly absorb large amounts of internet knowledge.
Once we have an AI that is better at AGI design than the team that made it, the system could improve itself or create the next generation of even more intelligent AIs (which could then self-improve further or create an even more intelligent generation, and so on). If each generation can improve upon itself by a fixed or increasing percentage per time unit, we would see an exponential increase in intelligence: an intelligence explosion.
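The claim that a fixed percentage improvement per time unit gives exponential growth can be illustrated with a toy calculation. This is only a sketch under strong assumptions: intelligence is treated as a single number, and the function name `capability_over_time`, the starting level of 1.0 and the 10% rate are hypothetical values chosen for illustration, not estimates.

```python
def capability_over_time(initial: float, rate: float, steps: int) -> list[float]:
    """Compound `initial` by a fixed fractional `rate` at every time step.

    Returns the level at each step, including the starting level, so the
    result has `steps + 1` entries and equals initial * (1 + rate) ** t.
    """
    levels = [initial]
    for _ in range(steps):
        levels.append(levels[-1] * (1 + rate))
    return levels


# Hypothetical numbers: start at 1.0 and improve by 10% per time unit.
levels = capability_over_time(initial=1.0, rate=0.10, steps=50)
for t in (0, 10, 25, 50):
    print(f"t={t:2d}  level={levels[t]:.2f}")
```

With a fixed rate the level at step t is the starting level times (1 + rate) to the power t, which is the exponential curve the text refers to; an increasing rate would grow faster still.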
It is impossible to design an AI without a goal, because it would do nothing. Therefore, in the sense that designing the AI’s goal is a form of control, it is impossible not to control an AI. This goes for anything that you create. You have to control the design of something at least somewhat in order to create it.
There may be relevant moral questions about our future relationship with possibly sentient machine intelligence, but the priority of the Control Problem is finding a way to ensure the survival and well-being of the human species.
Goal-directed behavior arises naturally when systems are trained on an objective. An AI not trained or programmed to do well by some objective function would not be good at anything, and would be useless.
These 2 answers have been added in the last month.
"Author's Notes" style meta-commentary writing is like this: [ hey im an authors note ]
[ note: this is not the answer yet, and can be seen as me helping my past self, who in fact asked this question earlier]
Conjecture's policy on infohazards: https://www.lesswrong.com/posts/Gs29k3beHiqWFZqnn/conjecture-internal-infohazard-policy#Introduction
My distillation of Conjecture's policy:
[ The "tldr" and the "Why a policy?" sections of the below text, but with prettier colors: https://imgur.com/a/4Cha3gM ]
[ First some background: ]
An infohazard is information that either:
- harms the hearer, or
- accelerates humanity towards AGI doom
The Conjecture post is about the latter, about how to deal with info that might accelerate humanity towards AGI. (note: there is a key assumption that we have not solved alignment yet, and so, the longer it takes for artificial superintelligence (imo a better term than agi) to arrive, the more time we have to solve it)
policy TLDR: """ The TL;DR of the policy is: Mark all internal projects as explicitly secret, private, or public. Only share secret projects with selected individuals; only share private projects with selected groups; share public projects with anyone, but use discretion. When in doubt consult the project leader or the “appointed infohazard coordinator”. """
Why a policy?
- Trust does not scale.
- If one person keeps a secret with probability 99%, 95%, or 90%, then the probability that all of 30 such people keep it is roughly 74%, 21%, or 4%, respectively.
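The figures above can be reproduced with a short calculation. This is a sketch that assumes each person keeps the secret independently and with the same probability (an assumption made here for illustration; the function name is hypothetical):

```python
def all_keep_secret(p_single: float, people: int) -> float:
    """Probability that every one of `people` keeps the secret.

    Assumes each person keeps it independently with probability p_single.
    """
    return p_single ** people


if __name__ == "__main__":
    for p in (0.99, 0.95, 0.90):
        print(f"{p:.0%} per person -> {all_keep_secret(p, 30):.0%} for 30 people")
```

Even a very trustworthy group leaks with substantial probability once it is large enough, which is why the policy limits how many people a secret is shared with.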
[ My thoughts on why this post did not exactly answer the question ]
The rest of the Conjecture post goes into things unrelated to tooling, but the takeaway for tooling is that, to the extent that the tool should be kept secret (either the source code or the mere idea), you can apply the post's recommendations. For example, keep the number of people you share it with small, and read their best practices (which I did not read).
[ To explain the difference between my concerns and what the post is about: ]
The Conjecture policy is centered around the dynamics of binary events, the events being the release of secrets. So with a secret, whether it is "out" or not is the only variable. And since this is about AGI-accelerating secrets, it's always bad when the secret is out. This is not so with tooling. A tool may be good or bad; this is not clear in advance and depends on the tool.
Sharing high quality information about AI Safety can be one of the lowest effort ways to expose people to the ideas. Be sure to engage with the replies with care and do your research when replying to questions people respond with (feel free to add them to aisafety.info for our team to work on).
- Introduction to AI safety by Robert Miles
- Rational Animations
- Article from Vox
For Machine Learning researchers:
- More is Different for AI by Jacob Steinhardt
- Researcher Perceptions of Current and Future AI by Vael Gates (2022)
- Why I Think More NLP Researchers Should Engage with AI Safety Concerns by Sam Bowman (2022)
- Specific videos/playlists:
- Computerphile's series with Robert Miles
These 202 canonical answers have one or fewer stamps.
One possible way to ensure the safety of a powerful AI system is to keep it contained in a software environment. There is nothing intrinsically wrong with this procedure - keeping an AI system in a secure software environment would make it safer than letting it roam free. However, even AI systems inside software environments might not be safe enough.
Humans sometimes put dangerous humans inside boxes to limit their ability to influence the external world. Sometimes, these humans escape their boxes. The security of a prison depends on certain assumptions, which can be violated. Yoshie Shiratori reportedly escaped prison by weakening the door-frame with miso soup and dislocating his shoulders.
Human-written software has a high defect rate; we should expect a perfectly secure system to be difficult to create. If humans construct a software system they think is secure, it is possible that the security relies on a false assumption. A powerful AI system could potentially learn how its hardware works and manipulate bits to send radio signals. It could fake a malfunction and attempt social engineering when the engineers look at its code. As the saying goes, for someone to do something we had imagined was impossible requires only that they have a better imagination.
Experimentally, humans have convinced other humans to let them out of the box. Spooky.
In principle it could (if you believe in functionalism), but it probably won't. One way to ensure that AI has human-like emotions would be to copy the way the human brain works, but that's not what most AI researchers are trying to do.
It's similar to how some people once thought we would build mechanical horses to pull our vehicles, but it turned out to be much easier to build a car. AI probably doesn't need emotions, or maybe even consciousness, to be powerful, and the first AGIs to be built will be the ones that are easiest to build.
Alternate phrasings are used to improve the semantic search which Stampy uses to serve people questions, by giving alternate ways to say a question which might trigger a match when the main wording won't. They should generally only be used when there is a significantly different wording, rather than for only very minor changes.
A slow takeoff is where AI capabilities improve gradually, giving us plenty of time to adapt. In a moderate takeoff we might see accelerating progress, but we still won’t be caught off guard by a dramatic change. By contrast, in a fast or hard takeoff, AI would go from being not very generally competent to sufficiently superhuman to control the future, too fast for humans to course-correct if something goes wrong.
The article Distinguishing definitions of takeoff goes into more detail on this.
Intelligence is powerful. One might say that “Intelligence is no match for a gun, or for someone with lots of money,” but both guns and money were produced by intelligence. If not for our intelligence, humans would still be foraging the savannah for food.
Intelligence is what caused humans to dominate the planet in the blink of an eye (on evolutionary timescales). Intelligence is what allows us to eradicate diseases, and what gives us the potential to eradicate ourselves with nuclear war. Intelligence gives us superior strategic skills, superior social skills, superior economic productivity, and the power of invention.
A machine with superintelligence would be able to hack into vulnerable networks via the internet, commandeer those resources for additional computing power, take over mobile machines connected to networks connected to the internet, use them to build additional machines, perform scientific experiments to understand the world better than humans can, invent quantum computing and nanotechnology, manipulate the social world better than we can, and do whatever it can to give itself more power to achieve its goals — all at a speed much faster than humans can respond to.