Nico Hill2

Questions Asked: 134
Answers written: 3


Answers written by this user

What are the ethical challenges related to whole brain emulation?


Unless there were a way to cryptographically ensure otherwise, whoever runs an emulation has essentially perfect control over its environment and can reset it to any state it was previously in. This opens up the possibility of powerful interrogation and torture of digital people.

Imperfect uploading might lead to damage that causes the EM to suffer while still remaining useful enough to be run, for example as a test subject for research. We would also have a much greater ability to modify digital brains, and edits made for research or economic purposes might cause suffering. See this fictional piece for an exploration of what a world with a lot of EM suffering might look like.

These problems are exacerbated by the fact that digital people could likely be run much faster than biological humans, so an EM could plausibly live through hundreds of subjective years in minutes or hours of real time without any checks on its wellbeing.

Would AI alignment be hard with deep learning?


Ajeya Cotra has written an excellent article on this question: Why AI alignment could be hard with modern deep learning.

Will we ever build a superintelligence?


Humans provide an existence proof for the physical possibility of intelligent systems, and computers have many advantages (such as processing speed and size), so one would strongly expect AI systems significantly more intelligent than humans to be possible. For an implicitly joking depiction of common arguments for the impossibility of superintelligence, see this article.

Conditional on technological progress continuing, it seems extremely likely that humanity will at some point build superintelligent machines. There is a clear economic incentive to develop ever more intelligent machines, and billions of dollars of funding are currently being deployed to advance AI capabilities. Computers are already superhuman at a variety of tasks, such as arithmetic and classifying images, and one would expect the number of tasks machines can perform to keep growing, leading to AI systems far more capable than humans in many domains, especially once AI starts making significant contributions to developing better AI systems.

The main reason we might never build superintelligent AI, then, is that humanity goes extinct before developing the technology, or halts technological progress for some other reason. For an analysis of existential risks which could cause such a scenario, see The Precipice.


Questions by Nico Hill2 which have been answered

Are expert surveys on AI safety available?


The organisation AI Impacts did a survey of AI experts in 2016, and another in 2022.

Could we tell the AI to do what's morally right?


This suggestion is not as simple as it seems because:

  1. Humanity as a group has yet to agree on what is right or moral
  2. We currently don't know how to make an AI do what we want

Philosophers have disagreed about what is right or wrong for a very long time; this disagreement gave rise to the field of ethics. Within the field of AI safety, Coherent Extrapolated Volition is one attempt to specify what the right thing to do would be. The complexity of human values is explored in Yudkowsky's post on the complexity of wishes.

Even if we had a well-defined objective (for example, maximizing the number of diamonds), we currently do not know how to fully describe it to an AI. For more information, see Why is AGI safety a hard problem?

Does the importance of AI risk depend on caring about transhumanist utopias?


No. Misaligned artificial intelligence poses a serious threat to the continued flourishing, and maybe even continued existence, of humanity as a whole. While predictions about when artificial general intelligence may be achieved vary, surveys consistently report a >50% probability of achieving general AI before the year 2060 - within the expected lifetimes of most people alive today.

It is difficult to predict how technology will develop, and at what speed, in the years ahead; but as artificial intelligence poses a not-insignificant chance of causing worldwide disaster within the not-too-distant future, anyone who is generally concerned with the future of humanity has reason to be interested.

How can I convince others and present the arguments well?


Things Skeptics Commonly Say, and links to refutations goes over most of the common objections, along with some of the ways in which each is not fatal to the AI x-risk arguments.

Vael Gates's project links to lots of example transcripts of conversations aimed at persuading senior AI capabilities researchers.

How difficult should we expect alignment to be?


Here we ask about the additional cost of building an aligned powerful system, compared to its unaligned version. We often assume this cost to be nonzero, in the same way that it is easier and cheaper to build an elevator without emergency brakes. This is referred to as the alignment tax, and most AI alignment research is geared toward reducing it.

One operational guess by Eliezer Yudkowsky about its magnitude is "[an aligned project will take] at least 50% longer serial time to complete than [its unaligned version], or two years longer, whichever is less". This holds for agents with enough capability that their behavior is qualitatively different from a safety engineering perspective (for instance, an agent that is not corrigible by default).
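Read as a rough formula (this is only a paraphrase of the quote above, not notation Yudkowsky himself uses), the guessed lower bound on the extra serial time is:

```latex
% T = serial time the unaligned version of the project would take.
\[
  \text{extra serial time} \;\gtrsim\; \min\bigl(0.5\,T,\; 2\ \text{years}\bigr)
\]
```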

An essay by John Wentworth argues for a small chance of alignment happening "by default", with an alignment tax of effectively zero.

How do I form my own views about AI safety?


As with most things, the best way to form your views on AI safety is to read up on the various ideas and opinions that knowledgeable people in the field have, and to compare them and form your own perspective. There are several good places to start. One of them is the Machine Intelligence Research Institute's "Why AI safety?" info page, which contains links to relevant research. The Effective Altruism Forum has an article called "How I formed my own views on AI safety", which could also be pretty helpful. This Robert Miles YouTube video can be a good place to start as well. Otherwise, there are various articles about it, like this one from Vox.

How fast will AI takeoff be?


There is significant controversy over how quickly AI will grow into a superintelligence. The Alignment Forum tag collects many views on how things might unfold, discussing the probabilities of a soft takeoff (happening over years or decades) and a hard takeoff (happening in months or less).

How might things go wrong with AI even without an agentic superintelligence?


Failures can happen with narrow, non-agentic systems, mostly when humans fail to anticipate safety-relevant decisions that are made too quickly for anyone to react, much as in the 2010 flash crash.

A helpful metaphor draws on self-driving cars. By relying more and more on an automated process to make decisions, people become worse drivers, since they are no longer training themselves to react to the unexpected. Then the unexpected happens, the software reacts in an unsafe way, and the human is too slow to regain control.

This generalizes to broader tasks. A human using a powerful system to make better decisions (say, as the CEO of a company) might not understand those decisions very well, become trapped in an equilibrium without realizing it, and essentially lose control over the entire process.

More detailed examples in this vein are described by Paul Christiano in What failure looks like.

Another source of failures is AI-mediated stable totalitarianism. The limiting factor in current pervasive surveillance, policing, and armed forces is manpower; the use of drones and other automated tools decreases the number of personnel needed to ensure security and extract resources.

As capabilities improve, political dissent could become impossible, and checks and balances would break down, because only a minimal number of key actors would be needed to stay in power.

I want to help out AI alignment without necessarily making major life changes. What are some simple things I can do to contribute?


OK, it’s great that you want to help, here are some ideas for ways you could do so without making a huge commitment:

  • Learning more about AI alignment will provide you with good foundations for any path towards helping. You could start by absorbing content (e.g. books, videos, posts), and thinking about challenges or possible solutions.
  • Getting involved with the movement by joining a local Effective Altruism or LessWrong group, Rob Miles’s Discord, and/or the AI Safety Slack is a great way to find friends who are interested and will help you stay motivated.
  • Donating to organizations or individuals working on AI alignment, possibly via a donor lottery or the Long Term Future Fund, can be a great way to provide support.
  • Writing or improving answers on my wiki so that other people can learn about AI alignment more easily is a great way to dip your toe into contributing. You can always ask on the Discord for feedback on things you write.
  • Getting good at giving an AI alignment elevator pitch, and sharing it with people who may be valuable to have working on the problem, can make a big difference. However, you should avoid putting them off the topic by presenting it in a way which causes them to dismiss it as sci-fi (see the dos and don’ts in the elevator pitch follow-up question).
  • Writing thoughtful comments on AI posts on LessWrong.
  • Participating in the AGI Safety Fundamentals program – either the AI alignment or governance track – and then facilitating discussions for it in the following round. The program involves nine weeks of content, with about two hours of readings + exercises per week and 1.5 hours of discussion, followed by four weeks to work on an independent project. As a facilitator, you'll be helping others learn about AI safety in-depth, many of whom are considering a career in AI safety. In the early 2022 round, facilitators were offered a stipend, and this seems likely to be the case for future rounds as well! You can learn more about facilitating in this post from December 2021.

If I only care about helping people alive today, does AI safety still matter?


This largely depends on when you think AI will be advanced enough to constitute an immediate threat to humanity. This is difficult to estimate, but the field is surveyed at How long will it be until transformative AI is created?, which comes to the conclusion that it is relatively widely believed that AI will transform the world in our lifetimes.

We probably shouldn't rely too strongly on these opinions as predicting the future is hard. But, due to the enormous damage a misaligned AGI could do, it's worth putting a great deal of effort towards AI alignment even if you just care about currently existing humans (such as yourself).

Is large-scale automated AI persuasion and propaganda a serious concern?


Language models can be utilized to produce propaganda by acting like bots and interacting with users on social media. This can be done to push a political agenda or to make fringe views appear more popular than they are.

I'm envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusions) and the target audience, and the system will give you the most convincing arguments for it. At that point people won't be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.

-- Wei Dai, quoted in Persuasion Tools: AI takeover without AGI or agency?

As of 2022, this is not within the reach of current models. However, on the current trajectory, AI might be able to write articles and produce other media for propagandistic purposes that are superior to human-made ones in not too many years. These could be precisely tailored to individuals, using things like social media feeds and personal digital data.

Additionally, recommender systems on content platforms like YouTube, Twitter, and Facebook use machine learning, and the content they recommend can influence the opinions of billions of people. For example, some research has looked at the tendency of platforms to promote extremist political views and thereby help radicalize their user bases.

In the long term, misaligned AI might use its persuasion abilities to gain influence and take control over the future. This could look like convincing its operators to let it out of a box or give it resources, or creating political chaos in order to disable mechanisms that would prevent takeover, as in this story.

See Risks from AI persuasion for a deep dive into the distinct risks from AI persuasion.

Might an aligned superintelligence force people to have better lives and change more quickly than they want?


If the superintelligence is aligned, probably not, but it depends on the AI's metaethics.

For example, is it ethical to change what people want if we expect them to endorse the change in hindsight, e.g. curing a drug or gambling addict of their addiction, or treating a patient against their will? There is currently no consensus among moral philosophers about the conditions, if any, under which this is acceptable. An AI that follows preference utilitarianism would refuse to do so, but a hedonistic utilitarian AI might consider it.

In order to reduce the possibility of unrest, an aligned superintelligence might avoid implementing policies outside of the Overton window when it is possible.

What actions can I take in under five minutes to contribute to the cause of AI safety?


There are two different reasons you might be looking for a 5 minute contribution.

  1. You are only willing to spend five minutes total
  2. You want a simple call to action which will concretize your commitment. You are looking for a small action which can open the door for a larger action.

If you are looking to only spend five minutes total, you can:

  • Share an article with a friend, so that they can learn more. One possible choice is the 80,000 Hours career profile
  • Share a link on social media; you never know who may be interested
  • Donate to an organization working on AI risk

If you are looking for a small action which will start things moving, you might consider:

  • Ordering a book (such as The Alignment Problem) and following up by reading it
  • Signing up for a newsletter
  • Applying for career coaching from AISS.

What are "human values"?

Human values are the things we care about, and would want an aligned superintelligence to look after and support. It is suspected that true human values are highly complex, and could be extrapolated into a wide variety of forms.


What are "scaling laws" and how are they relevant to safety?


Scaling laws are observed trends in the performance of large machine learning models.

In the field of ML, better performance is usually achieved through better algorithms, better inputs, or larger numbers of parameters, more computing power, or more data. Since the 2010s, advances in deep learning have shown experimentally that the easier and faster returns come from scaling, an observation described by Richard Sutton as the bitter lesson.

While deep learning as a field has long struggled to scale models up while retaining learning capability (running into problems such as catastrophic interference), more recent methods, especially the Transformer architecture, were able to just work when fed more data and, as the meme goes, when more layers were stacked.

More surprisingly, performance (in terms of absolute likelihood loss, a standard measure) appeared to improve smoothly as compute, dataset size, or parameter count increased. This gave rise to scaling laws: the trend lines suggested by these performance gains, from which returns on data, compute, or time investment can be extrapolated.
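As a toy illustration of how such a trend line is fitted and extrapolated, here is a minimal sketch; the power-law form is the one commonly used in the scaling-law literature, but the parameter counts and loss values below are invented for illustration rather than taken from any real experiment.

```python
import numpy as np

# Hypothetical (parameter count, validation loss) measurements, purely illustrative.
params = np.array([1e6, 1e7, 1e8, 1e9, 1e10])
losses = np.array([5.2, 4.1, 3.3, 2.6, 2.1])

# A power law L(N) = (N_c / N) ** alpha is a straight line in log-log space,
# so fit log(loss) against log(params) with ordinary least squares.
slope, intercept = np.polyfit(np.log(params), np.log(losses), deg=1)
alpha = -slope                    # scaling exponent
n_c = np.exp(intercept / alpha)   # scale constant of the fitted law

# Extrapolate the trend line to a model size we have not trained at.
predicted_loss = (n_c / 1e12) ** alpha
print(f"alpha ~ {alpha:.3f}, extrapolated loss at 1e12 params ~ {predicted_loss:.2f}")
```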

A companion to this purely descriptive law (no strong theoretical explanation of the phenomenon has been found yet) is the scaling hypothesis, which Gwern Branwen describes:

The strong scaling hypothesis is that, once we find a scalable architecture like self-attention or convolutions, [...] we can simply train ever larger [neural networks] and ever more sophisticated behavior will emerge naturally as the easiest way to optimize for all the tasks & data.

The scaling laws, if the above hypothesis holds, become highly relevant to safety insofar as capability gains become conceptually easier to achieve: no need for clever designs to solve a given task, just throw more processing at it and it will eventually yield. As Paul Christiano observes:

It now seems possible that we could build “prosaic” AGI, which can replicate human behavior but doesn’t involve qualitatively new ideas about “how intelligence works”.

While the scaling laws still hold experimentally at the time of this writing (July 2022), whether they'll continue up to safety-relevant capabilities is still an open problem.

What are plausible candidates for "pivotal acts"?


Pivotal acts are acts that substantially change the direction humanity will have taken in 1 billion years. The term is used to denote positive changes, as opposed to existential catastrophe.

An obvious pivotal act would be to create a sovereign AGI aligned with humanity's best interests. An act that would greatly increase the chance of another pivotal act would also count as pivotal.

Pivotal acts often lie outside the Overton window. One such example is stopping or strongly delaying the development of an unaligned (or any) AGI through drastic means such as nanobots which melt all advanced processors, or the disabling of all AI researchers. Eliezer Yudkowsky mentions these in AGI Ruin: A List of Lethalities. Andrew Critch argues against such a unilateral pivotal act in “Pivotal Act” Intentions: Negative Consequences and Fallacious Arguments.

For more details, see Arbital.

What are some AI alignment research agendas currently being pursued?


Research at the Alignment Research Center is led by Paul Christiano, best known for introducing the “Iterated Distillation and Amplification” and “Humans Consulting HCH” approaches. He and his team are now “trying to figure out how to train ML systems to answer questions by straightforwardly ‘translating’ their beliefs into natural language rather than by reasoning about what a human wants to hear.”

Chris Olah (after work at Google Brain and OpenAI) recently co-founded Anthropic, an AI lab focussed on the safety of large models. While his previous work was concerned with the “transparency” and “interpretability” of large neural networks, especially vision models, Anthropic is focussing more on large language models, among other things working towards a "general-purpose, text-based assistant that is aligned with human values, meaning that it is helpful, honest, and harmless".

Stuart Russell and his team at the Center for Human-Compatible Artificial Intelligence (CHAI) have been working on inverse reinforcement learning (where the AI infers human values from observing human behavior) and corrigibility, as well as attempts to disaggregate neural networks into “meaningful” subcomponents (see Filan, et al.’s “Clusterability in neural networks” and Hod et al.'s “Detecting modularity in deep neural networks”).

Alongside the more abstract “agent foundations” work they have become known for, MIRI recently announced their “Visible Thoughts Project” to test the hypothesis that “Language models can be made more understandable (and perhaps also more capable, though this is not the goal) by training them to produce visible thoughts.”

OpenAI have recently been doing work on iteratively summarizing books (summarizing, and then summarizing the summary, etc.) as a method for scaling human oversight.

Stuart Armstrong’s recently launched AlignedAI are mainly working on concept extrapolation from familiar to novel contexts, something he believes is “necessary and almost sufficient” for AI alignment.

Redwood Research (Buck Shlegeris, et al.) are trying to “handicap” GPT-3 so that it only produces non-violent completions of text prompts. “The idea is that there are many reasons we might ultimately want to apply some oversight function to an AI model, like ‘don't be deceitful’, and if we want to get AI teams to apply this we need to be able to incorporate these oversight predicates into the original model in an efficient manner.”

Ought is an independent AI safety research organization led by Andreas Stuhlmüller and Jungwon Byun. They are researching methods for breaking up complex, hard-to-verify tasks into simpler, easier-to-verify tasks, with the aim of allowing us to maintain effective oversight over AIs.

What are some good podcasts about AI alignment?


All the content below is in English:

What are some objections to the importance of AI alignment?


Søren Elverlin has compiled a list of counter-arguments and suggests dividing them into two kinds: weak and strong.

Weak counter-arguments point to problems with the "standard" arguments (as given in, e.g., Bostrom’s Superintelligence), especially shaky models and assumptions that are too strong. These arguments are often of a substantial quality and are often presented by people who themselves worry about AI safety. Elverlin calls these objections “weak” because they do not attempt to imply that the probability of a bad outcome is close to zero: “For example, even if you accept Paul Christiano's arguments against “fast takeoff”, they only drive the probability of this down to about 20%. Weak counter-arguments are interesting, but the decision to personally focus on AI safety doesn't strongly depend on the probability – anything above 5% is clearly a big enough deal that it doesn't make sense to work on other things.”

Strong counter-arguments claim that the probability of existential catastrophe due to misaligned AI is tiny, usually by some combination of arguing that AGI is impossible or very far away. For example, Michael Littman has suggested that as (he believes) we’re so far from AGI, there will be a long period of human history wherein we’ll have ample time to grow up alongside powerful AIs and figure out how to align them.

Elverlin opines that “There are few arguments that are both high-quality and strong enough to qualify as an ‘objection to the importance of alignment’.” He suggests Rohin Shah's arguments for “alignment by default” as one of the better candidates.

MIRI's April fools "Death With Dignity" strategy might be seen as an argument against the importance of working on alignment, but only in the sense that we might have almost no hope of solving it. In the same category are the “something else will kill us first, so there’s no point worrying about AI alignment” arguments.

What are the different versions of decision theory?

The three main classes of decision theory are evidential decision theory, causal decision theory, and logical decision theory.

Evidential decision theory (EDT) reasons with the conditional probability of events based on the evidence. An agent using EDT selects the action which has the best expected outcome based on the evidence available. It views its action as one more fact about the world, which it can reason about, but does not distinguish the causal effect of its actions from any other conditional factor. See What is "evidential decision theory" for further explanation.

Causal decision theory (CDT) reasons about the causal relationship between the decision and its physical consequences. An agent using CDT views its choice as affecting the specific action that it takes, and, by extension, everything which that action causes. It selects the action which will bring about the best expected outcome based on its knowledge at the time of the decision. See What is "causal decision theory" for further explanation.

Logical decision theory (LDT) is a class of decision theories, including updateless decision theory, functional decision theory, and timeless decision theory, which share the use of logical counterfactuals. An agent using an LDT will act as if it controls the logical output of its own decision algorithm, and not just its immediate action. In general, an LDT can outperform other forms of decision theory in problems that include:

A specific example of an LDT is Functional Decision Theory (FDT), which says that agents should treat their decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?”. Rather than calculating the best outcome based on its immediate circumstances, an FDT agent views itself as one instance of that function, which must be consistent across all the instantiations in which it finds itself.

See What is "functional decision theory"? and What is "logical decision theory" for further explanation .

Further reading:

Decision Theory FAQ

comprehensive list of decision theories

What does Elon Musk think about AI safety?


Elon Musk has expressed his concerns about AI safety many times and founded OpenAI in an attempt to make safe AI more widely distributed (as opposed to allowing a singleton, which he fears would be misused or dangerously unaligned). In a YouTube video from November 2019 Musk stated that there's a lack of investment in AI safety and that there should be a government agency to reduce risk to the public from AI.

What is "HCH"?


Humans Consulting HCH (HCH) is a recursive acronym describing a setup where humans can consult simulations of themselves to help answer questions. It is a concept used in discussion of the iterated amplification proposal to solve the alignment problem.

It was first described by Paul Christiano in his post Humans Consulting HCH:

Consider a human Hugh who has access to a question-answering machine. Suppose the machine answers question Q by perfectly imitating how Hugh would answer question Q, if Hugh had access to the question-answering machine.

That is, Hugh is able to consult a copy of Hugh, who is able to consult a copy of Hugh, who is able to consult a copy of Hugh…

Let’s call this process HCH, for “Humans Consulting HCH.”
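As a toy illustration of the recursion (not part of Christiano's post; the `toy_human` function and the explicit depth limit are invented here for illustration, whereas in the real proposal the copies are supplied by a learned model imitating the human):

```python
from typing import Callable

Consult = Callable[[str], str]

def hch(question: str, human: Callable[[str, Consult], str], depth: int) -> str:
    """Toy model of Humans Consulting HCH: the human may consult simulated
    copies of themselves, down to a fixed recursion depth."""
    if depth == 0:
        # At the bottom of the recursion the human answers without help.
        return human(question, lambda sub_q: "(no consultation available)")
    # Otherwise each consultation spawns another copy, one level shallower.
    return human(question, lambda sub_q: hch(sub_q, human, depth - 1))

def toy_human(question: str, consult: Consult) -> str:
    # A stand-in 'Hugh' that splits the question in two and combines the answers.
    parts = [consult(f"sub-question {i} of: {question}") for i in (1, 2)]
    return f"answer to {question!r} built from {parts}"

print(hch("How should this dataset be labeled?", toy_human, depth=2))
```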

What is "evidential decision theory"?

Evidential Decision Theory (EDT) is a branch of decision theory which advises an agent to take the action which, conditional on being taken, maximizes the probability of the desired outcome. Like any branch of decision theory, it prescribes taking the action with maximal expected utility, i.e. the action whose expected utility equals or exceeds that of every other option. The expected utility of each action is the probability-weighted sum of the utilities of its possible results. How actions can influence the probabilities differs between the branches. Causal Decision Theory (CDT) says one can only influence the chances of the desired outcome through a causal process [1]. EDT, on the other hand, requires no causal connection; the action only has to be Bayesian evidence for the desired outcome. Some critics say it recommends auspiciousness over causal efficacy [2].
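In symbols (these are the standard textbook formulations, written here with the P(A > B) notation for probabilities of conditionals used further down this answer): for an action A and possible outcomes O_i with utilities U(O_i),

```latex
% EDT weights outcomes by ordinary conditional probabilities;
% CDT weights them by probabilities of conditionals (causal counterfactuals).
\[
  \mathrm{EU}_{\mathrm{EDT}}(A) \;=\; \sum_i P(O_i \mid A)\, U(O_i)
  \qquad\text{vs.}\qquad
  \mathrm{EU}_{\mathrm{CDT}}(A) \;=\; \sum_i P(A > O_i)\, U(O_i)
\]
```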

One usual example where EDT and CDT are often said to diverge is the Smoking Lesion: “Smoking is strongly correlated with lung cancer, but in the world of the Smoker's Lesion this correlation is understood to be the result of a common cause: a genetic lesion that tends to cause both smoking and cancer. Once we fix the presence or absence of the lesion, there is no additional correlation between smoking and cancer. Suppose you prefer smoking without cancer to not smoking without cancer, and prefer smoking with cancer to not smoking with cancer. Should you smoke?” CDT would recommend smoking, since there is no causal connection between smoking and cancer: both are caused by the gene, but have no direct causal connection with each other. Naive EDT, on the other hand, would recommend against smoking, since smoking is evidence for having the lesion and thus should be avoided. However, a more sophisticated agent following the recommendations of EDT would recognize that, once they observe their desire to smoke, actually smoking or not provides no further evidence about having cancer; that is, the "tickle" screens off smoking from cancer. (This is known as the tickle defence.)

CDT uses probabilities of conditionals and counterfactual dependence, which track causal relations, to calculate the expected utility of an action, whereas EDT simply uses conditional probabilities. The conditional probability of B given A, P(B|A), is the Bayesian probability of the event B happening given that we know A happened; this is what EDT uses. The probability of a conditional, P(A > B), is the probability that the whole conditional 'if A, then B' is true, i.e. the probability of the counterfactual being the case. Since counterfactual analysis is the key tool used to speak about causality, probabilities of conditionals are said to mirror causal relations. In most usual cases these two probabilities are the same. However, David Lewis proved [3] that it is impossible for probabilities of conditionals to always track conditional probabilities. Hence evidential relations aren't the same as causal relations, and CDT and EDT will diverge depending on the problem. In some cases EDT gives a better answer than CDT, such as in Newcomb's problem, whereas in the Smoking Lesion problem CDT seems to give the more reasonable prescription (modulo the tickle defence).
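A minimal worked version of the divergence, with utilities and probabilities invented purely to make the structure visible (only the comparison between the two calculations matters, not the particular numbers):

```python
# Invented utilities: smoking is worth +10, avoiding cancer is worth +100.
U = {("smoke", "cancer"): 10, ("smoke", "no cancer"): 110,
     ("abstain", "cancer"): 0, ("abstain", "no cancer"): 100}

# Naive EDT conditions on the action: smokers are more likely to have the
# lesion, hence more likely to get cancer (made-up conditional probabilities).
p_cancer_given_action = {"smoke": 0.8, "abstain": 0.2}

def eu_naive_edt(action):
    p = p_cancer_given_action[action]
    return p * U[(action, "cancer")] + (1 - p) * U[(action, "no cancer")]

# CDT holds the unknown lesion fixed: smoking does not causally change the
# chance of cancer, which depends only on the prior probability of the lesion.
p_cancer_causal = 0.5

def eu_cdt(action):
    p = p_cancer_causal
    return p * U[(action, "cancer")] + (1 - p) * U[(action, "no cancer")]

for action in ("smoke", "abstain"):
    print(action, "naive EDT:", eu_naive_edt(action), "CDT:", eu_cdt(action))
# Naive EDT prefers abstaining (80 vs 30); CDT prefers smoking (60 vs 50).
```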

References

  1. http://plato.stanford.edu/entries/decision-causal/
  2. Joyce, J.M. (1999), The Foundations of Causal Decision Theory, p. 146
  3. Lewis, D. (1976), "Probabilities of conditionals and conditional probabilities", The Philosophical Review (Duke University Press) 85 (3): 297–315
  4. Caspar Oesterheld, "Understanding the Tickle Defense in Decision Theory"
  5. Ahmed, Arif (2014), Evidence, Decision and Causality, Cambridge University Press


What is "functional decision theory"?

Functional Decision Theory (FDT) is a decision theory, described by Eliezer Yudkowsky and Nate Soares, which says that agents should treat their decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?”. It is a successor to Timeless Decision Theory, and it outperforms other decision theories such as Causal Decision Theory (CDT) and Evidential Decision Theory (EDT) on a range of problems. For example, it does better than CDT on Newcomb's Problem, better than EDT on the smoking lesion problem, and better than both on Parfit’s hitchhiker problem.

In Newcomb's Problem, an FDT agent reasons that Omega must have used some kind of model of her decision procedure in order to make an accurate prediction of her behavior. Omega's model and the agent are therefore both calculating the same function (the agent's decision procedure): they are subjunctively dependent on that function. Given perfect prediction by Omega, there are therefore only two possible outcomes in Newcomb's Problem: either the agent one-boxes and Omega predicted it (because its model also one-boxed), or the agent two-boxes and Omega predicted that. Because one-boxing then results in a million dollars and two-boxing in only a thousand dollars, the FDT agent one-boxes.
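The payoff comparison can be made concrete with a small calculation. The dollar amounts are the standard ones for Newcomb's Problem; the accuracy parameter and the code itself are only an illustration of the FDT-style reasoning above, in which the predictor's verdict is treated as subjunctively tied to the agent's actual choice:

```python
BIG, SMALL = 1_000_000, 1_000   # opaque box (if one-boxing was predicted) and transparent box

def expected_payoff(action: str, predictor_accuracy: float) -> float:
    """Expected winnings when the prediction is treated as (subjunctively)
    dependent on the agent's actual choice, as an FDT agent reasons."""
    if action == "one-box":
        return predictor_accuracy * BIG               # big box filled iff predicted correctly
    return (1 - predictor_accuracy) * BIG + SMALL     # two-boxing: big prize only if mispredicted

for accuracy in (1.0, 0.99):
    one = expected_payoff("one-box", accuracy)
    two = expected_payoff("two-box", accuracy)
    print(f"accuracy {accuracy}: one-box ${one:,.0f}, two-box ${two:,.0f}")
# With a perfect or merely very reliable predictor, one-boxing comes out far ahead.
```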


What is "hedonium"?

Orgasmium (also known as hedonium) is a homogeneous substance with limited consciousness, which is in a constant state of supreme bliss. An AI programmed to "maximize happiness" might simply tile the universe with orgasmium. Some who believe this consider it a good thing; others do not. Those who do not, use its undesirability to argue that not all terminal values reduce to "happiness" or some simple analogue. Hedonium is the hedonistic utilitarian's version of utilitronium.



What is an "agent"?

A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility. They are also referred to as goal-seeking. The concept of a rational agent is used in economics, game theory, decision theory, and artificial intelligence.



More generally, an agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.[1]

There has been much discussion as to whether certain AGI designs can be made into mere tools or whether they will necessarily be agents which will attempt to actively carry out their goals. Any minds that actively engage in goal-directed behavior are potentially dangerous, due to considerations such as basic AI drives possibly causing behavior which is in conflict with humanity's values.

In Dreams of Friendliness and in Reply to Holden on Tool AI, Eliezer Yudkowsky argues that, since all intelligences select correct beliefs from the much larger space of incorrect beliefs, they are necessarily agents.


  1. Russell, S. & Norvig, P. (2003) Artificial Intelligence: A Modern Approach. Second Edition. Page 32.

What is an "s-risk"?

(Astronomical) suffering risks, also known as s-risks, are risks of the creation of intense suffering in the far future on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.


S-risks are an example of existential risk (also known as x-risks) according to Nick Bostrom's original definition, as they threaten to "permanently and drastically curtail [Earth-originating intelligent life's] potential". Most existential risks are of the form "event E happens which drastically reduces the number of conscious experiences in the future". S-risks therefore serve as a useful reminder that some x-risks are scary because they cause bad experiences, and not just because they prevent good ones.

Within the space of x-risks, we can distinguish x-risks that are s-risks, x-risks involving human extinction, x-risks that involve immense suffering and human extinction, and x-risks that involve neither. For example:

|                    | extinction risk                                                                | non-extinction risk                                                                       |
|--------------------|--------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|
| suffering risk     | Misaligned AGI wipes out humans, simulates many suffering alien civilizations. | Misaligned AGI tiles the universe with experiences of severe suffering.                   |
| non-suffering risk | Misaligned AGI wipes out humans.                                               | Misaligned AGI keeps humans as "pets," limiting growth but not causing immense suffering. |

A related concept is hyperexistential risk, the risk of "fates worse than death" on an astronomical scale. It is not clear whether all hyperexistential risks are s-risks per se. But arguably all s-risks are hyperexistential, since "tiling the universe with experiences of severe suffering" would likely be worse than death.

There are two EA organizations with s-risk prevention research as their primary focus: the Center on Long-Term Risk (CLR) and the Center for Reducing Suffering. Much of CLR's work is on suffering-focused AI safety and crucial considerations. The Machine Intelligence Research Institute and the Future of Humanity Institute have also investigated strategies to prevent s-risks, though to a much lesser extent.

Another approach to reducing s-risk is to "expand the moral circle" together with raising concern for suffering, so that future (post)human civilizations and AI are less likely to instrumentally cause suffering to non-human minds such as animals or digital sentience. Sentience Institute works on this value-spreading problem.

 


What is causal decision theory?

Causal Decision Theory (CDT) is a branch of decision theory which advises an agent to take the action which maximizes the causal consequences for the probability of desired outcomes [1]. Like any branch of decision theory, it prescribes taking the action that maximizes expected utility, i.e. the action which maximizes the sum of the utility obtained in each outcome weighted by the probability of that outcome occurring, given the action. Different decision theories correspond to different ways of construing this dependence between actions and outcomes. CDT focuses on the causal relations between one’s actions and outcomes, whilst Evidential Decision Theory (EDT) concerns itself with what an action indicates about the world (which is operationalized by the conditional probability). That is, according to CDT, a rational agent should track the available causal relations linking its actions to the desired outcome and take the action which will best enhance the chances of that outcome.

One usual example where EDT and CDT commonly diverge is the Smoking Lesion: “Smoking is strongly correlated with lung cancer, but in the world of the Smoker's Lesion this correlation is understood to be the result of a common cause: a genetic lesion that tends to cause both smoking and cancer. Once we fix the presence or absence of the lesion, there is no additional correlation between smoking and cancer. Suppose you prefer smoking without cancer to not smoking without cancer, and prefer smoking with cancer to not smoking with cancer. Should you smoke?” CDT would recommend smoking, since there is no causal connection between smoking and cancer: both are caused by a gene, but have no direct causal connection with each other. EDT, on the other hand, would recommend against smoking, since smoking is evidence for having the mentioned gene and thus should be avoided.

The core aspect of CDT is mathematically represented by the fact that it uses probabilities of conditionals in place of conditional probabilities [2]. The conditional probability of B given A, P(B|A), is the Bayesian probability of the event B happening given that we know A happened; this is what EDT uses. The probability of a conditional, P(A > B), is the probability that the whole conditional 'if A, then B' is true, i.e. the probability of the counterfactual being the case. Since counterfactual analysis is the key tool used to speak about causality, probabilities of conditionals are said to mirror causal relations. In most cases these two probabilities track each other, and CDT and EDT give the same answers. However, some particular problems have arisen where their prescriptions for rational action diverge, such as the Smoking Lesion problem, where CDT seems to give the more reasonable prescription, and Newcomb's problem, where CDT seems unreasonable. David Lewis proved [3] that it is impossible for probabilities of conditionals to always track conditional probabilities. Hence, evidential relations aren't the same as causal relations, and CDT and EDT will always diverge in some cases.

References

  1. http://plato.stanford.edu/entries/decision-causal/
  2. Lewis, David (1981), "Causal Decision Theory", Australasian Journal of Philosophy 59 (1981): 5–30.
  3. Lewis, D. (1976), "Probabilities of conditionals and conditional probabilities", The Philosophical Review (Duke University Press) 85 (3): 297–315


What is the "long reflection"?

The long reflection is a hypothesized period of time during which humanity works out how best to realize its long-term potential.


Some effective altruists, including Toby Ord and William MacAskill, have argued that, if humanity succeeds in eliminating existential risk or reducing it to acceptable levels, it should not immediately embark on an ambitious and potentially irreversible project of arranging the universe's resources in accordance to its values, but ought instead to spend considerable time— "centuries (or more)";[1] "perhaps tens of thousands of years";[2] "thousands or millions of years";[3] "[p]erhaps... a million years"[4]—figuring out what is in fact of value. The long reflection may thus be seen as an intermediate stage in a rational long-term human developmental trajectory, following an initial stage of existential security when existential risk is drastically reduced and followed by a final stage when humanity's potential is fully realized.[5]

Criticism

The idea of a long reflection has been criticized on the grounds that virtually eliminating all existential risk will almost certainly require taking a variety of large-scale, irreversible decisions—related to space colonization, global governance, cognitive enhancement, and so on—which are precisely the decisions meant to be discussed during the long reflection.[6][7] Since there are pervasive and inescapable tradeoffs between reducing existential risk and retaining moral option value, it may be argued that it does not make sense to frame humanity's long-term strategic picture as one consisting of two distinct stages, with one taking precedence over the other.

Further reading

Aird, Michael (2020) Collection of sources that are highly relevant to the idea of the Long Reflection, Effective Altruism Forum, June 20.
Many additional resources on this topic.

Wiblin, Robert & Keiran Harris (2018) Our descendants will probably see us as moral monsters. what should we do about that?, 80,000 Hours, January 19.
Interview with William MacAskill about the long reflection and other topics.

Related entries

dystopia | existential risk | existential security | long-term future | longtermism | longtermist institutional reform | moral uncertainty | normative ethics | value lock-in

  1. Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.

  2. Greaves, Hilary et al. (2019) A research agenda for the Global Priorities Institute, Oxford.

  3. Dai, Wei (2019) The argument from philosophical difficulty, LessWrong, February 9.

  4. William MacAskill, in Perry, Lucas (2018) AI alignment podcast: moral uncertainty and the path to AI alignment with William MacAskill, AI Alignment podcast, September 17.

  5. Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.

  6. Stocker, Felix (2020) Reflecting on the long reflection, Felix Stocker’s Blog, August 14.

  7. Hanson, Robin (2021) ‘Long reflection’ is crazy bad idea, Overcoming Bias, October 20.

What is the "windfall clause"?


The windfall clause is pretty well explained on the Future of Humanity Institute site.

Here's a quick summary:
It is an agreement between AI firms to donate significant amounts of any profits made as a consequence of economically transformative breakthroughs in AI capabilities. The donations are intended to help benefit humanity.

What safety problems are associated with whole brain emulation?


It seems improbable that whole brain emulation (WBE) arrives before neuromorphic AI, because a better understanding of the brain would probably help with the development of the latter.

Even if WBE were to arrive first, there is some debate on whether it would be safer than synthetic AI. An accelerated WBE might be a safe template for an AGI, as it would directly inherit the subject's way of thinking, but some safety problems could still arise:

  • We don't know how human psychology would react to being so far off distribution. As an intuition pump, very high IQ individuals are at higher risk for psychological disorders.
  • A superintelligent WBE would get a large amount of power, which historically has tended to corrupt humans.
  • High speed might make interactions with normal-speed humans difficult, as explored in Robin Hanson's The Age of Em.
  • It is unclear whether WBE would be dynamically more predictable than AI engineered by competent safety-conscious programmers.
  • Even if WBE arrives before AGI, Bostrom argues we should expect a second (potentially dangerous) transition to fully synthetic AGI due to its improved efficiency over WBE.

Nonetheless, Yudkowsky believes that emulations coming first would probably be better, even if that is unlikely.

What subjects should I study at university to prepare myself for alignment research?


To prepare for AI alignment research, it is important to understand machine learning, and to have a solid grasp of the relevant mathematics such as linear algebra, calculus, and statistics. A degree in mathematics, computer science, or directly in AI, is a good way to build this understanding. However, AI alignment also benefits from having researchers with diverse backgrounds, so if you have a particular interest or talent in a different topic, it can be valuable to pursue a degree in that topic instead. For example, degrees that could be relevant are neuroscience, philosophy, physics, biology, cybersecurity, risk management, safety engineering, or economics. It has been argued in particular that AI safety needs social scientists.

If you are uncertain, you can apply for coaching from the career advice platform 80,000 Hours, which also has a career review of technical AI safety research. Another option is to apply for coaching from AI Safety Support.

When choosing university courses, try to cover

  • machine learning
    • firm grasp of the basics
    • deep learning, in particular transformers
    • reinforcement learning
  • statistics
  • linear algebra
  • calculus
  • game theory

In addition to taking relevant university courses, it is also really helpful to study AI safety materials outside of university. An excellent place to start is the AGI safety fundamentals course.

What would a good future with AGI look like?


As technology continues to improve, one thing is certain: the future is going to look like science fiction. Doubly so once superhuman AI ("AGI") is invented, because we can expect the AGI to produce technological improvements at a superhuman rate, eventually approaching the physical limits in terms of how small machines can be miniaturized, how fast they can compute, how energy-efficient they can be, etc.

Today's world is lacking in many ways, so given these increasingly powerful tools, it seems likely that whoever controls those tools will use them to make increasingly large (and increasingly sci-fi-sounding) improvements to the world. If (and that's a big if!) humanity retains control of the AGI, we could use these amazing technologies to stop climate change, colonize other planets, solve world hunger, cure cancer and every other disease, even eliminate aging and death.

For more inspiration, here are some stories painting what a bright, AGI-powered future could look like:

Will an aligned superintelligence care about animals other than humans?


An aligned superintelligence will have some set of human values. As mentioned in What are "human values"?, this set of values is complex, which means that how these values are implemented will determine whether the superintelligence cares about nonhuman animals. In AI Ethics and Value Alignment for Nonhuman Animals, Soenke Ziesche argues that alignment should include the values of nonhuman animals.

Would AI alignment be hard with deep learning?


Ajeya Cotra has written an excellent article on this question: Why AI alignment could be hard with modern deep learning.

Would donating small amounts to AI safety organizations make any significant difference?


Many parts of the AI alignment ecosystem are already well-funded, but a savvy donor can still make a difference by picking up grantmaking opportunities which are too small to catch the attention of the major funding bodies or are based on personal knowledge of the recipient.

One way to leverage a small amount of money to the potential of a large amount is to enter a donor lottery, where you donate to win a chance to direct a much larger amount of money (with probability proportional to donation size). This means that the person directing the money will be allocating enough that it's worth their time to do more in-depth research.
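To see why this costs you nothing in expectation, here is a minimal sketch with made-up figures; the point is just that your expected dollars directed are unchanged while the research effort gets concentrated on whoever wins:

```python
# Made-up figures for illustration.
my_donation = 1_000      # dollars you put into the lottery
lottery_pot = 100_000    # total pot contributed by all participants

# Your chance of directing the whole pot is proportional to your share of it.
p_win = my_donation / lottery_pot
expected_directed = p_win * lottery_pot   # equals my_donation

print(f"chance of directing the pot: {p_win:.1%}")                # 1.0%
print(f"expected dollars directed:   ${expected_directed:,.0f}")  # $1,000
# Same expected amount as donating directly, but the winner allocates $100,000,
# which makes it worth their time to do much more in-depth research.
```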

For an overview of the work the major organizations are doing, see the 2021 AI Alignment Literature Review and Charity Comparison. The Long-Term Future Fund seems to be an outstanding place to donate based on that, as they are the organization which most other organizations are most excited to see funded.

Wouldn't it be a good thing for humanity to die out?


In the words of Nate Soares:

I don’t expect humanity to survive much longer.

Often, when someone learns this, they say:
"Eh, I think that would be all right."

So allow me to make this very clear: it would not be "all right."

Imagine a little girl running into the road to save her pet dog. Imagine she succeeds, only to be hit by a car herself. Imagine she lives only long enough to die in pain.

Though you may imagine this thing, you cannot feel the full tragedy. You can’t comprehend the rich inner life of that child. You can’t understand her potential; your mind is not itself large enough to contain the sadness of an entire life cut short.

You can only catch a glimpse of what is lost—
—when one single human being dies.

Now tell me again how it would be "all right" if every single person were to die at once.

Many people, when they picture the end of humankind, pattern match the idea to some romantic tragedy, where humans, with all their hate and all their avarice, had been unworthy of the stars since the very beginning, and deserved their fate. A sad but poignant ending to our tale.

And indeed, there are many parts of human nature that I hope we leave behind before we venture to the heavens. But in our nature is also everything worth bringing with us. Beauty and curiosity and love, a capacity for fun and growth and joy: these are our birthright, ours to bring into the barren night above.

Calamities seem more salient when unpacked. It is far harder to kill a hundred people in their sleep, with a knife, than it is to order a nuclear bomb dropped on Hiroshima. Your brain can’t multiply, you see: it can only look at a hypothetical image of a broken city and decide it’s not that bad. It can only conjure an image of a barren planet and say "eh, we had it coming."

But if you unpack the scenario, if you try to comprehend all the lives snuffed out, all the children killed, the final spark of human joy and curiosity extinguished, all our potential squandered…

I promise you that the extermination of humankind would be horrific.