Here we ask about the additional cost of building an aligned powerful system, compared to its unaligned counterpart. We generally assume this cost is nonzero, in the same way that it is easier and cheaper to build an elevator without emergency brakes. This is referred to as the alignment tax, and much AI alignment research is geared toward reducing it.
One operational guess by Eliezer Yudkowsky about its magnitude is "[an aligned project will take] at least 50% longer serial time to complete than [its unaligned version], or two years longer, whichever is less". This holds for agents with enough capability that their behavior is qualitatively different from a safety engineering perspective (for instance, an agent that is not corrigible by default).
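Yudkowsky's guess can be read as a simple formula. Below is a minimal sketch of that reading (the function and its name are our illustration, not anything Yudkowsky wrote):

```python
def aligned_timeline(unaligned_years: float) -> float:
    """Rough estimate of serial time for an aligned project, per Yudkowsky's
    operational guess: 50% longer than the unaligned version, or two years
    longer, whichever is less."""
    return unaligned_years + min(0.5 * unaligned_years, 2.0)

# A 2-year unaligned project would take ~3 years aligned (tax: 1 year);
# for a 10-year project the 2-year cap binds, giving ~12 years.
print(aligned_timeline(2.0))   # 3.0
print(aligned_timeline(10.0))  # 12.0
```

Note that this is a lower-bound guess ("at least"), so the function gives the minimum of the tax under that view, not a precise prediction.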
An essay by John Wentworth argues for a small chance of alignment happening "by default", with an alignment tax of effectively zero.
As with most things, the best way to form your views on AI safety is to read up on the various ideas and opinions of knowledgeable people in the field, compare them, and form your own perspective. There are several good places to start. One of them is the Machine Intelligence Research Institute's "Why AI safety?" info page, which contains links to relevant research. The Effective Altruism Forum has an article called "How I formed my own views on AI safety", which could also be pretty helpful. Here is a Robert Miles YouTube video that can be a good place to start as well. Otherwise, there are various articles on the topic, like this one from Vox.
There is significant controversy over how quickly AI will grow into a superintelligence. The relevant Alignment Forum tag collects many views on how things might unfold, including discussion of the probabilities of a soft takeoff (happening over years or decades) and a hard takeoff (happening in months or less).
OK, it’s great that you want to help, here are some ideas for ways you could do so without making a huge commitment:
- Learning more about AI alignment will provide you with good foundations for any path towards helping. You could start by absorbing content (e.g. books, videos, posts), and thinking about challenges or possible solutions.
- Getting involved with the movement by joining a local Effective Altruism or LessWrong group, Rob Miles’s Discord, and/or the AI Safety Slack is a great way to find friends who are interested and will help you stay motivated.
- Donating to organizations or individuals working on AI alignment, possibly via a donor lottery or the Long Term Future Fund, can be a great way to provide support.
- Writing or improving answers on my wiki so that other people can learn about AI alignment more easily is a great way to dip your toe into contributing. You can always ask on the Discord for feedback on things you write.
- Getting good at giving an AI alignment elevator pitch, and sharing it with people who may be valuable to have working on the problem, can make a big difference. However, you should avoid putting them off the topic by presenting it in a way which causes them to dismiss it as sci-fi (dos and don’ts are covered in the elevator pitch follow-up question).
- Writing thoughtful comments on AI posts on LessWrong.
- Participating in the AGI Safety Fundamentals program – either the AI alignment or governance track – and then facilitating discussions for it in the following round. The program involves nine weeks of content, with about two hours of readings + exercises per week and 1.5 hours of discussion, followed by four weeks to work on an independent project. As a facilitator, you'll be helping others learn about AI safety in-depth, many of whom are considering a career in AI safety. In the early 2022 round, facilitators were offered a stipend, and this seems likely to be the case for future rounds as well! You can learn more about facilitating in this post from December 2021.
This largely depends on when you think AI will be advanced enough to constitute an immediate threat to humanity. This is difficult to estimate, but the field is surveyed at How long will it be until transformative AI is created?, which comes to the conclusion that it is relatively widely believed that AI will transform the world in our lifetimes.
We probably shouldn't rely too strongly on these opinions as predicting the future is hard. But, due to the enormous damage a misaligned AGI could do, it's worth putting a great deal of effort towards AI alignment even if you just care about currently existing humans (such as yourself).
Language models can be utilized to produce propaganda by acting like bots and interacting with users on social media. This can be done to push a political agenda or to make fringe views appear more popular than they are.
I'm envisioning that in the future there will also be systems where you can input any conclusion that you want to argue (including moral conclusions) and the target audience, and the system will give you the most convincing arguments for it. At that point people won't be able to participate in any online (or offline for that matter) discussions without risking their object-level values being hijacked.
-- Wei Dai, quoted in Persuasion Tools: AI takeover without AGI or agency?
As of 2022, this is not within the reach of current models. However, on the current trajectory, AI might be able to write articles and produce other media for propagandistic purposes that are superior to human-made ones in not too many years. These could be precisely tailored to individuals, using things like social media feeds and personal digital data.
Additionally, recommender systems on content platforms like YouTube, Twitter, and Facebook use machine learning, and the content they recommend can influence the opinions of billions of people. Some research has looked at the tendency of platforms to promote extremist political views, thereby helping to radicalize their user base, for example.
In the long term, misaligned AI might use its persuasion abilities to gain influence and take control of the future. This could look like convincing its operators to let it out of a box or to give it resources, or creating political chaos in order to disable mechanisms that would prevent takeover, as in this story.
See Risks from AI persuasion for a deep dive into the distinct risks from AI persuasion.
All the content below is in English:
- The AI technical safety section of the 80,000 Hours Podcast;
- The AI X-risk Research Podcast, hosted by Daniel Filan;
- The AI Alignment Podcast hosted by Lucas Perry from the Future of Life Institute (ran ~monthly from April 2018 to March 2021);
- The Alignment Newsletter Podcast by Rob Miles (an audio version of the weekly newsletter).
Elon Musk has expressed his concerns about AI safety many times and founded OpenAI in an attempt to make safe AI more widely distributed (as opposed to allowing a singleton, which he fears would be misused or dangerously unaligned). In a YouTube video from November 2019 Musk stated that there's a lack of investment in AI safety and that there should be a government agency to reduce risk to the public from AI.
Evidential Decision Theory (EDT) is a branch of decision theory which advises an agent to take the action which, conditional on its being taken, maximizes the chances of the desired outcome. Like any branch of decision theory, it prescribes taking the action that maximizes expected utility: the probability-weighted sum of the utilities of each of the action's possible results. The branches differ in how actions are taken to influence those probabilities. Causal Decision Theory (CDT) says one can influence the chances of the desired outcome only through causal processes [#fn1 1]. EDT, on the other hand, requires no causal connection: the action only has to be Bayesian evidence for the desired outcome. Some critics say it recommends auspiciousness over causal efficacy [#fn2 2].
One common example where EDT and CDT diverge is the Smoking Lesion: “Smoking is strongly correlated with lung cancer, but in the world of the Smoker's Lesion this correlation is understood to be the result of a common cause: a genetic lesion that tends to cause both smoking and cancer. Once we fix the presence or absence of the lesion, there is no additional correlation between smoking and cancer. Suppose you prefer smoking without cancer to not smoking without cancer, and prefer smoking with cancer to not smoking with cancer. Should you smoke?” CDT recommends smoking, since there is no causal connection between smoking and cancer: both are caused by a gene, but have no direct causal connection with each other. EDT, on the other hand, recommends against smoking, since smoking is evidence of having the lesion and should thus be avoided.
CDT calculates the expected utility of an action using probabilities of conditionals and counterfactual dependence, which track causal relations, whereas EDT simply uses conditional probabilities. The conditional probability of B given A, written P(B|A), is the Bayesian probability that event B happens given that we know A happened; this is what EDT uses. The probability of a conditional, written P(A > B), is the probability that the conditional 'A implies B' is true, i.e. the probability that the counterfactual ‘If A, then B’ is the case. Since counterfactual analysis is the key tool for talking about causality, probabilities of conditionals are said to mirror causal relations. In most ordinary cases these two probabilities coincide. However, David Lewis proved [#fn3 3] that it is impossible for probabilities of conditionals to always track conditional probabilities. Evidential relations are therefore not the same as causal relations, and CDT and EDT diverge on some problems. In some cases EDT gives a better answer than CDT, such as Newcomb's problem, whereas in the Smoking Lesion problem CDT seems to give the more reasonable prescription.
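To make the divergence concrete, here is a minimal Python sketch of the Smoking Lesion. The probabilities and utilities are made-up illustrative assumptions, not figures from the literature; only the structure matters:

```python
P_LESION = 0.5                        # prior probability of the genetic lesion
P_SMOKE = {True: 0.9, False: 0.1}     # P(smoke | lesion present / absent)
P_CANCER = {True: 0.9, False: 0.01}   # P(cancer | lesion); smoking itself is causally inert
U_SMOKE, U_CANCER = 10.0, -100.0      # utility of smoking enjoyment, disutility of cancer

PRIOR = {True: P_LESION, False: 1 - P_LESION}

def edt_value(smoke: bool) -> float:
    """EDT conditions on the act: choosing to smoke is evidence of the lesion."""
    # Joint probability of (lesion state, this act), then Bayes-update on the act.
    joint = {l: (P_SMOKE[l] if smoke else 1 - P_SMOKE[l]) * PRIOR[l] for l in (True, False)}
    total = sum(joint.values())
    p_cancer = sum(P_CANCER[l] * joint[l] / total for l in (True, False))
    return (U_SMOKE if smoke else 0.0) + U_CANCER * p_cancer

def cdt_value(smoke: bool) -> float:
    """CDT intervenes on the act: smoking gives no information about the lesion."""
    p_cancer = sum(P_CANCER[l] * PRIOR[l] for l in (True, False))
    return (U_SMOKE if smoke else 0.0) + U_CANCER * p_cancer

# EDT recommends not smoking; CDT recommends smoking.
assert edt_value(True) < edt_value(False)
assert cdt_value(True) > cdt_value(False)
```

The asymmetry is entirely in how the lesion probability responds to the choice: EDT updates it (smoking raises P(lesion) to 0.9 here), while CDT holds it at the prior because the act does not cause the lesion.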
- http://plato.stanford.edu/entries/decision-causal/[#fnref1 ↩]
- Joyce, J.M. (1999), The foundations of causal decision theory, p. 146[#fnref2 ↩]
- Lewis, D. (1976), "Probabilities of conditionals and conditional probabilities", The Philosophical Review (Duke University Press) 85 (3): 297–315[#fnref3 ↩]
- Smoking Lesion Steelman by Abram Demski
- Decision Theory FAQ by Luke Muehlhauser
- On Causation and Correlation Part 1
- Two-boxing, smoking and chewing gum in Medical Newcomb problems by Caspar Oesterheld
- Did EDT get it right all along? Introducing yet another medical Newcomb problem by Johannes Treutlein
- "Betting on the Past" by Arif Ahmed by Johannes Treutlein
- Why conditioning on "the agent takes action a" isn't enough by Nate Soares
Functional Decision Theory (FDT) is a decision theory, described by Eliezer Yudkowsky and Nate Soares, which says that agents should treat their decision as the output of a fixed mathematical function that answers the question, “Which output of this very function would yield the best outcome?”. It is a successor to Timeless Decision Theory, and it outperforms other decision theories such as Causal Decision Theory (CDT) and Evidential Decision Theory (EDT). For example, it does better than CDT on Newcomb's problem, better than EDT on the smoking lesion problem, and better than both in Parfit’s hitchhiker problem.
In Newcomb's Problem, an FDT agent reasons that Omega must have used some kind of model of her decision procedure in order to make an accurate prediction of her behavior. Omega's model and the agent are therefore both calculating the same function (the agent's decision procedure): they are subjunctively dependent on that function. Given perfect prediction by Omega, there are therefore only two possible outcomes in Newcomb's Problem: either the agent one-boxes and Omega predicted it (because its model also one-boxed), or the agent two-boxes and Omega predicted that. Because one-boxing then results in a million dollars and two-boxing only in a thousand dollars, the FDT agent one-boxes.
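The reasoning above can be sketched numerically. This toy model assumes a perfect predictor, so the prediction always equals the actual choice (the subjunctive dependence); the dollar amounts are the standard ones from the thought experiment:

```python
# Newcomb's Problem as an FDT agent sees it. Box A is transparent and holds
# $1,000; opaque Box B holds $1,000,000 iff Omega's model of the agent's
# decision function outputs "one-box".

def payoff(one_box: bool) -> int:
    # Perfect prediction: Omega's model and the agent compute the same
    # function, so the predicted choice equals the actual choice.
    predicted_one_box = one_box
    box_b = 1_000_000 if predicted_one_box else 0
    box_a = 1_000
    return box_b if one_box else box_b + box_a

# One-boxing yields $1,000,000; two-boxing yields only $1,000.
best_choice = max([True, False], key=payoff)
assert best_choice is True
```

A CDT agent would instead hold the prediction fixed while varying the choice, making two-boxing dominant; FDT's insistence that the prediction co-varies with the decision function is exactly what changes the answer.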
- Functional decision theory: A new theory of instrumental rationality
- Cheating Death in Damascus
- Decisions are for making bad outcomes inconsistent
Orgasmium (also known as hedonium) is a homogeneous substance with limited consciousness, which is in a constant state of supreme bliss. An AI programmed to "maximize happiness" might simply tile the universe with orgasmium. Some who believe this consider it a good thing; others do not. Those who do not, use its undesirability to argue that not all terminal values reduce to "happiness" or some simple analogue. Hedonium is the hedonistic utilitarian's version of utilitronium.
A rational agent is an entity which has a utility function, forms beliefs about its environment, evaluates the consequences of possible actions, and then takes the action which maximizes its utility. They are also referred to as goal-seeking. The concept of a rational agent is used in economics, game theory, decision theory, and artificial intelligence.
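This belief-utility-action loop can be sketched in a few lines. The states, actions, and numbers below are invented purely for illustration:

```python
# A minimal rational agent: beliefs are a probability distribution over world
# states, and the agent takes the action with the highest expected utility.

beliefs = {"rain": 0.3, "sun": 0.7}  # P(state), formed from observations

def utility(action: str, state: str) -> float:
    """The agent's utility function over (action, outcome) pairs."""
    table = {
        ("take_umbrella", "rain"): 5.0,   ("take_umbrella", "sun"): -1.0,
        ("leave_umbrella", "rain"): -10.0, ("leave_umbrella", "sun"): 2.0,
    }
    return table[(action, state)]

def expected_utility(action: str) -> float:
    # Probability-weighted average of utility over possible states.
    return sum(p * utility(action, state) for state, p in beliefs.items())

# EU(take) = 0.3*5 - 0.7*1 = 0.8; EU(leave) = -0.3*10 + 0.7*2 = -1.6
best_action = max(["take_umbrella", "leave_umbrella"], key=expected_utility)
assert best_action == "take_umbrella"
```

Real agents differ in how beliefs are formed and how the action space is searched, but this argmax-over-expected-utility structure is the common core of the formalism.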
Editor note: there is work to be done reconciling this page, Agency page, and Robust Agents. Currently they overlap and I'm not sure they're consistent. - Ruby, 2020-09-15
More generally, an agent is anything that can be viewed as perceiving its environment through sensors and acting upon that environment through actuators.[#fn1 1]
There has been much discussion as to whether certain AGI designs can be made into mere tools or whether they will necessarily be agents which will attempt to actively carry out their goals. Any minds that actively engage in goal-directed behavior are potentially dangerous, due to considerations such as basic AI drives possibly causing behavior which is in conflict with humanity's values.
In Dreams of Friendliness and in Reply to Holden on Tool AI, Eliezer Yudkowsky argues that, since all intelligences select correct beliefs from the much larger space of incorrect beliefs, they are necessarily agents.
- Russel, S. & Norvig, P. (2003) Artificial Intelligence: A Modern Approach. Second Edition. Page 32.[#fnref1 ↩]
(Astronomical) suffering risks, also known as s-risks, are risks of the creation of intense suffering in the far future on an astronomical scale, vastly exceeding all suffering that has existed on Earth so far.
S-risks are an example of existential risk (also known as x-risks) according to Nick Bostrom's original definition, as they threaten to "permanently and drastically curtail [Earth-originating intelligent life's] potential". Most existential risks are of the form "event E happens which drastically reduces the number of conscious experiences in the future". S-risks therefore serve as a useful reminder that some x-risks are scary because they cause bad experiences, and not just because they prevent good ones.
Within the space of x-risks, we can distinguish x-risks that are also s-risks, x-risks involving human extinction, x-risks involving both immense suffering and human extinction, and x-risks that involve neither. For example:

|  | extinction risk | non-extinction risk |
|---|---|---|
| suffering risk | Misaligned AGI wipes out humans, simulates many suffering alien civilizations. | Misaligned AGI tiles the universe with experiences of severe suffering. |
| non-suffering risk | Misaligned AGI wipes out humans. | Misaligned AGI keeps humans as "pets," limiting growth but not causing immense suffering. |
A related concept is hyperexistential risk, the risk of "fates worse than death" on an astronomical scale. It is not clear whether all hyperexistential risks are s-risks per se. But arguably all s-risks are hyperexistential, since "tiling the universe with experiences of severe suffering" would likely be worse than death.
There are two EA organizations whose primary focus is s-risk prevention research: the Center on Long-Term Risk (CLR) and the Center for Reducing Suffering. Much of CLR's work is on suffering-focused AI safety and crucial considerations. The Machine Intelligence Research Institute and the Future of Humanity Institute have also investigated strategies to prevent s-risks, though to a much lesser extent.
Another approach to reducing s-risk is to "expand the moral circle" together with raising concern for suffering, so that future (post)human civilizations and AI are less likely to instrumentally cause suffering to non-human minds such as animals or digital sentience. Sentience Institute works on this value-spreading problem.
- Reducing Risks of Astronomical Suffering: A Neglected Global Priority (FRI)
- Introductory talk on s-risks (FRI)
- Risks of Astronomical Future Suffering (FRI)
- Suffering-focused AI safety: Why "fail-safe" measures might be our top intervention PDF (FRI)
- Artificial Intelligence and Its Implications for Future Suffering (FRI)
- Expanding our moral circle to reduce suffering in the far future (Sentience Politics)
- The Importance of the Far Future (Sentience Politics)
Causal Decision Theory (CDT) is a branch of decision theory which advises an agent to take the action that maximizes the causal consequences for the probability of desired outcomes [#fn1 1]. Like any branch of decision theory, it prescribes taking the action that maximizes expected utility: the probability-weighted sum of the utilities of each of the action's possible results. The branches differ in how actions are taken to influence those probabilities. Contrary to Evidential Decision Theory (EDT), CDT focuses on the causal relations between one’s actions and their outcomes, rather than on which actions provide evidence for desired outcomes. According to CDT, a rational agent should track the available causal relations linking its actions to the desired outcome and take the action that best enhances the chances of that outcome.
One common example where EDT and CDT diverge is the Smoking Lesion: “Smoking is strongly correlated with lung cancer, but in the world of the Smoker's Lesion this correlation is understood to be the result of a common cause: a genetic lesion that tends to cause both smoking and cancer. Once we fix the presence or absence of the lesion, there is no additional correlation between smoking and cancer. Suppose you prefer smoking without cancer to not smoking without cancer, and prefer smoking with cancer to not smoking with cancer. Should you smoke?” CDT recommends smoking, since there is no causal connection between smoking and cancer: both are caused by a gene, but have no direct causal connection with each other. EDT, on the other hand, recommends against smoking, since smoking is evidence of having the lesion and should thus be avoided.
The core aspect of CDT is mathematically represented by the fact that it uses probabilities of conditionals in place of conditional probabilities [#fn2 2]. The conditional probability of B given A, written P(B|A), is the Bayesian probability that event B happens given that we know A happened; this is what EDT uses. The probability of a conditional, written P(A > B), is the probability that the conditional 'A implies B' is true, i.e. the probability that the counterfactual ‘If A, then B’ is the case. Since counterfactual analysis is the key tool for talking about causality, probabilities of conditionals are said to mirror causal relations. In most cases these two probabilities track each other, and CDT and EDT give the same answers. However, in some particular problems their prescriptions for rational action diverge, such as the Smoking Lesion problem, where CDT seems to give the more reasonable prescription, and Newcomb's problem, where CDT seems unreasonable. David Lewis proved [#fn3 3] that it is impossible for probabilities of conditionals to always track conditional probabilities. Hence, evidential relations are not the same as causal relations, and CDT and EDT will diverge in some cases.
- Lewis, David. (1981) "Causal Decision Theory," Australasian Journal of Philosophy 59 (1981): 5- 30.
- Lewis, D. (1976), "Probabilities of conditionals and conditional probabilities", The Philosophical Review (Duke University Press) 85 (3): 297–315
The long reflection is a hypothesized period of time during which humanity works out how best to realize its long-term potential.
Some effective altruists, including Toby Ord and William MacAskill, have argued that, if humanity succeeds in eliminating existential risk or reducing it to acceptable levels, it should not immediately embark on an ambitious and potentially irreversible project of arranging the universe's resources in accordance with its values, but ought instead to spend considerable time ("centuries (or more)"; "perhaps tens of thousands of years"; "thousands or millions of years"; "[p]erhaps... a million years") figuring out what is in fact of value. The long reflection may thus be seen as an intermediate stage in a rational long-term human developmental trajectory, following an initial stage of existential security, when existential risk is drastically reduced, and followed by a final stage when humanity's potential is fully realized.
The idea of a long reflection has been criticized on the grounds that virtually eliminating all existential risk will almost certainly require taking a variety of large-scale, irreversible decisions—related to space colonization, global governance, cognitive enhancement, and so on—which are precisely the decisions meant to be discussed during the long reflection. Since there are pervasive and inescapable tradeoffs between reducing existential risk and retaining moral option value, it may be argued that it does not make sense to frame humanity's long-term strategic picture as one consisting of two distinct stages, with one taking precedence over the other.
Aird, Michael (2020) Collection of sources that are highly relevant to the idea of the Long Reflection, Effective Altruism Forum, June 20.
Many additional resources on this topic.
Wiblin, Robert & Keiran Harris (2018) Our descendants will probably see us as moral monsters. What should we do about that?, 80,000 Hours, January 19.
Interview with William MacAskill about the long reflection and other topics.
Ord, Toby (2020) The Precipice: Existential Risk and the Future of Humanity, London: Bloomsbury Publishing.
Greaves, Hilary et al. (2019) A research agenda for the Global Priorities Institute, Oxford.
Dai, Wei (2019) The argument from philosophical difficulty, LessWrong, February 9.
William MacAskill, in Perry, Lucas (2018) AI alignment podcast: moral uncertainty and the path to AI alignment with William MacAskill, AI Alignment podcast, September 17.
Stocker, Felix (2020) Reflecting on the long reflection, Felix Stocker’s Blog, August 14.
Hanson, Robin (2021) ‘Long reflection’ is crazy bad idea, Overcoming Bias, October 20.
The windfall clause is pretty well explained on the Future of Humanity Institute site.
Here's a quick summary:
It is an agreement between AI firms to donate significant amounts of any profits made as a consequence of economically transformative breakthroughs in AI capabilities. The donations are intended to help benefit humanity.
An aligned superintelligence will have a set of human values. As mentioned in What are "human values"?, this set of values is complex, which means that the implementation of these values will determine whether the superintelligence cares about nonhuman animals. In AI Ethics and Value Alignment for Nonhuman Animals, Soenke Ziesche argues that alignment should include the values of nonhuman animals.
Ajeya Cotra has written an excellent article named Why AI alignment could be hard with modern deep learning on this question.
Many parts of the AI alignment ecosystem are already well-funded, but a savvy donor can still make a difference by picking up grantmaking opportunities which are too small to catch the attention of the major funding bodies or are based on personal knowledge of the recipient.
One way to leverage a small amount of money to the potential of a large amount is to enter a donor lottery, where you donate to win a chance to direct a much larger amount of money (with probability proportional to donation size). This means that the person directing the money will be allocating enough that it's worth their time to do more in-depth research.
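The expectation argument behind donor lotteries can be sketched with toy numbers (illustrative amounts, not real fund sizes):

```python
# Donor lottery: your chance of directing the whole pot is proportional to
# your contribution, so the expected amount you direct equals your donation.

my_donation = 5_000
pot = 100_000  # total contributed by all participants, including you

p_win = my_donation / pot            # probability you direct the pot
expected_directed = p_win * pot      # equals my_donation in expectation
assert abs(expected_directed - my_donation) < 1e-6

# The upside: with 5% probability you direct the full $100,000, which
# justifies spending serious time on in-depth research into where it goes.
```

In expectation nothing is lost by entering, but the research effort is concentrated on the winner, who allocates a sum large enough to make careful investigation worthwhile.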
For an overview of the work the major organizations are doing, see the 2021 AI Alignment Literature Review and Charity Comparison. The Long-Term Future Fund seems to be an outstanding place to donate based on that, as they are the organization which most other organizations are most excited to see funded.
In the words of Nate Soares:
I don’t expect humanity to survive much longer.
Often, when someone learns this, they say:
"Eh, I think that would be all right."
So allow me to make this very clear: it would not be "all right."
Imagine a little girl running into the road to save her pet dog. Imagine she succeeds, only to be hit by a car herself. Imagine she lives only long enough to die in pain.
Though you may imagine this thing, you cannot feel the full tragedy. You can’t comprehend the rich inner life of that child. You can’t understand her potential; your mind is not itself large enough to contain the sadness of an entire life cut short.
You can only catch a glimpse of what is lost—
—when one single human being dies.
Now tell me again how it would be "all right" if every single person were to die at once.
Many people, when they picture the end of humankind, pattern match the idea to some romantic tragedy, where humans, with all their hate and all their avarice, had been unworthy of the stars since the very beginning, and deserved their fate. A sad but poignant ending to our tale.
And indeed, there are many parts of human nature that I hope we leave behind before we venture to the heavens. But in our nature is also everything worth bringing with us. Beauty and curiosity and love, a capacity for fun and growth and joy: these are our birthright, ours to bring into the barren night above.
Calamities seem more salient when unpacked. It is far harder to kill a hundred people in their sleep, with a knife, than it is to order a nuclear bomb dropped on Hiroshima. Your brain can’t multiply, you see: it can only look at a hypothetical image of a broken city and decide it’s not that bad. It can only conjure an image of a barren planet and say "eh, we had it coming."
But if you unpack the scenario, if you try to comprehend all the lives snuffed out, all the children killed, the final spark of human joy and curiosity extinguished, all our potential squandered…
I promise you that the extermination of humankind would be horrific.