Difficulty of alignment
Canonically answered
What is the general nature of the concern about AI alignment?
The basic concern as AI systems become increasingly powerful is that they won’t do what we want them to do – perhaps because they aren’t correctly designed, perhaps because they are deliberately subverted, or perhaps because they do what we tell them to do rather than what we really want them to do (as in the classic stories of genies and wishes). Many AI systems are programmed to have goals and to attain them as effectively as possible – for example, a trading algorithm has the goal of maximizing profit. Unless carefully designed to act in ways consistent with human values, a highly sophisticated AI trading system might exploit means that even the most ruthless financier would disavow. These are systems that literally have a mind of their own, and maintaining alignment between human interests and their choices and actions will be crucial.
Would AI alignment be hard with deep learning?
Ajeya Cotra has written an excellent article on this question, titled Why AI alignment could be hard with modern deep learning.
How difficult should we expect alignment to be?
Here we ask about the additional cost of building an aligned powerful system compared to its unaligned version. We generally assume this cost is nonzero, in the same way that it is easier and cheaper to build an elevator without emergency brakes than with them. This is referred to as the alignment tax, and most AI alignment research is geared toward reducing it.
One operational guess by Eliezer Yudkowsky about its magnitude is "[an aligned project will take] at least 50% longer serial time to complete than [its unaligned version], or two years longer, whichever is less". This estimate applies to agents capable enough that their behavior is qualitatively different from a safety engineering perspective (for instance, an agent that is not corrigible by default).
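As a rough sketch of how that guess cashes out numerically (the function and example numbers below are illustrative, not taken from Yudkowsky's post):

```python
def aligned_project_time(unaligned_years: float) -> float:
    """Lower bound on serial time for an aligned project, per the guess
    'at least 50% longer, or two years longer, whichever is less'."""
    return min(unaligned_years * 1.5, unaligned_years + 2.0)

# Illustrative examples (hypothetical project durations):
print(aligned_project_time(3))  # min(4.5, 5.0) = 4.5 years
print(aligned_project_time(8))  # min(12.0, 10.0) = 10.0 years
```

Note that for short projects the "50% longer" clause binds, while for longer projects the flat two-year penalty is the smaller of the two.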
An essay by John Wentworth argues that there is a small chance of alignment happening "by default", with an alignment tax of effectively zero.
What is artificial general intelligence safety / AI alignment?
AI alignment is a field that is focused on causing the goals of future superintelligent artificial systems to align with human values, meaning that they would behave in a way that is compatible with our survival and flourishing. This may be an extremely hard problem, especially with deep learning, and is likely to determine the outcome of the most important century. Alignment research is strongly interdisciplinary and can include computer science, mathematics, neuroscience, philosophy, and social sciences.
AGI safety is a related concept which strongly overlaps with AI alignment. AGI safety is concerned with making sure that building AGI systems doesn’t cause things to go badly wrong, and the main way in which things can go badly wrong is through misalignment. AGI safety also includes policy work aimed at preventing the building of dangerous AGI systems, or at reducing misuse risks from AGI systems aligned with actors who don’t have humanity’s best interests in mind.
Why is AI alignment a hard problem?
What are the main sources of AI existential risk?
A comprehensive list of major contributing factors to AI being a threat to humanity's future is maintained by Daniel Kokotajlo on the Alignment Forum.
Unanswered canonical questions
Is it hard like 'building a secure OS that works on the first try'? Hard like 'the engineering/logistics/implementation portion of the Manhattan Project'? Both? Some other option? Etc.