Ajeya Cotra has written an excellent article named Why AI alignment could be hard with modern deep learning on this question.
If you like interactive FAQs, you've already found one! All joking aside, probably the best places to start as a newcomer are The AI Revolution posts on WaitButWhy (The Road to Superintelligence and Our Immortality or Extinction) for a fun, accessible intro, or AGI safety from first principles for a more up-to-date option. If you prefer videos, Rob Miles's YouTube channel and MIRI's AI Alignment: Why It’s Hard, and Where to Start are great.
If you're up for a book-length introduction, there are several options.
The book which first made the case to the public is Nick Bostrom's Superintelligence. It gives an excellent overview of the state of the field in 2014 and makes a strong case for the subject being important as well as exploring many fascinating adjacent topics. However, it does not cover newer developments, such as mesa-optimizers or language models.
There's also Human Compatible by Stuart Russell, which gives a more up-to-date (2019) review of developments, with an emphasis on the approaches that the Center for Human-Compatible AI is working on, such as cooperative inverse reinforcement learning. There's a good review/summary on SlateStarCodex.
The Alignment Problem by Brian Christian is the most recent (2020) and has more of an emphasis on machine learning and current generation problems with AI than Superintelligence or Human Compatible.
Though not limited to AI Safety, Rationality: A-Z covers many skills that are valuable for people trying to think about large and complex issues, with The Rationalist's Guide to the Galaxy available as a shorter, more accessible, and more AI-focused option.
If you're not already there, join the Discord where contributors hang out.
The main ways you can help are to answer questions or ask questions, or help to review answers or review questions. We're looking to cover everything in Stampy's scope. You could also join the dev team if you have programming skills.
One great thing you can do is write up a Google Doc with your top ~10 questions and post it to the Discord, or ask your friends to do the same (see follow-up question on collecting questions for a template message).
If you are a researcher or otherwise employed by an AI Safety focused organization, please contact us and we'll set you up with an account with extra privileges.
There are many plausible-sounding ways to align an AI, but so far none have been convincingly shown to be both implementable and reliably safe, despite a great deal of thought.
For implementability the key question is: How do we code this? Converting something to formal mathematics that can be understood by a computer program is much harder than just saying it in natural language, and proposed AI goal architectures are no exception. Complicated computer programs are usually the result of months of testing and debugging. But this one will be more complicated than any ever attempted before, and live tests are impossible: a superintelligence with a buggy goal system will display goal stability and try to prevent its programmers from discovering or changing the error.
Then, even if an idea sounds pretty good to us right now, it's hard to be at all confident it has no fatal flaws or loopholes. After all, many other proposals that originally sounded promising, like “just give commands to the AI” and “just tell the AI to figure out what makes us happy”, turn out, after more thought, to be dangerous.
Can we be sure that we’ve thought this through enough? Can we be sure that there isn’t some extremely subtle problem with it, so subtle that no human would ever notice it, but which might seem obvious to a superintelligence?
It certainly would be very unwise to purposefully create an artificial general intelligence now, before we have found a way to be certain it will act purely in our interests. But "general intelligence" is more of a description of a system's capabilities, and a vague one at that. We don't know what it takes to build such a system. This leads to the worrying possibility that our existing, narrow AI systems require only minor tweaks, or even just more computer power, to achieve general intelligence.
The pace of research in the field suggests that there's a lot of low-hanging fruit left to pick, after all, and this research produces better, more effective AI in a landscape of strong competitive pressure to build the most capable systems possible. "Just" not building an AGI means ensuring that every organization in the world with lots of computer hardware never builds an AGI, either accidentally or because it mistakenly thinks it has a solution to the alignment problem. It's simply far safer to also work on solving the alignment problem.