Question: Sets the destination title to which this page and its attached answers will be moved.
Alternate Phrasings: Alternate ways of phrasing this question to improve search.
Tags:
Academia, Acausal trade, Actors, Agency, Agent foundations, Agi, Agi fire alarm, Ai, Ai safety camp, Ai safety support, Ai takeoff, Ai takeover, Algorithmic progress, Alignment, Alignment proposals, Alignment targets, Anthropic, Arc, Architectures, Arms race, Asi, Automation, Autonomous weapons, Awareness, Benefits, Biological cognitive enhancement, Boxing, Brain, Brain-computer interfaces, Capabilities, Careers, Center for human compatible ai (chai), Civilization, Clr, Cognitive enhancement, Cognitive superpowers, Coherent extrapolated volition, Collaboration, Communication, Community, Complexity of value, Comprehensive ai services, Compute, Computing overhang, Conjecture, Consciousness, Content, Contributing, Control problem, Cooperative inverse reinforcement learning (cirl), Corrigibility, Counterfactuals, Creativity, Cybersecurity, Cyborgism, Debate, Deception, Deceptive alignment, Decision theory, Deep learning, Deepmind, Definitions, Developers, Differential technological development, Difficulty of alignment, Do what i mean, Doom, Education, Eliezer yudkowsky, Elk, Elon musk, Embedded agency, Emotions, Encultured, Epistemology, Eric drexler, Ethics, Eutopia, Evolution, Existential risk, Far, Formal proof, Friendly ai, Funding, Future of humanity institute, Game theory, Goals, Goodhart's law, Governance, Government, Gpt, Gpt-3, Hedonium, Human values, Human-in-the-loop, Humans, Impact measures, Implementation, Incentives, Infohazard, Information security, Infra-bayesianism, Infrastructure, Inner alignment, Inside view, Institutions, Instrumental convergence, Intelligence, Intelligence amplification, Intelligence explosion, Interpretability, Investing, Language models, Literature, Machine learning, Maximizers, Megaprojects, Mentorship, Mesa-optimization, Metaethics, Metaphilosophy, Metaphors, Miri, Molecular nanotechnology, Motivation, Multipolar, Myopia, Narrow ai, Natural language, Nearest unblocked strategy, Needs work, Neural networks, Neuromorphic ai, Nick bostrom, None, Objections, Open problem, Openai, Organizations, Orthogonality thesis, Other causes, Ought, Outdated, Outer alignment, Outreach, Paperclip maximizer, Paradigm, Pattern-matching, People, Person-affecting view, Personal action, Persuasion, Philosophy, Physics, Plausibility, Plex's answer to what are some good resources on ai alignment?, Policy, Politics, Power seeking, Preferences, Productivity, Programming, Psychology, Quantilizers, Race dynamics, Ray kurzweil, Recursive self-improvement, Redwood research, Regulation, Research agendas, Research assistants, Resources, Robin hanson, Robots, Robustness, S-risk, Scaling laws, Secret, Security mindset, Seed ai, Shard theory, Simulation hypothesis, Singleton, Solutions, Specification gaming, Stable win condition, Stampy, Stop button, Stub, Study, Success models, Superintelligence, Surveys, Tech companies, Technological unemployment, Technology, Test, Test tag, Timelines, Tool ai, Tooling, Transformative ai, Transhumanism, Treacherous turn, Tripwire, Truthful ai, Utility functions, Value learning, Values, What about, Whole brain emulation, Why not just, Wireheading
Related questions:
A lot of concern appears to focus on human-level or “superintelligent” AI. Is that a realistic prospect in the foreseeable future? AIs aren’t as smart as rats, let alone humans. Isn’t it far too early to be worrying about this kind of thing? Any AI will be a computer program. Why wouldn't it just do what it's programmed to do? Are AI researchers trying to make conscious AI? Are Google, OpenAI, etc. aware of the risk? Are any major politicians concerned about this? Are expert surveys on AI safety available? Are there any AI alignment projects which governments could usefully put a very large amount of resources into? Are there any plausibly workable proposals for regulating or banning dangerous AI research? Are there promising ways to make AI alignment researchers smarter? Are there risk analysis methods, which may help to make the risk more quantifiable or clear? Are there types of advanced AI that would be safer than others? Aren't robots the real problem? How can AI cause harm if it has no ability to directly manipulate the physical world? Aren’t there some pretty easy ways to eliminate these potential problems? At a high level, what is the challenge of alignment that we must meet to secure a good future? Can AI be creative? Can an AI really be smarter than humans? Can humans stay in control of the world if human- or superhuman-level AI is developed? Can people contribute to alignment by using proof assistants to generate formal proofs? Can we add "friendliness" to any artificial intelligence design? Can we constrain a goal-directed AI using specified rules? Can we ever be sure that an AI is aligned? Can we get AGI by scaling up architectures similar to current ones, or are we missing key insights? Can we program the superintelligence to maximize human pleasure or satisfaction of human desires? Can we teach a superintelligence a moral code with machine learning? Can we tell an AI just to figure out what we want and then do that? Can we test an AI to make sure that it’s not going to take over and do harmful things after it achieves superintelligence? Can you give an AI a goal which involves “minimally impacting the world”? Can you stop an advanced AI from upgrading itself? Can't we just tell an AI to do what we want? Can’t we just program the superintelligence not to harm us? Considering how hard it is to predict the future, why do we think we can say anything useful about AGI today? Could AI have basic emotions? Could I contribute by offering coaching to alignment researchers? If so, how would I go about this? Could an AGI have already been created and currently be affecting the world? Could divesting from AI companies without good safety culture be useful, or would this be likely to have a negligible impact? Could emulated minds do AI alignment research? Could we build provably beneficial AI systems? Could we get significant biological intelligence enhancements long before AGI? Could we program an AI to automatically shut down if it starts doing things we don’t want it to? Could we tell the AI to do what's morally right? Could weak AI systems help with alignment research? Do you need a PhD to work on AI Safety? Does it make sense to focus on scenarios where change is rapid and due to a single actor, or slower and dependent on getting agreements between several relevant actors? Does the importance of AI risk depend on caring about transhumanist utopias? 
Even if we are rationally convinced about the urgency of existential AI risk, it can be hard to feel that emotionally because the danger is so abstract. How can this gap be bridged? How can I be a more productive student/researcher? How can I collect questions for Stampy? How can I contact the Stampy team? How can I contribute in the area of community building? How can I contribute to Stampy? How can I convince others and present the arguments well? How can I get hired by an organization working on AI alignment? How can I join the Stampy dev team? How can I support alignment researchers to be more productive? How can we interpret what all the neurons mean? How close do AI experts think we are to creating superintelligence? How could an intelligence explosion be useful? How could general intelligence be programmed into a machine? How could poorly defined goals lead to such negative outcomes? How difficult should we expect alignment to be? How do I add content from LessWrong / Effective Altruism Forum tag-wikis to Stampy? How do I form my own views about AI safety? How do I format answers on Stampy? How do I know whether I'm a good fit for work on AI safety? How do I stay motivated and productive? How do I stay updated about AI progress? How do organizations do adversarial training and red teaming? How do the incentives in markets increase AI risk? How does AI taking things literally contribute to alignment being hard? How does MIRI communicate their view on alignment? How does the current global microchip supply chain work, and who has political power over it? How does the stamp eigenkarma system work? How doomed is humanity? How fast will AI takeoff be? How good is the world model of GPT-3? How hard is it for an AGI to develop powerful nanotechnology? How important is research closure and OPSEC for capabilities-synergistic ideas? How is "intelligence" defined? How is AGI different from current AI? How is Beth Barnes evaluating LM power seeking? How is OpenAI planning to solve the full alignment problem? How is metaethics relevant to AI alignment? How is the Alignment Research Center (ARC) trying to solve Eliciting Latent Knowledge (ELK)? How likely are AI organizations to respond appropriately to the risks of their creations? How likely is an "intelligence explosion"? How likely is it that AGI will first be developed by a large established organization, rather than a small startup, an academic group or a government? How likely is it that an AI would pretend to be a human to further its goals? How likely is it that governments will play a significant role? What role would be desirable, if any? How long will it be until superintelligent AI is created? How might "acausal trade" affect alignment? How might AGI interface with cybersecurity? How might AGI kill people? How might a real-world AI system that receives orders in natural language and does what you mean look? How might a superintelligence socially manipulate humans? How might a superintelligence technologically manipulate humans? How might an "intelligence explosion" be dangerous? How might an AI achieve a seemingly beneficial goal via inappropriate means? How might non-agentic GPT-style AI cause an "intelligence explosion" or otherwise contribute to existential risk? How might things go wrong with AI even without an agentic superintelligence? How might we get from Artificial General Intelligence to a Superintelligent system? How might we reduce the chance of an AI arms race? 
How might we reduce the diffusion of dangerous AI technology to insufficiently careful actors? How much can we learn about AI with interpretability tools? How much resources did the processes of biological evolution use to evolve intelligent creatures? How possible (and how desirable) is it to change which path humanity follows to get to AGI? How powerful will a mature superintelligence be? How quickly could an AI go from the first indications of problems to an unrecoverable disaster? How quickly would the AI capabilities ecosystem adopt promising new advances in AI alignment? How should I change my financial investments in response to the possibility of transformative AI? How should I decide which quality level to attribute to a proposed question? How should I personally prepare for when transformative AI arrives? How software- and/or hardware-bottlenecked are we on AGI? How successfully have institutions managed risks from novel technology in the past? How tractable is it to get governments to play a good role (rather than a bad role) and/or to get them to play a role at all (rather than no role)? How would I know if AGI were imminent? How would we align an AGI whose learning algorithms / cognition look like human brains? How would we know if an AI were suffering? How would you explain the theory of Infra-Bayesianism? I want to help out AI alignment without necessarily making major life changes. What are some simple things I can do to contribute? I want to work on AI alignment. How can I get funding? I'm interested in working on AI safety. What should I do? If AGI comes from a new paradigm, how likely is it to arise late in the paradigm when it is already deployed at scale, versus early on when only a few people are exploring the idea? If AI takes over the world how could it create and maintain the infrastructure that humans currently provide? If I only care about helping people alive today, does AI safety still matter? If an AI became conscious, how would we ever know? If we solve alignment, are we sure of a good future? In "aligning AI with human values", which humans' values are we talking about? In what ways are real-world machine learning systems different from expected utility maximizers? Is AI alignment possible? Is AI safety research racing against capability research? If so, how can safety research get ahead? Is expecting large returns from AI self-improvement just following an exponential trend line off a cliff? Is it already too late to work on AI alignment? Is it likely that hardware will allow an exponential takeoff? Is it possible to block an AI from doing certain things on the Internet? Is it possible to code into an AI to avoid all the ways a given task could go wrong, and would it be dangerous to try that? Is large-scale automated AI persuasion and propaganda a serious concern? Is merging with AI through brain-computer interfaces a potential solution to safety problems? Is the UN concerned about existential risk from AI? Is the focus on the existential threat of superintelligent AI diverting too much attention from more pressing debates about AI in surveillance and the battlefield, and its potential effects on the economy? Is the question of whether we're living in a simulation relevant to AI safety? If so, how? Is there a Chinese AI safety community? Are there safety researchers working at leading Chinese AI labs? Is there a danger in anthropomorphizing AI’s and trying to understand them in human terms? 
Is there something useful we can ask governments to do for AI alignment? Is this about AI systems becoming malevolent or conscious and turning on us? Isn't it hard to make a significant difference as a person who isn't going to be a world-class researcher? Isn't it too soon to be working on AGI safety? Isn't the real concern AI being misused by terrorists or other bad actors? Isn't the real concern AI-enabled totalitarianism? Isn't the real concern autonomous weapons? Isn't the real concern technological unemployment? Isn’t AI just a tool like any other? Won’t it just do what we tell it to? Isn’t it immoral to control and impose our values on AI? I’d like to get deeper into the AI alignment literature. Where should I look? Might an "intelligence explosion" never occur? Might an aligned superintelligence force people to "upload" themselves, so as to more efficiently use the matter of their bodies? Might an aligned superintelligence force people to have better lives and change more quickly than they want? Might an aligned superintelligence immediately kill everyone and then go on to create a "hedonium shockwave"? Might attempting to align AI cause a "near miss" which results in a much worse outcome? Might humanity create astronomical amounts of suffering when colonizing the universe after creating an aligned superintelligence? Might trying to build a hedonium-maximizing AI be easier and more likely to work than trying for eudaimonia? OK, I’m convinced. How can I help? Once we notice that a superintelligence given a specific task is trying to take over the world, can’t we turn it off, reprogram it or otherwise correct the problem? Should I engage in political or collective action like signing petitions or sending letters to politicians? Should we expect "warning shots" before an unrecoverable catastrophe? Superintelligence sounds like science fiction. Do people think about this in the real world? This all seems rather abstract. Isn't promoting love, wisdom, altruism or rationality more important? To what extent are there meaningfully different paths to AGI, versus just one path? We already have psychopaths who are "misaligned" with the rest of humanity, but somehow we deal with them. Can't we do something similar with AI? We’re going to merge with the machines so this will never be a problem, right? What about AI concerns other than misalignment? What about having a human supervisor who must approve all the AI's decisions before executing them? What actions can I take in under five minutes to contribute to the cause of AI safety? What alignment strategies are scalably safe and competitive? What approaches are AI alignment organizations working on? What are "coherence theorems" and what do they tell us about AI? What are "human values"? What are "scaling laws" and how are they relevant to safety? What are "selection theorems" and can they tell us anything useful about the likely shape of AGI systems? What are Encultured working on? What are OpenAI Codex and GitHub Copilot? What are Scott Garrabrant and Abram Demski working on? What are alternate phrasings for? What are brain-computer interfaces? What are language models? What are likely to be the first transformative applications of AI? What are mesa-optimizers? What are plausible candidates for "pivotal acts"? What are some AI alignment research agendas currently being pursued? What are some good books about AGI safety? What are some good podcasts about AI alignment? What are some good resources on AI alignment? 
What are some helpful AI policy ideas? What are some important examples of specialised terminology in AI alignment? What are some objections to the importance of AI alignment? What are some of the leading AI capabilities organizations? What are some of the most impressive recent advances in AI capabilities? What are some open research questions in AI alignment? What are some practice or entry-level problems for getting into alignment research? What are some problems in philosophy that are related to AI safety? What are some specific open tasks on Stampy? What are the "win conditions"/problems that need to be solved? What are the differences between AGI, transformative AI and superintelligence? What are the differences between “AI safety”, “AGI safety”, “AI alignment” and “AI existential safety”? What are the different possible AI takeoff speeds? What are the different versions of decision theory? What are the editorial protocols for Stampy questions and answers? What are the leading theories in moral philosophy and which of them might be technically the easiest to encode into an AI? What are the potential benefits of AI as it grows increasingly sophisticated? What are the style guidelines for writing for Stampy? What assets need to be protected by/from the AI? Are "human values" sufficient for it? What beneficial things would an aligned superintelligence be able to do? What can I do to contribute to AI safety? What can we expect the motivations of a superintelligent machine to be? What convinced people working on AI alignment that it was worth spending their time on this cause? What could a superintelligent AI do, and what would be physically impossible even for it? What does Elon Musk think about AI safety? What does Evan Hubinger think of Deception + Inner Alignment? What does MIRI think about technical alignment? What does Ought aim to do? What does a typical work day in the life of an AI safety researcher look like? What does alignment failure look like? What does generative visualization look like in reinforcement learning? What does the scheme Externalized Reasoning Oversight involve? What evidence do experts usually base their timeline predictions on? What external content would be useful to the Stampy project? What harm could a single superintelligence do when it took so many humans to build civilization? What if technological progress stagnates and we never achieve AGI? What if we put the AI in a box and have a second, more powerful, AI with the goal of preventing the first one from escaping? What is "Do What I Mean"? What is "HCH"? What is "agent foundations"? What is "biological cognitive enhancement"? What is "coherent extrapolated volition"? What is "evidential decision theory"? What is "friendly AI"? What is "functional decision theory"? What is "greater-than-human intelligence"? What is "hedonium"? What is "logical decision theory"? What is "metaphilosophy" and how does it relate to AI safety? What is "narrow AI"? What is "superintelligence"? What is "transformative AI"? What is "whole brain emulation"? What is AI Safety via Debate? What is Aligned AI / Stuart Armstrong working on? What is Anthropic's approach to LLM alignment? What is Artificial General Intelligence and what will it look like? What is Conjecture's Scalable LLM Interpretability research agenda? What is Conjecture's epistemology research agenda? What is Conjecture, and what is their team working on? What is David Krueger working on? What is Dylan Hadfield-Menell's thesis on? What is GPT-3? 
What is Goodhart's law? What is John Wentworth's plan? What is MIRI’s mission? What is Stampy's copyright? What is a "pivotal act"? What is a "quantilizer"? What is a "value handshake"? What is a canonical question on Stampy's Wiki? What is a duplicate question on Stampy's Wiki? What is a follow-up question on Stampy's Wiki? What is a verified account on Stampy's Wiki? What is an "agent"? What is an "intelligence explosion"? What is an "s-risk"? What is artificial general intelligence safety / AI alignment? What is causal decision theory? What is interpretability and what approaches are there? What is meant by "AI takeoff"? What is neural network modularity? What is the "control problem"? What is the "long reflection"? What is the "orthogonality thesis"? What is the "universal prior"? What is the "windfall clause"? What is the Center for Human Compatible AI (CHAI)? What is the Center on Long-Term Risk (CLR) focused on? What is the Stampy project? What is the difference between inner and outer alignment? What is the general nature of the concern about AI alignment? What is the probability of extinction from misaligned superintelligence? What kind of a challenge is solving AI alignment? What kind of questions do we want on Stampy? What milestones are there between us and AGI? What plausibly happens five years before and after AGI? What research is being done to align modern deep learning systems? What safety problems are associated with whole brain emulation? What should I read to learn about decision theory? What should be marked as a "related" question on Stampy's Wiki? What should be marked as a canonical answer on Stampy's Wiki? What should the first AGI systems be aligned to do? What sources of information can Stampy use? What subjects should I study at university to prepare myself for alignment research? What technical problems are MIRI working on? What technological developments could speed up AI progress? What would a "warning shot" look like? What would a good future with AGI look like? What would a good solution to AI alignment look like? What would a world shortly before AGI look like? What would be physically possible and desirable to have in an AI-built utopia? What's especially worrisome about autonomous weapons? What's meant by calling an AI "agenty" or "agentlike"? What’s a good AI alignment elevator pitch? When should I stamp an answer? When will an intelligence explosion happen? When will transformative AI be created? Where can I find all the features of Stampy's Wiki? Where can I find mentorship and advice for becoming a researcher? Where can I find people to talk to about AI alignment? Where can I find questions to answer for Stampy? Where can I learn about AI alignment? Where can I learn about interpretability? Which country will AGI likely be created by, and does this matter? Which military applications of AI are likely to be developed? Which organizations are working on AI alignment? Which organizations are working on AI policy? Which university should I study at if I want to best prepare for working on AI alignment? Who created Stampy? Who is Nick Bostrom? Who is Stampy? Why can't we just make a "child AI" and raise it? Why can't we just turn the AI off if it starts to misbehave? Why can't we simply stop developing AI? Why can’t we just use Asimov’s Three Laws of Robotics? Why can’t we just use natural language instructions? Why can’t we just “put the AI in a box” so that it can’t influence the outside world? 
Why can’t we just… Why do some AI researchers not worry about alignment? Why do we expect that a superintelligence would closely approximate a utility maximizer? Why do you like stamps so much? Why does AI need goals in the first place? Can’t it be intelligent without any agenda? Why does AI takeoff speed matter? Why does there seem to have been an explosion of activity in AI in recent years? Why don't we just not build AGI if it's so dangerous? Why is AGI dangerous? Why is AGI safety a hard problem? Why is AI alignment a hard problem? Why is AI safety important? Why is safety important for smarter-than-human AI? Why is the future of AI suddenly in the news? What has changed? Why might a maximizing AI cause bad outcomes? Why might a superintelligent AI be dangerous? Why might an AI do something that we don’t want it to, if it’s really so intelligent? Why might contributing to Stampy be worth my time? Why might people try to build AGI rather than stronger and stronger narrow AIs? Why might we expect a fast takeoff? Why might we expect a moderate AI takeoff? Why might we expect a superintelligence to be hostile by default? Why should I worry about superintelligence? Why should we prepare for human-level AI technology now rather than decades down the line when it’s closer? Why think that AI can outperform humans? Why work on AI safety early? Why would great intelligence produce great power? Why would we only get one chance to align a superintelligence? Will AGI be agentic? Will AI learn to be independent from people or will it always ask for our orders? Will an aligned superintelligence care about animals other than humans? Will superintelligence make a large part of humanity unemployable? Will there be a discontinuity in AI capabilities? If so, at what stage? Will there be an AI-assisted "long reflection" and how might it look? Will we ever build a superintelligence? Won’t AI be just like us? Would "warning shots" make a difference and, if so, would they be helpful or harmful? Would AI alignment be hard with deep learning? Would an AI create or maintain suffering because some people want it? Would an aligned AI allow itself to be shut down? Would donating small amounts to AI safety organizations make any significant difference? Would it improve the safety of quantilizers to cut off the top few percent of the distribution? Wouldn't a superintelligence be smart enough not to make silly mistakes in its comprehension of our instructions? Wouldn't a superintelligence be smart enough to know right from wrong? Wouldn't it be a good thing for humanity to die out? Wouldn't it be safer to only build narrow AIs?
Canonical answer:
's Answer to How quickly would the AI capabilities ecosystem adopt promising new advances in AI alignment? Abram Demski's Answer to Maximizers and Satisficers on 2019-08-23T16:05:49 by Martin Verrisin Answer to A lot of concern appears to focus on human-level or “superintelligent” AI. Is that a realistic prospect in the foreseeable future? Answer to AIs aren’t as smart as rats, let alone humans. Isn’t it far too early to be worrying about this kind of thing? Answer to Any AI will be a computer program. Why wouldn't it just do what it's programmed to do? Answer to Are Google, OpenAI, etc. aware of the risk? Answer to Are there types of advanced AI that would be safer than others? Answer to Aren't robots the real problem? How can AI cause harm if it has no ability to directly manipulate the physical world? Answer to Aren’t there some pretty easy ways to eliminate these potential problems? Answer to At a high level, what is the challenge of alignment that we must meet to secure a good future? Answer to Can an AI really be smarter than humans? Answer to Can people contribute to alignment by using proof assistants to generate formal proofs? Answer to Can we add "friendliness" to any artificial intelligence design? Answer to Can we constrain a goal-directed AI using specified rules? Answer to Can we get AGI by scaling up architectures similar to current ones, or are we missing key insights? Answer to Can we program the superintelligence to maximize human pleasure or satisfaction of human desires? Answer to Can we teach a superintelligence a moral code with machine learning? Answer to Can we tell an AI just to figure out what we want and then do that? Answer to Can we test an AI to make sure that it’s not going to take over and do harmful things after it achieves superintelligence? Answer to Can you give an AI a goal which involves “minimally impacting the world”? Answer to Can you stop an advanced AI from upgrading itself? Answer to Can't we just tell an AI to do what we want? Answer to Can’t we just program the superintelligence not to harm us? Answer to Could AI have basic emotions? Answer to Could we program an AI to automatically shut down if it starts doing things we don’t want it to? Answer to Couldn’t we keep the AI in a box and never give it the ability to manipulate the external world? Answer to How can I collect questions for Stampy? Answer to How can I contact the Stampy team? Answer to How can I contribute to Stampy? Answer to How can I join the Stampy dev team? Answer to How can we interpret what all the neurons mean? Answer to How close do AI experts think we are to creating superintelligence? Answer to How could an intelligence explosion be useful? Answer to How could general intelligence be programmed into a machine? Answer to How could poorly defined goals lead to such negative outcomes? Answer to How difficult should we expect alignment to be? Answer to How do I add content from LessWrong / Effective Altruism Forum tag-wikis to Stampy? Answer to How do I form my own views about AI safety? Answer to How do I format answers on Stampy? Answer to How does AI taking things literally contribute to alignment being hard? Answer to How does the stamp eigenkarma system work? Answer to How doomed is humanity? Answer to How fast will AI takeoff be? Answer to How is "intelligence" defined? Answer to How is OpenAI planning to solve the full alignment problem? Answer to How likely is an "intelligence explosion"? 
Answer to How likely is it that an AI would pretend to be a human to further its goals? Answer to How might AGI kill people? Answer to How might Shard Theory help with alignment? Answer to How might a superintelligence socially manipulate humans? Answer to How might a superintelligence technologically manipulate humans? Answer to How might an "intelligence explosion" be dangerous? Answer to How might an AI achieve a seemingly beneficial goal via inappropriate means? Answer to How might non-agentic GPT-style AI cause an "intelligence explosion" or otherwise contribute to existential risk? Answer to How might things go wrong with AI even without an agentic superintelligence? Answer to How might we get from Artificial General Intelligence to a Superintelligent system? Answer to How quickly could an AI go from the first indications of problems to an unrecoverable disaster? Answer to I want to help out AI alignment without necessarily making major life changes. What are some simple things I can do to contribute? Answer to I want to work on AI alignment. How can I get funding? Answer to I'm interested in working on AI safety. What should I do? Answer to If AI takes over the world how could it create and maintain the infrastructure that humans currently provide? Answer to If I only care about helping people alive today, does AI safety still matter? Answer to If we solve alignment, are we sure of a good future? Answer to Is AI alignment possible? Answer to Is expecting large returns from AI self-improvement just following an exponential trend line off a cliff? Answer to Is it possible to block an AI from doing certain things on the Internet? Answer to Is it possible to code into an AI to avoid all the ways a given task could go wrong, and would it be dangerous to try that? Answer to Is large-scale automated AI persuasion and propaganda a serious concern? Answer to Is the focus on the existential threat of superintelligent AI diverting too much attention from more pressing debates about AI in surveillance and the battlefield, and its potential effects on the economy? Answer to Is there a danger in anthropomorphizing AI’s and trying to understand them in human terms? Answer to Is this about AI systems becoming malevolent or conscious and turning on us? (Plex's Answer to Is this about AI systems becoming malevolent or conscious and turning on us?) Answer to Is this about AI systems becoming malevolent or conscious and turning on us? (Answer to Is this about AI systems becoming malevolent or conscious and turning on us?) Answer to Isn’t AI just a tool like any other? Won’t it just do what we tell it to? Answer to I’d like to get deeper into the AI alignment literature. Where should I look? Answer to Might an "intelligence explosion" never occur? Answer to OK, I’m convinced. How can I help? Answer to Once we notice that a superintelligence given a specific task is trying to take over the world, can’t we turn it off, reprogram it or otherwise correct the problem? Answer to Superintelligence sounds like science fiction. Do people think about this in the real world? Answer to We’re going to merge with the machines so this will never be a problem, right? Answer to What approaches are AI alignment organizations working on? Answer to What are "human values"? Answer to What are "scaling laws" and how are they relevant to safety? Answer to What are Scott Garrabrant and Abram Demski working on? Answer to What are alternate phrasings for? Answer to What are brain-computer interfaces? 
Answer to What are language models? Answer to What are mesa-optimizers? Answer to What are some AI alignment research agendas currently being pursued? Answer to What are some good books about AGI safety? Answer to What are some good podcasts about AI alignment? Answer to What are some good resources on AI alignment? Answer to What are some objections to the importance of AI alignment? Answer to What are some of the most impressive recent advances in AI capabilities? Answer to What are some specific open tasks on Stampy? Answer to What are the differences between “AI safety”, “AGI safety”, “AI alignment” and “AI existential safety”? Answer to What are the different possible AI takeoff speeds? Answer to What are the ethical challenges related to whole brain emulation? Answer to What are the main sources of AI existential risk? Answer to What are the potential benefits of AI as it grows increasingly sophisticated? Answer to What are the style guidelines for writing for Stampy? Answer to What can I do to contribute to AI safety? Answer to What can we expect the motivations of a superintelligent machine to be? Answer to What does Elon Musk think about AI safety? Answer to What exactly is AGI and what will it look like? Answer to What harm could a single superintelligence do when it took so many humans to build civilization? Answer to What is "HCH"? Answer to What is "biological cognitive enhancement"? Answer to What is "coherent extrapolated volition"? Answer to What is "evidential decision theory"? Answer to What is "friendly AI"? Answer to What is "functional decision theory"? Answer to What is "greater-than-human intelligence"? Answer to What is "hedonium"? Answer to What is "narrow AI"? Answer to What is "superintelligence"? (Scott's Answer to What is "superintelligence"?) Answer to What is "superintelligence"? (Answer to What is "superintelligence"?) Answer to What is "transformative AI"? Answer to What is "whole brain emulation"? Answer to What is AI Safety via Debate? Answer to What is AI safety? Answer to What is Anthropic's approach to LLM alignment? Answer to What is Artificial General Intelligence safety/alignment? Answer to What is Conjecture's Scalable LLM Interpretability research adgenda? Answer to What is Conjecture's epistemology research agenda? Answer to What is GPT-3? Answer to What is Goodhart's law? Answer to What is MIRI’s mission? Answer to What is Stampy's copyright? Answer to What is a "quantilizer"? Answer to What is a "value handshake"? Answer to What is a canonical question on Stampy's Wiki? Answer to What is a duplicate question on Stampy's Wiki? Answer to What is a follow-up question on Stampy's Wiki? Answer to What is an "agent"? Answer to What is an "intelligence explosion"? Answer to What is an "s-risk"? Answer to What is artificial general intelligence safety / AI alignment? Answer to What is causal decision theory? Answer to What is interpretability and what approaches are there? Answer to What is meant by "AI takeoff"? Answer to What is neural network modularity? Answer to What is the "control problem"? Answer to What is the "long reflection"? Answer to What is the "orthogonality thesis"? Answer to What is the "windfall clause"? Answer to What is the Center on Long-Term Risk (CLR) focused on? Answer to What is the DeepMind's safety team working on? Answer to What is the Stampy project? Answer to What is the general nature of the concern about AI alignment? Answer to What is the goal of Simulacra Theory? Answer to What kind of questions do we want on Stampy? 
Answer to What links are especially valuable to share on social media or other contexts? Answer to What safety problems are associated with whole brain emulation? Answer to What should I read to learn about decision theory? Answer to What should be marked as a canonical answer on Stampy's Wiki? Answer to What sources of information can Stampy use? Answer to What training programs and courses are available for AGI safety? Answer to What would a good future with AGI look like? Answer to What would a good solution to AI alignment look like? Answer to When should I stamp an answer? Answer to When will an intelligence explosion happen? Answer to When will transformative AI be created? Answer to Where can I find all the features of Stampy's Wiki? Answer to Where can I find people to talk to about AI alignment? Answer to Where can I find questions to answer for Stampy? Answer to Where can I learn about AI alignment? Answer to Where can I learn about interpretability? Answer to Who created Stampy? Answer to Who is Nick Bostrom? Answer to Who is Stampy? Answer to Why can't we just make a "child AI" and raise it? Answer to Why can't we just turn the AI off if it starts to misbehave? Answer to Why can't we simply stop developing AI? Answer to Why can’t we just use Asimov’s Three Laws of Robotics? Answer to Why can’t we just use natural language instructions? Answer to Why can’t we just… (Answer to Why can’t we just…) Answer to Why can’t we just… (Plex's Answer to Why can’t we just…) Answer to Why do we expect that a superintelligence would closely approximate a utility maximizer? Answer to Why does AI need goals in the first place? Can’t it be intelligent without any agenda? Answer to Why does AI takeoff speed matter? Answer to Why don't we just not build AGI if it's so dangerous? Answer to Why is AGI safety a hard problem? Answer to Why is AI alignment a hard problem? Answer to Why is AI safety important? Answer to Why is safety important for smarter-than-human AI? Answer to Why might a maximizing AI cause bad outcomes? Answer to Why might a superintelligent AI be dangerous? Answer to Why might contributing to Stampy be worth my time? Answer to Why might people try to build AGI rather than stronger and stronger narrow AIs? Answer to Why might we expect a fast takeoff? Answer to Why might we expect a moderate AI takeoff? Answer to Why might we expect a superintelligence to be hostile by default? Answer to Why not just put it in a box? Answer to Why should I worry about superintelligence? Answer to Why should we prepare for human-level AI technology now rather than decades down the line when it’s closer? Answer to Why think that AI can outperform humans? Answer to Why would great intelligence produce great power? Answer to Why would we only get one chance to align a superintelligence? Answer to Will an aligned superintelligence care about animals other than humans? Answer to Will we ever build a superintelligence? Answer to Would AI alignment be hard with deep learning? Answer to Would an aligned AI allow itself to be shut down? Answer to Would donating small amounts to AI safety organizations make any significant difference? Answer to Would it improve the safety of quantilizers to cut off the top few percent of the distribution? Answer to Wouldn't a superintelligence be smart enough not to make silly mistakes in its comprehension of our instructions? Answer to Wouldn't a superintelligence be smart enough to know right from wrong? 
(Answer to Wouldn't a superintelligence be smart enough to know right from wrong?) Answer to Wouldn't a superintelligence be smart enough to know right from wrong? (Aprillion's Answer to Wouldn't a superintelligence be smart enough to know right from wrong?) Answer to Wouldn't it be a good thing for humanity to die out? Answers wanting brief versions Aprillion's Answer to 10 Reasons to Ignore AI Safety on 2021-04-16T16:50:15 by cwjalex Aprillion's Answer to 8Dbaybled8D's question on Intro to AI Safety Aprillion's Answer to Agustin Doige's question on Avoiding Positive Side Effects Aprillion's Answer to AkantorJojo's question on Intro to AI Safety Aprillion's Answer to Alliotte Raphael's question on Intro to AI Safety Aprillion's Answer to Can AI be creative? Aprillion's Answer to Daniel Buzovský's question on Where do we go now Aprillion's Answer to Dorda Giovex's question on Real Inner Misalignment Aprillion's Answer to Is it possible to limit an AGI from full access to the internet? Aprillion's Answer to Isn’t AI just a tool like any other? Aprillion's Answer to Jakub Mintal's question on Mesa-Optimizers 2 Aprillion's Answer to Math Magician's question on Are AI Risks like Nuclear Risks? Aprillion's Answer to Mera Flynn's question on The Windfall Clause Aprillion's Answer to Mesa-Optimizers on 2021-02-17T11:05:43 by Lepus Lunaris Aprillion's Answer to Mesa-Optimizers on 2021-02-17T17:36:20 by Robert K Aprillion's Answer to Mesa-Optimizers on 2021-02-18T14:51:23 by Alexander Harris Aprillion's Answer to Mesa-Optimizers on 2021-03-06T00:27:29 by Loz Shamler Aprillion's Answer to Might trying to build a hedonium-maximizing AI be easier and more likely to work than trying for eudaimonia? Aprillion's Answer to Riccardo manfrin's question on Mesa-Optimizers Aprillion's Answer to Samuel Sandeen's question on Intro to AI Safety Aprillion's Answer to Smo1k's question on Mesa-Optimizers 2 Aprillion's Answer to Smrt fašizmu's question on Maximizers and Satisficers Aprillion's Answer to Testing Aprillion's Answer to What's meant by calling an AI "agenty" or "agentlike"? Aprillion's Answer to Wouldn't it be safer to only build narrow AIs? Aprillion's Answer to hedgehog3180's question on The Windfall Clause Atilla's Answer to When writing tools (as a programmer), how do I know what is okay to share and what isn't? Augustus Caesar's Answer to George Michael Sherry's question on Pascal's Mugging Augustus Caesar's Answer to Instrumental Convergence on 2021-02-24T05:56:14 by WILL D Augustus Caesar's Answer to James Tenney's question on Intro to AI Safety Augustus Caesar's Answer to Mesa-Optimizers on 2021-02-19T21:47:02 by milp Augustus Caesar's Answer to Mesa-Optimizers on 2021-02-23T14:17:01 by androkguz Augustus Caesar's Answer to Mesa-Optimizers on 2021-02-24T01:55:01 by frozenbagel16 Augustus Caesar's Answer to Mesa-Optimizers on 2021-02-24T11:12:04 by somename Augustus Caesar's Answer to Mesa-Optimizers on 2021-04-12T11:45:03 by Fanny10000 Augustus Caesar's Answer to Quantilizers on 2020-12-13T22:43:03 by TheWhiteWolf Augustus Caesar's Answer to Quantilizers on 2020-12-14T14:52:51 by fiziwig Augustus Caesar's Answer to Quantilizers on 2020-12-14T18:31:12 by Progressor 4ward Augustus Caesar's Answer to Quantilizers on 2020-12-30T05:52:51 by Mark Augustus Caesar's Answer to René's question on Reward Modeling Augustus Caesar's Answer to Unknown User's question on Intro to AI Safety Beamnode's Answer to Does the importance of AI risk depend on caring about transhumanist utopias? 
CarlFeynman's Answer to Dismythed & JWA's question on The Orthogonality Thesis Casejp's Answer to Should I engage in political or collective action like signing petitions or sending letters to politicians? Casejp's Answer to What if we put the AI in a box and have a second, more powerful, AI with the goal of preventing the first one from escaping? ChaosAlpha's Answer to Toby Buckley's question on Mesa-Optimizers Chlorokin's Answer to Could emulated minds do AI alignment research? Chlorokin's Answer to What are "coherence theorems" and what do they tell us about AI? Chlorokin's Answer to What if we put the AI in a box and have a second, more powerful, AI with the goal of preventing the first one from escaping? Chlorokin's Answer to What is "Do What I Mean"? Chlorokin's Answer to What is a "pivotal act"? Chlorokin's Answer to Will superintelligence make a large part of humanity unemployable? Command Master's Answer to M A's question on Real Inner Misalignment Command Master's Answer to Seeker.87's question on Real Inner Misalignment CyberByte's Answer to How long will it be until superintelligent AI is created? Damaged's Answer to Bootleg Jones's question on Intro to AI Safety Damaged's Answer to Ceelvain's question on Intro to AI Safety Damaged's Answer to Geoffry Gifari's question on Steven Pinker on AI Damaged's Answer to Henry Goodman's question on Real Inner Misalignment Damaged's Answer to Luka Rapava's question on The Orthogonality Thesis Damaged's Answer to Maccollo's question on Video Title Unknown Damaged's Answer to Matbmp's question on Intro to AI Safety Damaged's Answer to Milan Mašát's question on Video Title Unknown Damaged's Answer to Rob Stringer's question on Mesa-Optimizers Damaged's Answer to Ryan Paton's question on Intro to AI Safety Damaged's Answer to Sophrosynicle's question on Real Inner Misalignment Damaged's Answer to Tim Peterson's question on AI learns to Create Cat Pictures Damaged's Answer to What can AGI do? on 2021-03-04T08:30:55 by Luka Rapava Damaged's Answer to Уэстерн Спай's question on Are AI Risks like Nuclear Risks? Dpaleka's Answer to Are there any courses on technical AI safety topics? Dropinthesea's Answer to What actions can I take in under five minutes to contribute to the cause of AI safety? Dropinthesea's Answer to What is "Do What I Mean"? Evhub's Answer to Mesa-Optimizers on 2021-02-18T00:06:17 by poketopa1234 FLI's FAQ Filip's Answer to Are AI researchers trying to make conscious AI? Filip's Answer to Do you need a PhD to work on AI Safety? Filip's Answer to In "aligning AI with human values", which humans' values are we talking about? Filip's Answer to Isn't it too soon to be working on AGI safety? Filip's Answer to We already have psychopaths who are "misaligned" with the rest of humanity, but somehow we deal with them. Can't we do something similar with AI? Filip's Answer to What about having a human supervisor who must approve all the AI's decisions before executing them? Filip's Answer to What are the differences between AGI, transformative AI and superintelligence? Filip's Answer to Will AI learn to be independent from people or will it always ask for our orders? 
Gelisam's Answer to ( ͡° ͜ʖ ͡°)'s question on Real Inner Misalignment Gelisam's Answer to Nomentir Alque Nomintio's question on Mesa-Optimizers Gelisam's Answer to Quantilizers on 2020-12-13T22:20:31 by octavio echeverria Gelisam's Answer to Quantilizers on 2020-12-14T05:37:51 by Joshua Hillerup Gelisam's Answer to Quantilizers on 2020-12-14T10:02:10 by Julia Henriques Gelisam's Answer to Quantilizers on 2020-12-14T10:54:03 by Kolop315 Gelisam's Answer to Quantilizers on 2020-12-21T07:14:00 by Serenacula Gelisam's Answer to Quantilizers on 2020-12-21T16:09:27 by MrLeoniu Gelisam's Answer to Quantilizers on 2021-01-02T12:24:40 by jonseah Gelisam's Answer to Quantilizers on 2021-02-09T18:00:54 by Jon Bray Gelisam's Answer to Quantilizers on 2021-04-17T22:22:17.016139 by Unknown JJ Hep's Answer to Quantilizers on 2020-12-24T19:36:39 by Luke Mills Jamespetts's Answer to Maximizers and Satisficers on 2021-02-20T17:58:35 by Donald Engelmann Jeremyg's Answer to What milestones are there between us and AGI? Linnea's Answer to What are OpenAI Codex and GitHub Copilot? Linnea's Answer to What are the ethical challenges related to whole brain emulation? Luke Muehlhauser's Answer to What is "superintelligence"? MIRI's Answer to How long will it be until superintelligent AI is created? MIRI's Answer to Why can’t we just “put the AI in a box” so that it can’t influence the outside world? MIRI's Answer to Why work on AI safety early? Magdalena's Answer to What does generative visualization look like in reinforcement learning? Magdalena's Answer to What is artificial general intelligence safety / AI alignment? Magdalena's Answer to What is the difference between inner and outer alignment? Magdalena's Answer to What subjects should I study at university to prepare myself for alignment research? Matthew1970's Answer to What are the editorial protocols for Stampy questions and answers? Morpheus's Answer to Is it already too late to work on AI alignment? Morpheus's Answer to MattettaM's question on Maximizers and Satisficers Murphant's Answer to Could I contribute by offering coaching to alignment researchers? If so, how would I go about this? Murphant's Answer to Could we tell the AI to do what's morally right? Murphant's Answer to Do AIs suffer? Murphant's Answer to How can I contribute in the area of community building? Murphant's Answer to How likely is it that governments will play a significant role? What role would be desirable, if any? Murphant's Answer to How much resources did the processes of biological evolution use to evolve intelligent creatures? Murphant's Answer to Might an aligned superintelligence force people to have better lives and change more quickly than they want? Murphant's Answer to What are plausible candidates for "pivotal acts"? Murphant's Answer to What are some important examples of specialised terminology in AI alignment? Murphant's Answer to What are the "win conditions"/problems that need to be solved? Murphant's Answer to What is "metaphilosophy" and how does it relate to AI safety? Murphant's Answer to What's especially worrisome about autonomous weapons? Nattfrosten's Answer to What are language models? Nattfrosten's Answer to What are mesa-optimizers? Nico Hill2's Answer to Will we ever build a superintelligence? NotaSentientAI's Answer to Why not just put it in a box? Plex's Answer to 10 Reasons to Ignore AI Safety on 2021-03-08T13:55:51 by james sc Plex's Answer to AkantorJojo's question on Intro to AI Safety Plex's Answer to Aren't robots the real problem? 
How can AI cause harm if it has no ability to directly manipulate the physical world? Plex's Answer to Gedelijan's question on Real Inner Misalignment Plex's Answer to Heisenmountain B's question on The Orthogonality Thesis Plex's Answer to How can I convince others and present the arguments well? Plex's Answer to How does the field of AI Safety want to accomplish its goal of preventing existential risk? Plex's Answer to How long will it be until superintelligent AI is created? Plex's Answer to How long will it be until transformative AI is created? Plex's Answer to I'm not convinced AI would be a severe threat to humanity. Why are you so sure? Plex's Answer to Instrumental Convergence on 2019-09-07T18:05:31 by Tyler Gust Plex's Answer to Jay Ayerson's question on Intro to AI Safety Plex's Answer to Killer Robot Arms Race on 2021-02-20T14:59:31 by Chiron Plex's Answer to Marc Bollinger's question on Mesa-Optimizers Plex's Answer to Mesa-Optimizers on 2021-02-21T11:55:30 by Piñata Oblongata Plex's Answer to Mesa-Optimizers on 2021-02-22T04:26:43 by Jorel Fermin Plex's Answer to Mesa-Optimizers on 2021-02-22T11:35:00 by Damien Asmodeus Plex's Answer to Mesa-Optimizers on 2021-02-23T07:23:02 by Chrysippus Plex's Answer to Mesa-Optimizers on 2021-02-23T17:49:04 by Will Holmes Plex's Answer to Mesa-Optimizers on 2021-02-23T19:00:23 by aforcemorepowerful Plex's Answer to Mesa-Optimizers on 2021-02-24T04:50:30 by Jonathon Chambers Plex's Answer to Mesa-Optimizers on 2021-02-24T19:05:36 by Solomon Ucko Plex's Answer to Mesa-Optimizers on 2021-03-01T22:38:48 by Iagoba Apellaniz Plex's Answer to Mesa-Optimizers on 2021-03-02T04:09:47 by Harsh Deshpande Plex's Answer to Mesa-Optimizers on 2021-03-11T09:34:01 by HTIDtricky Plex's Answer to Nachis04's question on Intro to AI Safety Plex's Answer to Niels Peppelaar's question on 10 Reasons to Ignore AI Safety Plex's Answer to Quantilizers on 2020-12-13T20:52:21 by owen heckmann Plex's Answer to Quantilizers on 2020-12-13T21:45:39 by DragonSheep Plex's Answer to Quantilizers on 2020-12-13T21:53:33 by Bastiaan Cnossen Plex's Answer to Quantilizers on 2020-12-13T21:53:49 by cmilkau Plex's Answer to Quantilizers on 2020-12-13T21:55:08 by loligesgame Plex's Answer to Quantilizers on 2020-12-13T21:58:44 by Vincent Grange Plex's Answer to Quantilizers on 2020-12-13T22:29:50 by Qwerty and Azerty Plex's Answer to Quantilizers on 2020-12-13T23:11:49 by DragonSheep Plex's Answer to Quantilizers on 2020-12-13T23:59:24 by Nixitur Plex's Answer to Quantilizers on 2020-12-14T02:46:17 by M Kelly Plex's Answer to Quantilizers on 2020-12-14T03:52:16 by James Barclay Plex's Answer to Quantilizers on 2020-12-14T05:12:00 by Jeremy Hoffman Plex's Answer to Quantilizers on 2020-12-14T06:05:58 by Paulo Van Huffel Plex's Answer to Quantilizers on 2020-12-14T07:32:09 by Taras Pylypenko Plex's Answer to Quantilizers on 2020-12-14T22:59:02 by Panzerkampfwagen Plex's Answer to Quantilizers on 2020-12-15T01:44:10 by AdibasWakfu Plex's Answer to Quantilizers on 2020-12-15T12:08:55 by Life Happens Plex's Answer to Quantilizers on 2020-12-15T23:08:13 by Plex's Answer to Quantilizers on 2020-12-19T13:13:33 by SocialDownclimber Plex's Answer to Quantilizers on 2020-12-19T13:37:50 by Alex Webb Plex's Answer to Quantilizers on 2020-12-19T18:22:43 by Yezpahr Plex's Answer to Quantilizers on 2020-12-25T18:03:37 by Timothy Hansen Plex's Answer to Quantilizers on 2020-12-30T17:26:14 by Sean Pedersen Plex's Answer to Quantilizers on 2021-01-11T09:32:08 by Underrated1 Plex's Answer to 
Quantilizers on 2021-02-18T15:39:07 by Spoon Of Doom Plex's Answer to Quantilizers on 2021-02-19T12:19:50 by Shantanu Ojha Plex's Answer to Quantilizers on 2021-02-20T12:55:45 by Marcus Antonius Plex's Answer to Quantilizers on 2021-02-22T21:54:31 by James Petts Plex's Answer to Ranibow Sprimkle's question on Instrumental Convergence Plex's Answer to Reitze Jansen's question on Mesa-Optimizers Plex's Answer to Safe Exploration on 2020-12-06T21:48:38 by Rares Rotar Plex's Answer to Samuel Hvidager's question on Intro to AI Safety Plex's Answer to TackerTacker's question on Mesa-Optimizers 2 Plex's Answer to Testfollowup2 Plex's Answer to Testing Plex's Answer to Testing rename second? Plex's Answer to The Orthogonality Thesis on 2019-04-20T04:57:23 by Enciphered Plex's Answer to The Windfall Clause on 2020-07-07T07:10:36 by boarattackboar Plex's Answer to The Windfall Clause on 2020-12-13T14:49:52 by the Decoy Plex's Answer to Tommy karrick's question on Mesa-Optimizers Plex's Answer to Use of Utility Functions on 2017-04-27T20:50:28 by William Dye Plex's Answer to Use of Utility Functions on 2021-02-26T11:44:06 by Michael Moran Plex's Answer to What is "agent foundations"? Plex's Answer to What is a verified account on Stampy's Wiki? Plex's Answer to What is everyone working on in AI alignment? Plex's Answer to What’s a good AI alignment elevator pitch? Plex's Answer to Will there be a discontinuity in AI capabilities? If so, at what stage? Plex's Answer to jhjkhgjhfgjg jgjyfhdhbfjhg's question on Mesa-Optimizers 2 Plex's second Answer to Test's question? Plex2's Answer to Testing rename second? QZ's Answer to Where can I find mentorship and advice for becoming a researcher? QueenDaisy's Answer to Are any major politicians concerned about this? QueenDaisy's Answer to Might an aligned superintelligence force people to "upload" themselves, so as to more efficiently use the matter of their bodies? QueenDaisy's Answer to What could a superintelligent AI do, and what would be physically impossible even for it? Quintin Pope's Answer to Will superintelligence make a large part of humanity unemployable? Redshift's Answer to In "aligning AI with human values", which humans' values are we talking about? Robert hildebrandt's Answer to Quantilizers on 2020-12-14T01:27:48 by Moleo Robert hildebrandt's Answer to Quantilizers on 2020-12-14T03:13:46 by SlimThrull Robert hildebrandt's Answer to Quantilizers on 2020-12-14T03:47:06 by snigwithasword Robert hildebrandt's Answer to Quantilizers on 2020-12-14T04:04:30 by Noah McCann Robert hildebrandt's Answer to Quantilizers on 2020-12-14T11:23:40 by illesizs Robert hildebrandt's Answer to Quantilizers on 2020-12-16T06:22:32 by Chrysippus Robert.hildebrandt's Answer to Mesa-Optimizers on 2021-02-18T00:28:56 by LoliShocks Robert.hildebrandt's Answer to Mesa-Optimizers on 2021-02-18T05:37:03 by Irun S Robert.hildebrandt's Answer to Mesa-Optimizers on 2021-02-18T13:35:33 by Peter Smythe Robert.hildebrandt's Answer to WNJ: Think of AGI like a Corporation? on 2021-02-21T09:03:12 by Chedim Robertskmiles's Answer to A Commenter's question on What can AGI do? Robertskmiles's Answer to AI Safety Gridworlds 2 on 2020-06-02T00:45:31 by Wylliam Judd Robertskmiles's Answer to Alan W's question on Intro to AI Safety Robertskmiles's Answer to Alessandrə Rustichelli's question on Intro to AI Safety Robertskmiles's Answer to Are expert surveys on AI safety available? 
Robertskmiles's Answer to Avoiding Negative Side Effects on 2020-11-17T01:48:43 by Neological Gamer Robertskmiles's Answer to Channel Introduction on 2021-04-07T23:33:04 by Robert Miles Robertskmiles's Answer to Instrumental Convergence on 2020-05-18T23:25:44 by phil guer Robertskmiles's Answer to Is merging with AI through brain-computer interfaces a potential solution to safety problems? Robertskmiles's Answer to Killer Robot Arms Race on 2020-06-06T11:20:21 by DaVince21 Robertskmiles's Answer to Maor Eitan's question on Intro to AI Safety Robertskmiles's Answer to Maximizers and Satisficers on 2019-09-01T08:11:48 by Paper Benni Robertskmiles's Answer to Mesa-Optimizers on 2021-02-19T08:59:09 by valberm Robertskmiles's Answer to Mesa-Optimizers on 2021-02-25T10:23:18 by RaukGorth Robertskmiles's Answer to NINJA NAJM's question on Avoiding Negative Side Effects Robertskmiles's Answer to Quantilizers on 2020-12-13T22:12:28 by Markus Johansson Robertskmiles's Answer to Quantilizers on 2020-12-14T18:12:54 by Blah Blah Robertskmiles's Answer to Quantilizers on 2020-12-15T18:18:01 by mrsuperguy2073 Robertskmiles's Answer to Quantilizers on 2020-12-25T23:07:12 by Peter Franz Robertskmiles's Answer to Quantilizers on 2021-02-19T12:19:50 by Shantanu Ojha Robertskmiles's Answer to Quantilizers on 2021-02-24T23:54:57 by Nathan Kouvalis Robertskmiles's Answer to Rob Sokolowski's question on AI Safety Gridworlds 2 Robertskmiles's Answer to Steven Pinker on AI on 2020-08-23T02:40:09 by Xystem 4 Robertskmiles's Answer to Superintelligence Mod for Civilization V on 2019-04-11T22:16:10 by Mateja Petrovic Robertskmiles's Answer to The Orthogonality Thesis on 2019-04-18T19:04:56 by Jan Bam Robertskmiles's Answer to The Orthogonality Thesis on 2020-10-13T22:10:08 by Juan Pablo Garibotti Arias Robertskmiles's Answer to The Orthogonality Thesis on 2021-02-21T04:49:23 by peterbrehmj Robertskmiles's Answer to Uiytt's question on Intro to AI Safety Robertskmiles's Answer to Use of Utility Functions on 2020-09-23T08:51:14 by Amaar Quadri Robertskmiles's Answer to What can AGI do? on 2020-03-30T09:18:52 by Jade Gorton Robertskmiles's Answer to What can AGI do? on 2020-12-19T06:19:12 by Firaro Robertskmiles's Answer to Where do we go now on 2020-05-12T20:06:14 by Musthegreat 94 Robertskmiles's Answer to ZT1ST's question on Instrumental Convergence Robertskmiles's Answer to Ian's question on Maximizers and Satisficers RoseMcClelland's Answer to How do you figure out model performance scales? RoseMcClelland's Answer to How does MIRI communicate their view on alignment? RoseMcClelland's Answer to How is Beth Barnes evaluating LM power seeking? RoseMcClelland's Answer to How is the Alignment Research Center (ARC) trying to solve Eliciting Latent Knowledge (ELK)? RoseMcClelland's Answer to How would we align an AGI whose learning algorithms / cognition look like human brains? RoseMcClelland's Answer to How would you explain the theory of Infra-Bayesianism? RoseMcClelland's Answer to What are Encultured working on? RoseMcClelland's Answer to What does Evan Hubinger think of Deception + Inner Alignment? RoseMcClelland's Answer to What does MIRI think about technical alignment? RoseMcClelland's Answer to What does Ought aim to do? RoseMcClelland's Answer to What does the scheme Externalized Reasoning Oversight involve? RoseMcClelland's Answer to What is Aligned AI / Stuart Armstrong working on? RoseMcClelland's Answer to What is Conjecture, and what is their team working on? 
RoseMcClelland's Answer to What is David Krueger working on? RoseMcClelland's Answer to What is Dylan Hadfield-Menell's thesis on? RoseMcClelland's Answer to What is FAR's theory of change? RoseMcClelland's Answer to What is Future of Humanity Instititute working on? RoseMcClelland's Answer to What is John Wentworth's plan? RoseMcClelland's Answer to What is Refine? RoseMcClelland's Answer to What is Truthful AI's approach to improve society? RoseMcClelland's Answer to What is an adversarial oversight scheme? RoseMcClelland's Answer to What is the Center for Human Compatible AI (CHAI)? RoseMcClelland's Answer to What is the purpose of the Visible Thoughts Project? RoseMcClelland's Answer to What language models are Anthropic working on? RoseMcClelland's Answer to What other organizations are working on technical AI alignment? RoseMcClelland's Answer to What projects are CAIS working on? RoseMcClelland's Answer to What projects are Redwood Research working on? RoseMcClelland's Answer to What work is Redwood doing on LLM interpretability? RoseMcClelland's Answer to Who is Jacob Steinhardt and what is he working on? RoseMcClelland's Answer to Who is Sam Bowman and what is he working on? Self-modification and wireheading Severin's Answer to How can I be a more productive student/researcher? Severin's Answer to Isn't the real concern AI being misused by terrorists or other bad actors? Severin's Answer to What are the leading theories in moral philosophy and which of them might be technically the easiest to encode into an AI? SlimeBunnyBat's Answer to 5astelija's question on Mesa-Optimizers 2 SlimeBunnyBat's Answer to Andy Gee's question on Mesa-Optimizers 2 SlimeBunnyBat's Answer to Ansatz66's question on Intro to AI Safety SlimeBunnyBat's Answer to Arthur Wittmann's question on Killer Robot Arms Race SlimeBunnyBat's Answer to Ben Crulis's question on Mesa-Optimizers SlimeBunnyBat's Answer to Etienne Maheu's question on Intro to AI Safety SlimeBunnyBat's Answer to IkarusKK's question on Real Inner Misalignment SlimeBunnyBat's Answer to Isn't the real concern technological unemployment? 
SlimeBunnyBat's Answer to Marcelo Pinheiro's question on Real Inner Misalignment SlimeBunnyBat's Answer to Marie Rentergent's question on Mesa-Optimizers 2 SlimeBunnyBat's Answer to Mesa-Optimizers on 2021-02-17T08:00:03 by James Crewdson SlimeBunnyBat's Answer to Mesa-Optimizers on 2021-02-17T10:26:59 by Koro SlimeBunnyBat's Answer to Midhunraj R's question on Quantilizers SlimeBunnyBat's Answer to Nerd Herd's question on Real Inner Misalignment SlimeBunnyBat's Answer to Oliver Bergau's question on Quantilizers SlimeBunnyBat's Answer to Robert Tuttle's question on Mesa-Optimizers 2 SlimeBunnyBat's Answer to Sigmata0's question on Intro to AI Safety SlimeBunnyBat's Answer to SlimeBunnyBat's Answer to Drekpaprika's question on Intro to AI Safety SlimeBunnyBat's Answer to Son of a Beech's question on The Orthogonality Thesis SlimeBunnyBat's Answer to Stellar Lake System's question on The Orthogonality Thesis Social Christancing's Answer to G T's question on Mesa-Optimizers Social Christancing's Answer to Mesa-Optimizers on 2021-02-17T12:57:20 by X3 KJ Social Christancing's Answer to Mesa-Optimizers on 2021-02-23T01:55:12 by Sebastian Gramsz Social Christancing's Answer to Mesa-Optimizers on 2021-02-28T20:30:04 by DodoDojo Social Christancing's Answer to Quantilizers on 2021-02-22T13:59:07 by Marshall White Social Christancing's Answer to Siranut usawasutsakorn's question on Mesa-Optimizers 2 Social Christancing's Answer to Socially unacceptable's question on The Orthogonality Thesis Stargate9000's Answer to Mesa-Optimizers on 2021-02-28T14:11:32 by andybaldman Stargate9000's Answer to Mesa-Optimizers on 2021-03-13T23:08:08 by Tomasz Rogala Stargate9000's Answer to Quantilizers on 2021-02-18T15:39:07 by Spoon Of Doom Stargate9000's Answer to Quantilizers on 2021-03-09T18:02:04 by Blackmage89 Stargate9000's Answer to The Orthogonality Thesis on 2021-02-27T23:46:44 by Stellar Lake System Sudonym's Answer to Famitory's question on Intro to AI Safety Sudonym's Answer to Instrumental Convergence on 2020-06-09T18:06:22 by Yuval Sudonym's Answer to Iterated Distillation and Amplification on 2021-01-07T19:03:28 by Keenan Pepper Sudonym's Answer to Mesa-Optimizers on 2021-03-09T20:02:31 by Corman Sudonym's Answer to Mesa-Optimizers on 2021-03-09T21:29:31 by Metsuryu Sudonym's Answer to Quantilizers on 2020-12-13T22:53:59 by J M Sudonym's Answer to Quantilizers on 2020-12-14T01:34:53 by boobshart Sudonym's Answer to Quantilizers on 2020-12-15T05:08:10 by DarkestMirrored Sudonym's Answer to Quantilizers on 2020-12-15T09:14:48 by Samuel Woods Sudonym's Answer to Quantilizers on 2020-12-15T16:27:49 by Nutwit Sudonym's Answer to Quantilizers on 2020-12-16T05:57:09 by Adrian Regenfuß Sudonym's Answer to Quantilizers on 2020-12-18T00:46:12 by Wilco Verhoef Sudonym's Answer to Quantilizers on 2020-12-25T04:37:20 by Daniel MK Sudonym's Answer to Quantilizers on 2020-12-25T11:41:42 by PianoShow Sudonym's Answer to Quantilizers on 2020-12-26T16:22:16 by Songbird Sudonym's Answer to Quantilizers on 2021-01-02T15:40:56 by wertyuiop Sudonym's Answer to Quantilizers on 2021-01-05T19:27:35 by kade99TV Sudonym's Answer to Quantilizers on 2021-01-09T16:33:43 by Stephen Sudonym's Answer to Reward Hacking Reloaded on 2020-10-26T01:46:26 by Julian Danzer Sudonym's Answer to Steven Pinker on AI on 2020-05-13T19:07:46 by kilroy1964 Sudonym's Answer to The Orthogonality Thesis on 2019-04-15T17:07:34 by echoes Sudonym's Answer to The Orthogonality Thesis on 2020-12-27T23:05:56 by Miguel Borromeo Sudonym's Answer to 
Uberchops's question on Quantilizers Sudonym's Answer to WNJ: Think of AGI like a Corporation? on 2020-06-05T00:00:24 by Clayton Voges Sudonym's Answer to What does alignment failure look like? TJ6K's Answer to What beneficial things would an aligned superintelligence be able to do? TapuZuko's Answer to Is the question of whether we're living in a simulation relevant to AI safety? If so, how? TapuZuko's Answer to Isn't the real concern autonomous weapons? TapuZuko's Answer to Might an aligned superintelligence immediately kill everyone and then go on to create a "hedonium shockwave"? Tinytitan's Answer to Could we get significant biological intelligence enhancements long before AGI? U8k's Answer to onje berdy's question on Real Inner Misalignment Yaakov's Answer to What are the different versions of decision theory? Yaakov's Answer to Which organizations are working on AI alignment? Yevgeniy Andreyevich's Answer to Lapis Salamander's question on Intro to AI Safety Yevgeniy Andreyevich's Answer to Rich Traube's question on WNJ: Think of AGI like a Corporation? Yevgeniy Andreyevich's Answer to afla light's question on 10 Reasons to Ignore AI Safety Yevgeniy's Answer to Ted Archer's question on Maximizers and Satisficers
Asked on Discord? Has this been posted to a Discord channel for the community to try answering?
Not a question? Is this not a question? (e.g. someone trying to chat with Stampy, or a message that just has a URL with a ? in it)
Asker:
Date: When was this question asked?
Origin: Where is this question from?
LessWrong
Asked on video:
"Don't Fear The Terminator" - Yann LeCun on Facebook 10 Reasons to Ignore AI Safety 9 Examples of Specification Gaming A Response to Steven Pinker on AI AI Safety Gridworlds AI Safety at EAGlobal2017 Conference AI That Doesn't Try Too Hard - Maximizers and Satisficers AI learns to Create ̵K̵Z̵F̵ ̵V̵i̵d̵e̵o̵s̵ Cat Pictures: Papers in Two Minutes Are AI Risks like Nuclear Risks? Avoiding Negative Side Effects: Concrete Problems in AI Safety part 1 Avoiding Positive Side Effects: Concrete Problems in AI Safety part 1.5 Channel Introduction Deceptive Misaligned Mesa-Optimisers? It's More Likely Than You Think... Empowerment: Concrete Problems in AI Safety part 2 Experts' Predictions about the Future of AI Friend or Foe? AI Safety Gridworlds extra bit How to Keep Improving When You're Better Than Any Teacher - Iterated Distillation and Amplification Intelligence and Stupidity: The Orthogonality Thesis Intro to AI Safety, Remastered Is AI Safety a Pascal's Mugging? MAXIMUM OVERGEORGIA Much Better Stampy Test Video PC Build Video! Predicting AI: RIP Prof. Hubert Dreyfus Quantilizers: AI That Doesn't Try Too Hard Reading and Commenting On Pinker's Article Respectability Reward Hacking Reloaded: Concrete Problems in AI Safety Part 3.5 Reward Hacking: Concrete Problems in AI Safety Part 3 Safe Exploration: Concrete Problems in AI Safety Part 6 Scalable Supervision: Concrete Problems in AI Safety Part 5 Sharing the Benefits of AI: The Windfall Clause Status Report Superintelligence Mod for Civilization V The OTHER AI Alignment Problem: Mesa-Optimizers and Inner Alignment The other "Killer Robot Arms Race" Elon Musk should worry about Training AI Without Writing A Reward Function, with Reward Modelling Untitled2 We Were Right! Real Inner Misalignment What Can We Do About Reward Hacking?: Concrete Problems in AI Safety Part 4 What can AGI do? I/O and Speed What's the Use of Utility Functions? Where do we go now? Why Not Just: Raise AI Like Kids? Why Not Just: Think of AGI Like a Corporation? Why Would AI Want to do Bad Things? Instrumental Convergence