(I expect that the point of this post is already obvious to many of the people reading it. Nevertheless, I believe that it is good to mention important things even if they seem obvious.)

OpenAI, DeepMind, Anthropic, and other AI organizations focused on capabilities should shut down. This is what would maximize the utility of pretty much everyone, including the people working inside those organizations.

Let's call Powerful AI ("PAI") an AI system capable of either:

  • Steering the world towards what it wants hard enough that it can't be stopped.
  • Killing everyone "un-agentically", e.g. by being plugged into a protein printer and generating a supervirus.

and by "aligned" (or "alignment") I mean the property of a system such that, when it has the ability to {steer the world towards what it wants hard enough that it can't be stopped}, what it wants is nice things and not goals that entail killing literally everyone (which is the default).

We do not know how to make a PAI which does not kill literally everyone. OpenAI, DeepMind, Anthropic, and others are building towards PAI. Therefore, they should shut down, or at least shut down all of their capabilities progress and focus entirely on alignment.

"But China!" does not matter. We do not know how to build PAI that does not kill literally everyone. Neither does China. If China tries to build AI that kills literally everyone, it does not help if we decide to kill literally everyone first.

"But maybe the alignment plan of OpenAI/whatever will work out!" is wrong. It won't. It might work if they were careful enough and had enough time, but they're going too fast and they'll simply cause literally everyone to be killed by PAI before they would get to the point where they can solve alignment. Their strategy does not look like that of an organization trying to solve alignment. It's not just that they're progressing on capabilities too fast compared to alignment; it's that they're pursuing the kind of strategy which fundamentally gets to the point where PAI kills everyone before it gets to saving the world.

Yudkowsky's Six Dimensions of Operational Adequacy in AGI Projects describes an AGI project with an adequate alignment mindset as one where:

The project has realized that building an AGI is mostly about aligning it. Someone with full security mindset and deep understanding of AGI cognition as cognition has proven themselves able to originate new deep alignment measures, and is acting as technical lead with effectively unlimited political capital within the organization to make sure the job actually gets done. Everyone expects alignment to be terrifically hard and terribly dangerous and full of invisible bullets whose shadow you have to see before the bullet comes close enough to hit you. They understand that alignment severely constrains architecture and that capability often trades off against transparency. The organization is targeting the minimal AGI doing the least dangerous cognitive work that is required to prevent the next AGI project from destroying the world. The alignment assumptions have been reduced into non-goal-valent statements, have been clearly written down, and are being monitored for their actual truth.

(emphasis mine)

Needless to say, this is not remotely what any of the major AI capabilities organizations look like.

At least Anthropic didn't particularly try to be a big commercial company making the public excited about AI. Making the AI race a big public thing was a huge mistake on OpenAI's part, and is evidence that they don't really have any idea what they're doing.

It does not matter that those organizations have "AI safety" teams, if their AI safety teams do not have the power to take the one action that has been the obviously correct one this whole time: Shut down progress on capabilities. If their safety teams have not done this so far, when it is the one thing that needs to be done, there is no reason to think they'll have the chance to take whatever would be the second-best or third-best actions either.

This isn't just about the large AI capabilities organizations. I expect that there are plenty of smaller organizations out there headed towards building unaligned PAI. Those should shut down too. If these organizations exist, it must be because the people working there think they have a real chance of making some progress towards more powerful AI. If they are right, then that's real damage to the probability that anyone at all survives, and they should shut down in order to stop doing that damage. It does not matter if you think you have only a small negative impact on the probability that anyone survives at all — the actions that maximize your utility are the ones that decrease the probability that PAI kills literally everyone, even if it's just by a small amount.

Organizations which do not directly work towards PAI but provide services that are instrumental to it — such as EleutherAI, HuggingFace, etc. — should also shut down. It does not matter if your work only contributes "somewhat" to PAI killing literally everyone. If the net impact of your work is a higher probability that PAI kills literally everyone, you should "halt, melt, and catch fire".

If you work at any of those organizations, your two best options for maximizing your utility are to find some way to make that organization slower at getting to PAI (e.g. by advocating for more safety checks that slow down progress, or by being deliberately unproductive at technical work yourself), or to quit. Stop making excuses and start taking the correct actions. We're all in this together. Being part of the organization that kills everyone will not do much for you — all you get is a bit more wealth-now, which is useless if you're dead and useless if alignment is solved and we get utopia.

Comments (22)

I think it's useful for people to express opinions on the forum, but this post didn't quite hit the mark, in my view.

The post makes a number of fairly strong claims, but some of them (including important ones) have little to no justification. Examples:

  • that the default goal of an AI system is "literally killing everyone"
  • ' "But maybe the alignment plan of OpenAI/whatever will work out!" is wrong. It won't. '

If you didn't want to lengthen the post by going over lengthy justifications which have already been made elsewhere, I think it would have been reasonable to link to other places where those claims have been justified.

I’ll go further and say that I think those two claims are widely believed by many in the AI safety world (in which I count myself) with a degree of confidence that goes way beyond what can be justified by any argument that has been provided by anyone, anywhere, and I think this is a huge epistemic failure of that part of the AI safety community.

I strongly downvoted the OP for making these broad, sweeping, controversial claims as if they are established fact and obviously correct, as opposed to one possible way the world could be which requires good arguments to establish, and not attempting any serious understanding of and engagement with the viewpoints of people who disagree that these organizations shutting down would be the best thing for the world.

I would like the AI safety community to work much harder on its epistemic standards.

There has already been much written on this, enough for there to be a decent level of consensus (which indeed there is here (EAF/LW)).

These essays are well known and I'm aware of basically all of them. I deny that there's a consensus on the topic, that the essays you link are representative of the range of careful thought on the matter, or that the arguments in these essays are anywhere near rigorous enough to meet my criterion: justifying the degree of confidence expressed in the OP (and some of the posts you link).

I've not come across any arguments that debunk the risk with anywhere near the same rigour (and I still have a $1000 bounty open here). Please link to the "careful thought on the matter" from the other side that you mention (or add here). I'm with Richard Ngo when he says:

I'm often cautious when publicly arguing that AGI poses an existential risk, because our arguments aren't as detailed as I'd like. But I should remember that the counterarguments are *much* worse - I've never seen a plausible rebuttal to the core claims. That's terrifying.

You seem to be lumping people like Richard Ngo, who is fairly epistemically humble, in with people who are absolutely sure that the default path leads to us all dying. It is only the latter that I'm criticizing.

I agree that AI poses an existential risk, in the sense that it is hard to rule out that the default path poses a serious chance of the end of civilization. That's why I work on this problem full-time.

I do not agree that it is absolutely clear that default instrumental goals of an AGI entail it killing literally everyone, as the OP asserts.

(I provide some links to views dissenting from this extreme confidence here.)

I do not agree that it is absolutely clear that the default goal of an AGI is for it to kill literally everyone, as the OP asserts.

The OP says 

goals that entail killing literally everyone (which is the default)

[my emphasis in bold]. This is a key distinction. No one is saying that the default goal will be killing humans; the whole issue is one of collateral damage - it will end up with (to us) arbitrary goals that result in convergent instrumental goals that lead to us all being dead as collateral damage (e.g. turning the planet into "computronium", or dismantling the Sun for energy).

Sure, I understand that it’s a supposed default instrumental goal and not a terminal goal. Sorry that my wording didn’t make that distinction clear. I’ve now edited it to do so, but I think my overall points stand.

It's not even (necessarily) a default instrumental goal. It's collateral damage as the result of other instrumental goals. It may just go straight for dismantling the Sun, knowing that we won't be able to stop it. Or straight for ripping apart the planet with nanobots (no need for a poison everyone simultaneously step).

Fair enough, I edited it again. I still think the larger points stand unchanged.

No one is saying p(doom) is 100%, but there is good reason to think that it is 50% or more - that the default outcome of AGI is doom. It doesn't default to everything somehow being OK: to alignment solving itself, or to the alignment work done by today (or by 2030) being enough if we get a foom tomorrow (or by 2030). I've not seen any compelling argument to that effect.

Thanks for the links. I think a lot of the problem with the proposed solutions is that they don't scale to ASI, and aren't watertight. Having 99.999999% alignment in the limit of an ASI performing billions of actions a minute still means everyone dead after a little while. RLHF'd GPT-4 is only safe because it is weak.
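To make the arithmetic behind this concrete (an illustrative calculation only, assuming each action independently has a 10^-8 chance of being catastrophically misaligned, i.e. "99.999999% alignment", and an assumed rate of 10^9 actions per minute):

\[
P(\text{no catastrophic action in one minute}) = \left(1 - 10^{-8}\right)^{10^{9}} \approx e^{-10} \approx 4.5 \times 10^{-5},
\qquad
\mathbb{E}[\text{catastrophic actions per minute}] = 10^{9} \times 10^{-8} = 10.
\]

Under those assumed figures, at least one catastrophic action within the first minute is all but certain, and roughly ten are expected.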

Alignment at the level that is typical human-to-humanity, or what is represented by "common sense" that can be picked up from training data, is still nowhere near sufficient. Uplifting any given human to superintelligence would also lead to everyone dead before too long, due to the massive power imbalance, even if it's just by accident ("whoops I was just doing some physics experiments; didn't think that would happen"; "I thought it would be cool if everyone became a post-human hive mind; I thought they'd like it").

And quite apart from alignment, we still need to eliminate catastrophic risks from misuse (jailbreaks, open sourced unaligned base model weights) and coordination failure (how to avoid chaos when everyone is wishing for different things from their genies). Those alone are enough to justify shutting it all down now.

It’s easy to ask other people to do work, but do you have things to read on the range of careful thought on this topic?

[This comment is no longer endorsed by its author]

To be clear, mostly I'm not asking for "more work", I'm asking people to use much better epistemic hygiene. I did use the phrase "work much harder on its epistemic standards", but by this I mean please don't make sweeping, confident claims as if they are settled fact when there's informed disagreement on those subjects.

Nevertheless, some examples of the sort of informed disagreement I'm referring to:

  • The mere existence of many serious alignment researchers seriously optimistic about scalable oversight methods such as debate.
  • This post by Matthew Barnett arguing we've been able to specify values much more successfully than MIRI anticipated.
  • Shard theory, developed mostly by Alex Turner and Quintin Pope, calling into question the utility argmaxer framework which has been used to justify many historical concerns about instrumental convergence leading to AI takeover.
  • This comment by me arguing ChatGPT is pretty aligned compared to MIRI's historical predictions, because it does what we mean and not what we say.
  • A detailed set of objections from Quintin Pope to Eliezer's views, which Eliezer responded to by saying it's "kinda long", and engaged with extremely superficially before writing it off.
  • This by Stuhlmüller and Byun, as well as many other articles by others, arguing that process oversight is a viable alignment strategy, converging with rather than opposing capabilities.

Notably, the extreme doomer contingent has largely failed even to understand, never mind engage with, some of these arguments, frequently lazily pattern-matching and misrepresenting them as more basic misconceptions. A typical example is thinking Matthew Barnett and I have been saying that GPT understanding human values is evidence against the MIRI/doomer worldview (after all, "the AI knows what you want but does not care, as we've said all along"), when in fact we're saying there's evidence we have actually pointed GPT successfully at those values.

It's fine if you have a different viewpoint. Just don't express that viewpoint as if it's self-evidently right when there's serious disagreement on the matter among informed, thoughtful people. An article like the OP which claims that labs should shut down should at least try to engage with the views of someone who thinks the labs should not shut down, and not just pretend such people are fools unworthy of mention.

Aw shit, I'm very sorry for how I phrased it! I realized that it sounds like I'm digging at you. To be clear, I was asking for any links to discussions of alternative views, because I'm curious and haven't heard them. What I meant is that it's very easy for me to ask you to do work, by summarizing other people's opinions. So it was a caveat that you don't have to elaborate too much.

Going to retract that comment to prevent misunderstanding. Thanks for the links.

Oh lol, thanks for explaining! Sorry for misunderstanding you. (It's a pretty amusing misunderstanding though, I think you'd agree.)

Here's my attempt at a concise explanation for why the default outcome of AGI is doom.

We do not know how to make a PAI which does not kill literally everyone.

 

We don't know how to make a PAI that does kill literally everyone either. What would the world have to look like for you to be pro more AI research and development?

It's pretty much just a matter of throwing more money (compute and data) at it now. Current systems are only not killing everyone because they are weak.

Executive summary: AI capabilities organizations like OpenAI and DeepMind should immediately halt progress and shut down because they do not know how to build powerful AI systems that are safe and beneficial.

Key points:

  1. Current AI capabilities organizations are progressing quickly on building powerful AI without knowing how to ensure it is safe and beneficial. This risks catastrophe if deployed.
  2. Aligning AI goals with human values is extremely difficult. Current organizations are not prioritizing this properly and are likely to deploy unsafe systems.
  3. Shutting down and halting progress is the only way to prevent uncontrolled development of AI systems that could cause mass harm.
  4. Even small contributions to capabilities over safety increase the risk of catastrophe. Researchers and engineers at these organizations should quit or slow progress substantially.
  5. Related organizations contributing tools and services to unsafe AI development should also shut down.
  6. No current organization has shown the capability to properly build safe advanced AI before uncontrolled AI emerges.

 

 

This comment was auto-generated by the EA Forum Team. Feel free to point out issues with this summary by replying to the comment, and contact us if you have feedback.

You're of course entitled to your own opinion, but this post mostly comes off as condescending, with you claiming to know what's best for other people, in particular that they should quit their prestigious jobs (rather than, say, quit smoking).

  1. This neglects a considerable amount of my "ASI is dangerous" probability mass, because it doesn't consider the possibility of an Oracle ASI, or of other bad outcomes that would be worsened by China's AI plausibly getting to ASI before us.
  2. For a further reason "But China!" does matter, consider the greatly reduced bargaining position under that scenario. I think (with admittedly no understanding of global-power bargaining dynamics) that building international agreements is much easier when the costs of compliance don't fall to the competitive disadvantage of the opposing side.
  3. I'm not convinced that alignment is not ~90% capabilities. That OpenAI and Anthropic are at least somewhat dedicated to explicitly pursuing alignment also shouldn't be taken for granted.

Thank you for saying this. 
