“Aligned to whom?”

This is a thorny question. Regardless of the answer, we need to halt development.

If humanity builds a superintelligence someday, we should make sure that it’s “aligned” with human values. But with the values of which humans, exactly? People disagree enormously about right and wrong, about religion, about social norms, about policy tradeoffs, etc.

At present, this question is moot. Humanity isn’t able to get any particular goals into an AI, so it doesn’t matter whether there’s disagreement about which goals would be ideal. As we’ve argued at length, rushing to build superintelligence would get all of us killed. Humanity disagrees about a lot of things, but almost nobody thinks the destruction of all life on Earth would be a good thing.

The question of exactly which values should be loaded into an AI is indeed a thorny one. It’s a problem that, frankly, we would love to have. What we face instead is a different and far worse problem.

We don’t need to agree at all about “aligned to whom?” (or even about whether humanity should ever build superintelligence) in order to coordinate on an international ban, on the brutally simple grounds that we’re going to die otherwise. AI raises no end of interesting philosophical questions, but if we let ourselves get unduly distracted by them, we’re liable to get our kids killed in the process.

In practical terms, our advice to world leaders is:

  • Separate out the question of “Should we rush ahead to build superintelligence?” from the question of “If we somehow had a way to build superintelligence safely, what should we do with it?” and focus on the first question first. The first question is the urgent one, and the one that’s actionable today. The second question may be important to address someday, but at present it’s a trap, because it encourages thinking of superintelligence as a prize. Falsely believing that the first person who builds a superintelligence gets to decide what to do with it would lead us into a suicide race.

    ASI is a suicide button, not a genie in a lamp. When someone creates a superintelligence, they don’t thereby “have” that superintelligence. Rather, the superintelligence they just created has a planet.

  • If for some reason you do feel the need in the future to broach the topic of “How should humanity someday use superintelligence, if we’re ever in a position to do so?”, we strongly recommend avoiding proposals or ideas that would encourage other actors to race (or that would otherwise encourage nations to reject or violate any future international agreements on superintelligence). Anything like a winner-takes-all dynamic has enormous potential to endanger the world.

    There exist proposals for managing the thorny question of “aligned to whom” in a relatively universalistic way, one that attempts to be fair to all potential stakeholders and that does not incentivize racing over the finish line — e.g., the proposal of aligning an AI to pursue the coherent extrapolated volition of all humankind.* But even then, there’s endless potential for people to argue over the principles and tradeoffs involved, as well as the messy implementation details. Those arguments would be important to resolve in a world where humanity had figured out how to precisely and robustly aim a superintelligence, but putting them front and center today wildly mischaracterizes the actual tradeoffs the world is facing, and risks derailing efforts to coordinate on shared goals such as avoiding the destruction of the Earth.

Even when it comes to issues that are of enormous long-term importance, nothing should be packaged with the survival of humanity except the survival of humanity.

* Coherent extrapolated volition is our own stab at answering the question “aligned to whom?” if and when we get to a point where the creators of AIs have some ability to aim them. Coherent extrapolated volition attempts to resolve moral disagreements and meta-moral disagreements mostly by tasking the AI with identifying places where people would converge if they knew more, if they were more the kind of person they wish they were, and so on (in the manner of ideal observer theories in ethics), and with searching for shared meta-principles that the AI can fall back on in cases of truly foundational moral disagreement. (The goal isn’t necessarily for the AI to “solve all problems” in human life, just to solve enough problems that the end result isn’t likely to be catastrophically bad.) We’d recommend extrapolating the volition of all living humans — not because we think this is some sort of ideal, but because it’s the obvious default coordination point around which many disagreeing stakeholders can agree (and because other entities that living humans care about get some sway through those living humans’ volition, as do other entities that living humans would care about if they knew more and were more who they wished to be, and so on).

But to reiterate: We mostly see this topic as a distraction in the present day. We don’t need to reach agreement on any of these high-level philosophical ideas in order to take action on a technology that’s on track to get us all killed. It would be profoundly foolish to let nonproliferation work get derailed by people debating bright ideas like this — including bright ideas that we authors personally like.

We nevertheless mention this proposal briefly, just to make it clear that we aren’t trying to duck the question, and perhaps to reassure readers who worry that it may be impossible to ever come up with a workable proposal. Even if coherent extrapolated volition is the wrong high-level approach for some reason, the fact that it captures a lot of desirable properties should inspire some hope that a non-catastrophic answer to this question can be found.
