What if AI is developed only slowly, and it integrates gradually with society?
That scenario still ends in catastrophe.
If AI passes some unknown critical threshold and shoots up in intelligence very quickly, that makes alignment especially hard. But speed is not where the central difficulty comes from. The central difficulty lies in the difference between Before and After.
Suppose that humanity gets several years to “integrate” with AIs that are roughly as general as a human.
(Researchers discussing “slow” progress past this point are generally imagining that we would have at most a few years before AIs become vastly more intelligent and powerful than any human — not decades. But even if a serious contingent of researchers expected decades, it’s not obvious that decades would suffice.)
In those several years, researchers would learn more about how AIs work. They would see various issues arise, and they would find work-arounds, or ways to mitigate the damage. AI progress would continue; and at some point, the AIs would become collectively capable enough to take power. That follows directly from the fact that the AIs just keep getting smarter in this scenario.
At that point, researchers would have some pool of tricks and hacks they came up with to make AIs more reliable and profitable, all developed on AIs that couldn't take over. They would need that existing set of tricks and hacks to generalize to AIs that can take over. And they would need it to work on the first try.
At the end of the day, sufficiently capable AIs will be lethally dangerous, and we can only learn from experimental failures that aren't lethal. The problem looks even more hopeless if AI progress exhibits threshold effects, as seems likely. But even without sharp thresholds, there's no realistic chance that everything turns out great just because we grow enormous numbers of opaque smarter-than-human AIs slowly.
One way to tell that there must be a difficult transition point somewhere is to consider a world full of billions of superintelligent AIs that are serving humans despite not actually caring about the humans’ interests. In reality, the world would never get to such a state, assuming we’re using anything like modern machine learning to build the AIs; things would go off the rails far sooner than that. But predicting exactly when and how is a hard call. Considering the extreme case makes it clear that at some point, things would go off the rails.
By analogy: If your city plans to enslave dragons to plow its fields and roast its coffee, some problems get worse if the dragons grow up very quickly. But the core problem is not, "What if the huge fire-breathing monsters that could wipe out our city with one terrible breath grow up quickly?"
If you imagine finding yourself in a city full of unfriendly mature dragons, that situation will go poorly very quickly. It will go poorly even if you thought you had some clever scheme for controlling those dragons, like giving them a legal system which said that the humans own most of the property, such that surely no dragon coalition would dare to suggest an alternate legal system for fear of their own rights being invalidated. It will go poorly even if you plan to cleverly play the dragons off against each other, so that no dragon would dare to breathe fire for fear of the other dragons. When the dragons are fully mature, they will all look at each other and nod and then roast you.
If AIs develop fairly slowly, then the future might get interesting and weird for a time, but it would still end up in a basin where the power and resources go to AIs. Realistically, humans are not going to keep pace. Once any collection of those AIs gets into a position where they could take the planet's resources for themselves, that's the point of no return. Either that collection of AIs contains a component that deeply cares about happy, healthy, free people, or the future goes poorly for us.