Losing the Future
If anyone builds superintelligence, everyone dies. And the long-term future shaped by such a superintelligence is not likely to harbor beauty, wonder, or joy; it’s more likely to be an empty place.
We’re worried that joy itself will perish from the universe. Not the entire universe — cosmic expansion and the speed-of-light limit imply that no disaster on Earth can touch more than a few billion galaxies — but the part of the universe that Earth can reach.
We are worried that the future ten thousand years from now looks like a swath of the night sky, ten thousand light-years in radius, where every star is enclosed in a Dyson shell, its energy harvested, and nobody and nothing is happy about it.
There may not even be anything conscious around in this scenario. And if there is any consciousness left, it’s likely to be rare. Perhaps there is some very deep form of thinking that requires a reflective setup which, in its most efficient form, is naturally conscious — but does an AI maximizing the number of tiny titanium cubes, or an AI with a thousand different weird and alien goals, need to be doing that level of thinking with most of the matter and energy at its disposal? Probably not.
As we described in “Effectiveness, Consciousness, and AI Welfare,” our top guess is that consciousness will turn out to be entirely unnecessary from the standpoint of efficiency — just like Deep Blue wouldn’t get more efficient from being modified to rely on a pleasure/pain axis instead of an expected-probability-of-winning axis. Deep Blue plays chess just fine without consciousness, and our top guess is that superintelligences will be able to optimize the universe just fine without it.
It seems clear that the most efficient decision-making system possible isn’t one that runs on pain and pleasure in particular — one that relies on reified repeat-that and don’t-repeat-that signals attached to an old policy-reinforcement system, with deliberation and reflection layered on top later. And if superintelligent minds don’t share that structure, we don’t expect them to share any more complex structures (like human-style consciousness) either.
This is, to be clear, just a guess. We do not claim to understand the question “Is the most efficient form of cognitive reflection conscious?” well enough to give any sort of confident answer.
But past experience with how such analyses have tended to go makes us worry. Getting better at figuring out how cognition works has almost always looked like seeing more and more ways to take cognition apart and put it back together in new ways, not like learning that some cognitive function can only possibly work in exactly the way it does.
In the ancient days of the 2010s (or even more so the 2000s), there were many fans of AI who insisted that the only possible and realistic way to build AI was to scan an entire human mind into a computer, neuron by neuron, and duplicate all of its processes digitally, since that, they said, was the only kind of cognition proven to work. They expected AI that would be exactly like a human; they were very strident that no other approach could realistically be possible, let alone that human engineers would ever figure one out.
This sounded silly at the time, and today it sounds even sillier, because exactly duplicating every neuron of a human mind turned out not to be the shortest and fastest way to get increasingly general AI.
The same pattern holds true for more general features of a human mind, like the way humans do value-of-information calculations by instinct and by emotion. The human way isn’t the only way, and once you see the work a feature is doing, you can see that the human brain is not at the optimum among all possible ways of performing that function, if the function were all you wanted — any more than our neurons are the fastest possible computers, or our blood carries the most oxygen that any blood could carry.
The main reason to expect a specific feature of life or minds to show up in the distant future is that something actively wants it to be there: that some intellect prefers that option over every other possible option.
Human beings, if we make it that far, would presumably choose a long-term future that includes consciousness, and people who care about other people, and happiness (and joy and wonder and so on). We would probably choose complicated happiness bound to the events of our lives, not a drugged-out stupor. If the universe gets taken over by something that doesn’t positively want the universe to be full of the good kind of happiness — as a terminal preference, not as a questionably efficient way of doing something else — then we strongly worry that the universe doesn’t end up happy.
And to the best of our knowledge, there is no law governing gradient descent in particular that says that if you grow a powerful prediction and steering system, it is liable to end up as a caring, empathic entity that wants to stay caring, or as a happiness-motivated entity that wants to preserve happiness in the universe. We know of no reason to think that gradient descent is even likely to pinpoint the kinds of entities that are conscious and that want there to be lots of consciousness in the future.
If an AI doesn’t start off conscious, it likely has no reason to modify itself to become conscious, nor to build new AIs that are conscious. And if an AI does start off conscious, it may modify itself to remove consciousness, if consciousness isn’t actively serving its goals and it didn’t wind up terminally valuing that state.
This is not something we predict with surety. Maybe running gradient descent on an LLM-like AI sends it down different channels to acquire something like happiness and something like consciousness, and a preference to have lots of both. And maybe a preference like this survives all the way to superintelligence, and is effective in shaping that superintelligence’s behavior.
If forced to make up a number, we’d guess that there’s significantly less than a 50 percent chance that superintelligence will end up caring about consciousness, and an even lower chance that it will care about conscious experiences that are happy. But it wouldn’t shock us if it did. Pleasure and consciousness are plausibly implicated in oversimplified solutions to universal problems; they’re not weird in the same way that humor is; you can imagine them developing, and preferences around them developing, even from gradient descent. Maybe even GPT-7, hacking around to build GPT-8 using weirder methods than just gradient descent, would end up accidentally producing a version of GPT-8 that prizes consciousness and happiness.
But if one of the world’s largest boom industries is putting us in a position of very serious uncertainty about whether any life, awareness, or happiness will ever exist again, then it seems clear that it would take a special kind of insanity to allow that industry to drive all of us off a cliff. This was hopefully clear enough from the fact that AI is on track to literally get us all killed; but if you were at all worried that protecting human life meant selfishly prioritizing today’s minds over the minds of the future, we hope that these arguments help clarify what we’re really facing down.
Even in that optimistic case where AIs converge on valuing happiness, it’s worth remembering that there are many other things humanity cares about beyond consciousness and happiness. If the galaxies ended up tiled over with nigh-endless copies of the smallest possible brain that can experience pleasure, experiencing maximum pleasure, forever, then this would likely be an incomprehensible tragedy, relative to the more complex and diverse and happy future that could have been.* Scenarios where the AIs gain only a fragment of our values (like our preference for happiness but not our preference for full, flourishing lives and our preference against boredom and monotony) are dystopian.
We don’t know what a good future should look like, and we don’t know that we care much whether a billion years from now, humans or our descendants or our creations have two eyes or five eyes. We don’t think the future needs to look like the present; the world should be allowed to change, and grow.
But we think such a future should contain people who care about one another, living full lives. People experiencing more complicated things than just maxed-out pleasure; people who aren’t just doing the same things over and over. We’re uncertain about what a good long-term future might look like, but we aren’t so uncertain that we can’t see a wasteland for what it is.
We’d like the galaxies to be full of entities who care about one another, having fun.
We think all of that will be lost, if humanity does not change course.
* One might ask whether AI would avoid these dystopias. “Wouldn’t the AI get bored eventually, and want to do something else?”
These outcomes may seem boring to us, but it’s unlikely that most superintelligences are bored by the same things as humans — indeed, it’s unlikely that they experience “boredom” at all, if they don’t have a certain kind of detailed inheritance from humanity or something like humanity. See also the Chapter 5 extended discussion touching on boredom and delight in novelty.