Human Values Are Contingent

The Glorious Accident of Kindness

When you watch someone drop a rock on their toe, you might wince, and feel (or imagine) a spike of phantom pain in your own toe. Why?

One guess is that our hominid ancestors, competing with each other and engaging in tribal politics, found it useful to build mental models of the thoughts and experiences of the hominids around them — models they could use to help figure out who was their friend, and who was about to betray them.

But it was hard for early proto-humans to predict the brains of other proto-humans. Brains are complicated things!

The one advantage an ancestral primate has is that its own brain resembles the brains of the other primates around it. You can use your own brain as a template, a starting point, to guess what other hominids might be thinking.

So proto-humans evolved mental machinery for pretending to be another person, a special mode that says: “Instead of thinking my usual thoughts, try to adopt the other person’s preferences and state of knowledge and think the sorts of thoughts they would think, given that their brain works basically the same way mine does.”

But this special pretend-to-be-someone-else mode isn’t perfectly sandboxed away from our own feelings. When we see somebody drop a rock on their toe, and (implicitly, automatically) imagine what might be happening inside their head, we wince.

(This glorious accident of mental architecture deserves more of a song of praise than we have time to write, here. To wince when you see somebody else in pain — to have that capacity at a basic level, even if sometimes we switch it off — this isn’t a necessary feature of minds. That it happened to end up true of primates is so basic to who we humans now are, who we are glad to be, who we think we ought to be — that there ought to be a book about it, and the foundational role which the capacity for empathy plays in everything valuable about humans. But this is not that book.)

It’s a decent guess that, once our primate ancestors developed skills of modeling other apes (for the purpose of predicting who was friend and who was foe) they also found it useful to model themselves — to develop an idea of the-ape-that-is-this-ape, the concept that we now symbolize with the words “me,” “myself,” and “I.” And natural selection, ever the opportunist, repurposed the same machinery that we use for imagining others to additionally imagine ourselves.*

The real story is probably more complex and tangled, and may even have roots that stretch back far before primates. But something like this is part of the huge invisible backstory for how humans wince when we observe others’ pain, and how most humans tend to feel empathy and sympathy for others around them. Much of this backstory is predicated on a shortcut that was easy for natural selection to deploy in human brains, where both “self” and “other” are the same kind of brain running on the same architecture.

This shortcut isn’t an option in the same way for gradient descent, because the AI doesn’t start off with a very humanlike brain it can repurpose for modeling the many humans in its environment. An AI actually does need to learn, from scratch, a model of something outside itself that is not like itself.

To state the point in a facile way: An AI can’t figure out initially that a human hurts after stubbing their toe, by imagining itself stubbing its own toe, because it doesn’t have toes, nor a nervous system whose firings include signals for pain. It can’t predict what humans will find funny by asking what it would find funny, because it doesn’t start off with a brain that works the way human brains do.

Although this story is oversimplified, the more general point we’re making is that humanity’s higher-minded ideals are contingent on the particulars of our ancient primate history and our ancestral social environment. Friendship is a distant echo of our need for allies in a tribal setting. Romantic love is a distant echo of our sexually dimorphic mating patterns. Even things that might seem, at first blush, to be less arbitrary and more fundamental, such as curiosity, are not instantiated in humans in anything like an inevitable or obviously convergent way.

The specifics of how we evolved those psychological traits are wrapped up in how sophisticated our brains happened to be at the time we needed them. In humans, friendship, romantic love, and familial love all blurred together into general kindness and goodwill. This looks to us like evolution taking shortcuts at a very particular stage of brain sophistication: humans do many things by heuristic that a mind could in principle do through explicit reasoning, because these traits evolved at a time when humans weren't yet smart enough to solve the underlying problems with explicit reasoning.

Even among other biologically evolved aliens, we’re not sure how often we’d find kindness. You can imagine aliens having brains that were more mathematically adept before they started binding together into larger groups, and maybe evolution found it easy to give those aliens specific kinship instincts — “this individual shares 50 percent of my provenance, whereas that one shares only 12.5 percent.” Perhaps those aliens only ever developed alliances based on shared genetic data or explicit mutual understanding, rather than developing feelings of kinship that can apply to anyone.
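(An aside on the arithmetic behind those fractions, in case it helps: the numbers match Wright's coefficient of relationship for ordinary diploid species, which we assume is the intended reading here. Relatedness is summed over ancestry paths, halving once per parent-offspring link, as in the sketch below.)

```latex
% Illustrative sketch: Wright's coefficient of relationship for diploid species.
% Sum over the ancestry paths linking two individuals, halving once per
% parent-offspring link L in each path.
r \;=\; \sum_{\text{paths}} \left(\tfrac{1}{2}\right)^{L}

% Full siblings: two paths (one through each shared parent), two links each.
r_{\text{full siblings}} \;=\; 2 \times \left(\tfrac{1}{2}\right)^{2} \;=\; \tfrac{1}{2} \;=\; 50\%

% First cousins: two paths (one through each shared grandparent), four links each.
r_{\text{first cousins}} \;=\; 2 \times \left(\tfrac{1}{2}\right)^{4} \;=\; \tfrac{1}{8} \;=\; 12.5\%
```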

It’s an old speculation in science fiction that if aliens followed a genetic relatedness pattern similar to that of Earth’s eusocial insects, in which ant workers are far more closely related to their queens than humans in ant-colony-sized organizations are to each other, they wouldn’t need a general sense of allyship and reciprocity of the kind that ended up being beneficial to ancestral hominids. (So there turns out to be some justification for the sci-fi trope in which aliens that cooperate smoothly with one another, but have no empathy for humans, are depicted as giant insects!)

And when it comes to AIs that have not evolved to propagate genes in a social setting? The “don’t expect a robot arm to be soft and full of blood” argument applies in force.

If you knew a lot about how biological arms work, but hadn’t yet encountered any robot arms, you might imagine that robot arms would need a soft skin-like exterior in order to bend, and that they’d have to have veins and capillaries pumping some oxygen-rich fluid (analogous to blood) all throughout the robot arm to supply power. After all, that’s how biological arms work, and presumably there are reasons for it!

There are reasons why our arms have soft skin-like exteriors and are pumped full of blood. But those reasons happen to be mostly about which sorts of structures are easy for evolution to build. They don’t apply to mechanical arms, which can be made of hard metal and powered by electricity.

Robot arms have no blood, but that doesn’t make them malfunction the way that a human’s arm would if you took away all the blood. They just operate via an alternative, bloodless design. Once you understand the mechanics of robotic arms, the details of biological arms stop feeling relevant.

Similarly: An AI works fundamentally differently than a human. It’s solving fundamentally different challenges, and where its challenges and our challenges overlap, there are many other ways to perform the work. A submarine doesn’t “swim,” but it moves through the water just fine.

Human Culture Influenced the Development of Human Values

By the way — we say to Klurl and Trapaucius, who at the start of Chapter 4 were trying to predict the future development of the apes they saw roaming the savannah — humans are going to form a society! And they are going to argue about morals and values with each other.

Which is to say: If you trace any historical-causal trajectory of how an individual ended up with the values they now hold inside their society, that causal story is going to involve the arguments and experiences that society exposed them to.

And that historical-causal explanation, in turn, will involve facts about which ideas are most viral (apart from all their other properties). The explanation will depend on how people decide to broadcast and rebroadcast ideas.

If poor Klurl and Trapaucius want to correctly guess what internal values various modern human cultures will end up instilling in various modern humans, they need to predict not only the existence and structure of that complication, but also its course.

Reading the history of how slavery mostly-ended on Earth, it seems ahistorical to deny the role that Christian universalism played in it — the belief that the Christian God made all human beings, and that this granted human beings equal status in the eyes of Heaven.

And this universalism, in turn, may have been tied to the cultural survival and reproduction of Christianity: Christians felt obliged to send missionaries to foreign cultures and convert them to Christianity by persuasion (if viable) or force (if not), because they cared about those distant children of God and wanted to get them into Heaven and out of Hell.

It would be nice to believe about humanity, that human beings could have come to invent universalism and fight against slavery without requiring some very specific religious beliefs. We would like to imagine that humanity would have invented the idea of sentient and sapient beings having equal moral value, or equal standing before the communal law, regardless of what path culture took, without needing to pass through a stage of first believing that souls were equal before God. But that doesn’t seem to be the way that history actually played out. It looks like humanity’s moral development was more fragile than that.

Chimpanzees are not very universalist, nor were many early human societies. It hasn’t even been well tested whether a human society can stay universalist for a century or two without a universalist religion that people really and deeply believe in. We don’t actually know; modernity is young, and the early data is still coming in.

But these extra wrinkles — these numerous cultural contingencies, layered on top of humanity’s biological contingencies — sap away a bit more of the hope that we can afford to rush blindly into building superintelligence.

The fact that culture plays an important role in human values doesn’t mean that we can just “raise the AI like a child” and expect it to become an upstanding citizen. Our culture and history had those effects because of the detailed ways they interacted with our exact brain makeup. A different species would have reacted differently to each historical event, which would have caused subsequent history to diverge from human history, compounding the effect.

It also bears mentioning that individual humans, and not just cultures or civilizations, differ a lot in their values. We are generally used to taking this fact for granted, but if we imagine natural selection as an “engineer” that was hoping to create a species that reliably pursues a particular outcome, this diversity is a bad sign. The natural variability that we see in humans (and in many other evolved systems) is antithetical to engineering, in which you want to achieve repeatable, predictable, intended results.§

In the case of superintelligence, engineers should want to reliably achieve results like “AIs developed in the following way do not cause humanity to go extinct,” as well as results like “AIs developed in the following way all reliably produce the same general kinds of outputs, even as the inputs vary wildly.” When we look at the contingency of human biology and human history, and the wide range of moral values and perspectives humans exhibit today, this does not exactly make the challenge look easy, especially for minds that are grown rather than crafted (as discussed in Chapter 2).

Many different lines of evidence point at it being genuinely difficult to get AIs to robustly want the right things. It doesn’t seem theoretically impossible; if researchers had many decades to work on the problem, and unlimited retries after failure, we expect there to be engineering tricks and clever approaches that make the problem more solvable. But we’re not anywhere close yet, and we don’t have unlimited retries.

* Just as there are many ways for a mind to gain the ability to model other minds, there are also many ways for a mind to model itself. It would be a deep failure of imagination to suppose that all possible minds must follow the exact same path as humans in order to gain the ability to reason about themselves — like imagining that all possible minds must surely have a sense of humor, since all human minds do.

We take this idea further below, in the discussion of Squirrelly Algorithms.

It’s one of the things that would make us nervous about meeting aliens, someday, if we cross paths in the void of space a billion years from now — that maybe some strange twist like that, in humanity’s history and psychology, would turn out to have been vital to the invention of universalist kindness, and aliens wouldn’t have gone down that particular complicated road.

Universalist kindness does seem to go at least a little against the surface-level straightforward direction of natural selection. There’s a story for how some humans arrived at that place, after ending up with particular genes driven by hunter-gatherer selection pressures that directly pushed on internal motivations and not just on direct behavioral outcomes. There’s a story about how humans then had moral arguments among themselves, which differentially propagated through their societies as ideas.

This is surely not the only exact road to arrive at a universalist sense that every sentient being is deserving of happiness. But we would be only saddened, not shocked, to find that its frequency out among the stars was less than we hoped — that only, say, 1 percent of the aliens we met were the sort to care about non-aliens like us.

(But we would still put much higher probability on finding it in an alien society, than on it spontaneously appearing inside an AI whose growth and existence was all directed toward solving synthetic challenges and predicting human text. That AI would have different kinds of twists and turns along the way to whatever goals it actually ended up with.)

§ Some of this inter-human variation may be temporary in an ultimate sense, downstream of factual disagreements. For most people in sufficiently similar moral frames, there may be some facts about reality, or arguments they haven’t yet considered, that would move them to agree where they presently disagree.

For example: Any time people argue about what will happen if a policy is implemented, in order to argue for or against that policy — when they say that implementing some piece of legislation will produce endless gloom or eternal sunshine — they are trying to appeal to some (hopefully mostly agreed upon) common framework about which consequences are bad or good. When it became sufficiently apparent that leaded gasoline was causing brain damage, legislators were able to set aside disagreements about whether their preferred vibe was wise government control of capitalism or bold technological daring and progress, and agree that none of them much liked causing children brain damage. Through greater knowledge about facts, they came to greater agreement about policies.

But we would guess that greater knowledge can resolve only some disagreements, among some legislative majorities, inside some cultures. It’s nice that people’s moral and emotional meta-frameworks overlap as much as they do, but expecting perfect overlap seems like a bit much even in the limit of perfect knowledge.

This is not to say that there is no sensible way to speak of humanity’s common good. If the choice is between all life on Earth dying, and not all life on Earth dying, we think a supermajority of present-day humans would press the “not everyone dies” button.


We mention this because the charge-ahead-with-superintelligence faction has been known to say airily, “Aligned to whom? Clearly this concept of alignment is meaningless, since humans have different goals,” which seems disingenuous. By “alignment is hard” we mean “having superintelligence not just kill literally everyone is hard.” We don’t need to resolve every complex issue of moral philosophy in order to take the obvious steps needed to not get everyone killed.

Occasionally, people hear evolutionary biology lessons about why various human traits were fit and selected-for, and they take away the lesson that humans ending up reasonably nice (at the end of all these complications of evolution and culture) reflects some vast larger trend. An inevitable trend toward some glorious set of universal values — something that sounds simultaneously nice enough to be comforting and technical enough to be true.

We have attempted to anticipate and refute a few of these arguments already. But suppose someone hits on some other emotionally powerful idea about wonderful outcomes being inevitable for beautiful reasons — one that we haven’t anticipated? (We can’t cover everything; people are always generating new arguments to try to justify a conclusion like this.)

To someone who hits on an idea like that, we recommend adopting the mindset of treating it as a mundane question, like whether your car needs an oil change or how the human immune system works. That is: think about questions like these the way you think about ordinary scientific and practical topics in your life.

If you are someone making important decisions related to AI policy and you feel persuaded by a theory like that, our main recommendation would be to find a middle-aged evolutionary biologist with a reputation for quiet competence, and have a conversation with them. Not somebody who is pushing their face into the newspapers all the time by saying startling things or taking positions in current controversies; somebody who other scientists say, among themselves, is a rigorous thinker. Someone who has taught at a university, and has a reputation for being a good communicator.

Say to this biologist, “I’ve recently been looking into a theory that says that evolution inexorably taps into larger cosmic trends to make people be nicer, and this same trend will hold force for any burgeoning intelligence, once it becomes sufficiently sophisticated. Also, for complicated reasons, the world may end if I am wrong.”

Then explain to the biologist your theory of how hominid evolution trended inevitably toward creating kindly and honorable agents, for reasons so general that you think they would also apply to arbitrary intelligent aliens, or to even stranger beings built by gradient descent.

Then listen to what the biologist has to say.
