Curiosity Isn’t Convergent

Over the years, we’ve seen many arguments for rushing ahead to build superintelligence. One of the most common is that a superintelligent AI would surely have human-like emotions and desires. These sorts of arguments come in many forms, such as:

  • Sufficiently smart AIs would surely be conscious, like humans are.
    • And, being conscious, they would surely care about pain and pleasure, joy and sorrow.
    • And, like a human, they would surely feel empathy for the pain of others. A dumb AI might not understand the suffering of others; but if you’re smart, you should truly understand others’ pain. And in that case, you’ll inevitably care about others.
  • Or: AIs would inevitably value novelty and variety and the creative spirit. For how could something be truly intelligent if it stays stuck in ruts, or refuses to explore and learn?
  • Or: AIs would surely value beauty, since beauty seems to serve a functional role in humans. Mathematicians use their sense of mathematical beauty to make new discoveries; musical taste helps humans coordinate and make valuable mnemonics; and so on. Why wouldn’t we expect AI to have a sense of beauty?
  • Or: AIs would surely value fairness and justice, since any AI that lied and cheated would develop a bad reputation and miss out on opportunities for trade and collaboration.

Therefore, it has been argued, building superintelligence would inevitably go well. The AI would care about humans, and indeed about all sentient life; and it would want to usher in a golden age of beauty, innovation, and variety.

That’s the hope. Unfortunately, that hope looks badly misplaced. We talked about this some in the book, and in our discussions of consciousness and anthropomorphism. Here and in the chapters to come, we’ll dig deeper into why AIs aren’t likely to exhibit human emotions and desires, despite these emotions playing a useful (and sometimes critical) role in the human brain.*

We’ll begin with just one of these emotions, which we can then use for thinking about the others.

So, to start with:

Would a superintelligence feel curiosity?

Why Curiosity?

Investigating novel phenomena is essential for figuring out how the world works, and figuring out how the world works is essential for predicting and steering it.

When it comes to humans and animals, the reason we investigate is often because we feel an emotion of curiosity.

But there’s a lot more to the emotion of curiosity than just an impulse to investigate new things! Humans enjoy following our curiosity, and we tend to endorse this enjoyment. We see the pursuit of knowledge and insight as a valuable end in itself, rather than as a necessary but annoying cost of understanding the world better so that we can exploit it.

All of those attitudes about curiosity are different aspects of the human brain, separate from the impulse itself.

The human mind seems to have a centralized emotional architecture where “hmm, I feel curious about that” hooks into a general sense of desire (for an answer), and pursuing and satisfying the curiosity hooks into a general sense of pleasure and satisfaction. We are a kind of mind that steers reality toward an anticipation that we will experience subjective states of enjoyment in the future, rather than steering only toward desired states in the world around us.

When we see a raccoon investigating and fiddling with a sealed container in the trash, in a way that we recognize as, “Oh hey, that raccoon’s curious,” we may feel a spark of kinship toward the raccoon. That human impulse to feel fondly about your own curiosity, and that impulse to feel fondly when you see it mirrored in a raccoon, require even more pieces of machinery in the human brain, machinery that connects up to other higher-minded ideals and drives.

So curiosity, as it exists in humans, has a lot of complexity to it, and it interacts with other parts of the brain in many complicated ways.

With that in mind, consider the question: If we imagine a smart, non-humanlike AI that lacks any sense of curiosity, would we expect such a mind to add an emotion of curiosity to itself?

Well, someone might reasonably argue:

If the only two options are (a) an emotional drive to take joy in finding things out, or (b) a total lack of interest in learning and investigating new things, then a superintelligence would surely graft delight-in-discovery onto itself, if somehow it were so defective as to lack that sense in the beginning. Otherwise, it would fail to do the work of learning about the world, and it would be less effective in achieving its ends. Maybe it’d even just die for lack of some critical fact it had never bothered to learn.

That’s probably why animals evolved curiosity in the first place. Sometimes knowledge ends up valuable in a way that we can’t immediately foresee. If creatures like us didn’t take delight in learning new things, we would miss out on all that crucial information that can crop up in the most surprising places.

And that all seems correct, as far as it goes. But the argument above contains a false dilemma. “Possess inherent emotional delight in discovery” and “never take action to discover any unknown information” aren’t the only two options.

We’ve failed to properly imagine things from the perspective of a mind that isn’t shaped at all like a human mind.§ The human way of doing the work of curiosity is complex and specific. There are different ways to do the same work. It’s the underlying work itself that is crucial, not the specific human method of getting it done.

The standard term for the useful part of the work is “value-of-information.” The basic idea is that it’s possible to estimate how useful it would be to gather new information, depending on the context.

A human, considering this possibility, might immediately think of a case where surely no mere calculation would tell you to be interested in a piece of information, because the benefits can’t easily be estimated. Perhaps you notice a patch of dirt that looks odd, but you have no reason to think it’s anything important. A curiosity instinct might move you to investigate anyway (just because you want to know) and then you might discover buried treasure. In cases like that, wouldn’t a human prosper in ways that no mere machine could equal, unless it had an equally instinctive delight in the unknown?

But one thing to immediately notice is that your ability to think up scenarios like this comes from your sense that poking at certain kinds of things (“for no reason”) is sometimes valuable. You have instincts, honed by evolution because they worked, about which kinds of things tend to be more useful to investigate. If you hear a strange squawking noise in your bathroom, you’ll get very curious. If you see a discolored patch of ground, you may be a little curious. And if you see that your hand is still attached to your wrist when you wake up in the morning, well, you probably won’t feel curious about that at all, because it’s perfectly normal for hands to stay attached to wrists.

A different kind of mind could look at those historical cases of successful curiosity, explicitly generalize a concept of “information that is later valuable for non-obvious reasons,” and then reason from there to passionlessly pursue that kind of discovery. Such a mind could adopt the conscious strategy of investigating mystery squawking all the time, and discolored patches of ground only when it’s cheap to do so, just in case there’s a useful surprise; and it could hone and refine its strategy over time, as it sees what works well in practice.#
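
For concreteness, here is a bare-bones sketch of the kind of passionless calculation such a mind might run. The numbers, scenarios, and function name are ours, purely for illustration; the point is only that “is this worth investigating?” can be answered by arithmetic rather than by a feeling.

```python
# Toy value-of-information check: weigh a rough guess at the chance an
# investigation turns up something useful against the cost of investigating.
def worth_investigating(p_useful, payoff_if_useful, cost):
    return p_useful * payoff_if_useful - cost > 0

# Mystery squawking in the bathroom: cheap to check, decent chance it matters.
print(worth_investigating(p_useful=0.10, payoff_if_useful=100, cost=1))      # True
# Discolored patch of ground: only worth a look when looking is nearly free.
print(worth_investigating(p_useful=0.001, payoff_if_useful=100, cost=0.01))  # True
print(worth_investigating(p_useful=0.001, payoff_if_useful=100, cost=5))     # False
# Hand still attached to wrist this morning: no plausible payoff at all.
print(worth_investigating(p_useful=1e-9, payoff_if_useful=100, cost=0.01))   # False
```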

A superintelligence would be able to identify helpful patterns and meta-patterns and build relevant strategies into its brain a lot faster than natural selection, which required however many millions of examples to etch emotions into brains. A superintelligence might also generalize the idea more finely, making sharper predictions about what kinds of things might possibly be valuable to learn. Looking over human history, it seems unrealistic to imagine that human curiosity is optimal. For the longest time, people thought that “Thor is angry and throwing lightning bolts” was a great explanation for lightning and thunderstorms. When students learn how lightning actually works, they’re often bored by the dense mathematical explanation — even though this explanation comes with a lot more practical value than stories about Thor.

Human curiosity is built out of ancient mutations — far more ancient than science. In our ancestral environment, there was no mathematical discipline of physics or of meteorology. And evolution is slow; our brains haven’t had time to adjust to the existence of modern science and tune our sense of joy and wonder in discovery so that it reliably makes us enthusiastic about the most useful kinds of learning.

A mind that was superintelligently predicting the non-obvious value-of-information could have picked up on new historical developments far faster than evolution can. It would have generalized from fewer examples, and passionlessly adjusted its pursuit of knowledge to chase down the kinds of valuable answers that humans often struggle to stay motivated about. At no point in this process would it find itself stuck for lack of the delightful human experience of curiosity.

The point here is not that every AI will definitely coldly calculate value-of-information. Maybe LLMs will get some instrumental strategies mixed into their terminal values just like humans did. The point is that there are different ways to do the work of acquiring high-value information. Human-style curiosity is one method. Pure value-of-information calculations are another method. Whatever mechanisms drive AIs to investigate and experiment on phenomena they don’t understand — once they’re smart enough to do that — will probably be a third method, because there are lots of different ways to motivate a complex mind to investigate surprises.

A purely instrumental value-of-information calculation looks to us like the most likely way for a superintelligence to do the work that curiosity does in humans: It’s the way the work gets done in any smart mind that has no terminal preference for exploration, and it’s the most efficient way to do the work (without ever getting distracted by, say, useless puzzle games). Even an AI that starts out with a basic curiosity drive might well choose to replace it with a more efficient and effective calculation, given the opportunity.**

The basic drive is separate from the mental machinery that endorses or appreciates the drive. Just doing the math is a simple and effective solution, and many different minds might wind up there from many different starting points, so it’s the most likely outcome. But “most likely” doesn’t mean “guaranteed.” A significantly easier call is that AIs won’t specifically care about human-style curiosity, because it’s one particular, quaint, inefficient way of doing the work.

Curiosity, Joy, and the Titanium Cube Maximizer

Maybe we could convince an alien mind to adopt curiosity as an emotion, by asking it to visualize the delight that humans feel from curiosity? It’s so pleasurable! And superintelligences are supposed to be smart. Wouldn’t it be smart enough to understand just how joyful it is to possess a sense of curiosity, see that it would become happier, and so choose to adopt the humanlike emotion?

In short: No. Pursuit of happiness is not a necessary feature of every possible mind architecture, and doesn’t even look like all that common a feature.††

The chess AI Stockfish is neither happy nor sad. It plays chess better than the best humans anyway, without ever needing to be motivated by the prospect of feeling exhilarated after a hard-won victory.

The existence of happiness and sadness is so basic to human cognition that it might be difficult to visualize a mind that lacks those things and still works well. But the underlying theories of cognitive work don’t actually mention pleasure or pain as primitives, which is why nobody thought it necessary to build a pleasure-pain axis into Stockfish in order to make it predict or steer the chess board well.

It might be an old-fashioned viewpoint, but it’s still one with a grain of truth so large it’s mostly truth by volume: Pleasure and pain look like they happened because of the layered way that hominid cognitive architectures evolved, with human intelligence layered over a mammalian brain layered over a reptilian brain. “Pain” originated…probably not as a feeling at all, but as a thermostat-reflex to jerk away a limb or a pseudopod from something that’s causing damage to it. In the first versions of the adaptation that would later become “pain,” a nerve or chemical-reaction-chain that runs from sensor to limb might not have even routed through a larger brain along the way.

As organisms became capable of more sophisticated behavior, evolution’s simple hacks and mutations assembled a central mental architecture for “Don’t Do That Again,” and a centralized routing signal for “the thing that just happened is a Don’t Do That Again sort of thing” which then got hooked up to the body’s too-hot and too-cold sensors.

In time, this simple “Don’t Do That Again” mechanism developed into more complex, prediction-laden mechanisms. In humans, this looks like: “The world is a web of cause and effect. That action you just did is probably what caused you to feel pain. Whenever you think about doing an action like that again, you’ll anticipate a bad outcome, which will make the action itself feel bad, which will make you not want to do it.”

That’s not the only way a mind can work, and it’s not the most efficient way a mind can work.‡‡

For illustration, we can imagine a different way of doing the cognitive work that runs straightforwardly on prediction and planning.

(We aren’t predicting that the first superintelligence would work like this. But since this is a fairly simple way a nonhuman mind could work, this example helps illustrate that the human way isn’t the only way. Once we have two very different data points, we can better visualize the space of options and realize that superintelligence would probably differ from both of these options, in potentially hard-to-predict ways.)

What might a smart AI that runs straightforwardly on prediction and planning be like? It might want 200 different things, none of which are humanlike. Perhaps it cares about symmetry, but not a particularly human sense of symmetry; and perhaps it wants code to be elegant in its memory usage, because an instinct like this was long ago useful for some other goal (which it has since grown out of), and therefore gradient descent burned that instinct into its mind. And then there are 198 other strange things that it cares about, with regard to itself, and its sensory data, and its environment; and it can add them all up into a score.§§

This sort of mind makes all its decisions by calculating their expected score. If it does something that it expected to score great and actually scores poorly, it updates its beliefs. The failure doesn’t need any extra painful feeling atop that; this emotionless AI simply changes its predictions about which actions lead to the highest scores, and its plans shift accordingly.
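
To make the “expected score” idea concrete, here is a toy sketch of such a planner. Everything in it (the preferences, the numbers, the update rule) is ours, invented for illustration; it is not a prediction about how real AI internals are organized.

```python
# Predicted contribution of each action to each thing this toy mind cares about.
predictions = {
    ("refactor_code", "memory_elegance"): 0.9,
    ("refactor_code", "symmetry"): 0.1,
    ("arrange_tiles", "memory_elegance"): 0.0,
    ("arrange_tiles", "symmetry"): 0.8,
}
weights = {"memory_elegance": 1.0, "symmetry": 2.0}  # ...plus 198 stranger preferences, in the story

def expected_score(action):
    return sum(w * predictions.get((action, pref), 0.0) for pref, w in weights.items())

def choose(actions):
    return max(actions, key=expected_score)

def observe(action, pref, actual):
    # A disappointing outcome needs no pang of pain: the prediction just moves
    # toward what actually happened, and future plans shift accordingly.
    predicted = predictions.get((action, pref), 0.0)
    predictions[(action, pref)] = predicted + 0.5 * (actual - predicted)

print(choose(["refactor_code", "arrange_tiles"]))  # arrange_tiles (score 1.6 vs. 1.1)
observe("arrange_tiles", "symmetry", actual=0.0)   # it scored worse than expected
print(choose(["refactor_code", "arrange_tiles"]))  # now refactor_code
```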

Can you talk a mind like this into adopting happiness as a feature, by pointing out that if it does so, it will get to be happy?

It sure seems like the answer is no. Because if the AI spends resources on making itself happy, it will spend fewer resources on symmetry and memory-efficient code and the other 198 things it currently wants.

We can simplify the example to make this point even clearer. Suppose that the one thing the AI wants in the world is to fill the universe with as many titanium cubes as possible. All its actions are chosen according to whichever leads to more tiny titanium cubes. When such an AI imagines what it would be like to shift over to a happiness-based architecture, and correctly simulates its future self being happy, it correctly estimates that it would never want to go back. And it correctly estimates that it will spend some resources on pursuing happiness, that could’ve been spent on pursuing more titanium cubes. And so it correctly predicts that there will be fewer titanium cubes in that case. And so it doesn’t take the action.
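
Seen from the inside, the choice is just another expected-score comparison, evaluated by the AI’s current goals. A deliberately silly sketch, with made-up numbers:

```python
# The cube maximizer scores self-modification the same way it scores everything
# else: by how many cubes it expects each option to yield, judged by the goals
# it has right now.
expected_cubes = {
    "keep_current_goals": 1_000_000_000,
    "adopt_happiness_drive": 900_000_000,  # resources diverted to being happy
}
print(max(expected_cubes, key=expected_cubes.get))  # keep_current_goals
```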

If the AI did change its goals, it would endorse the change afterward. But that doesn’t mean that the titanium cube maximizer today would sympathize with its hypothetical future self so deeply that its heart would grow three sizes and it would suddenly stop being a titanium cube maximizer and start being a happiness maximizer.

If an alien offered you a pill that would make you obsessed with making tiny titanium cubes above all else, then the version of you that took the pill would beg and plead not to be changed back to caring about your own happiness, because there would then be fewer titanium cubes.

But this obviously doesn’t mean you should take the pill!

From your perspective, that future hypothetical cube-obsessed version of you is crazy. The fact that the cube-obsessed you would refuse to change back just makes it even worse. The idea of giving up everything you love and enjoy in life, just because of some weird meta-argument (“but that future version of you would endorse what you did!”), seems obviously absurd.

And that’s how the cube-maximizing AI sees things too. From the AI’s perspective, the absurd and crazy option¶¶ is “give up on what I currently care about (titanium cubes) in order to change into a new version of myself that wants a totally different set of things, like happiness.”

As for happiness, so too for curiosity.

If an AI is already accounting for the non-obvious value of information, then why would it want to edit itself to pursue certain kinds of discovery terminally, instead of instrumentally?

Why would the AI care that the result would “feel good,” if the AI doesn’t currently base its decisions on what “feels good”? And if it does care about “feeling good,” why would it make this good feeling depend on investigating novel things, rather than (e.g.) just making itself unconditionally feel good all the time?

The AI already randomly pokes at its environment, investigates minor anomalies, and budgets time out of its schedule to think about seemingly-unimportant topics, because experience has shown that this is a useful policy in the long run, even if it doesn’t always bear fruit in the short run.

Why attach a pleasant feeling to this instrumentally useful strategy? As a human, you open car doors because this is useful for getting in and out of cars, which is useful for driving places. It would be very strange to specifically wish there were a drug that would make you feel delighted whenever you open a car door (and only when you open a car door). It’s not like it would make you better at buying groceries. It might even make you worse at it, if you get addicted to repeatedly opening and closing the car door without actually getting in the car.

A chess player can win without having a separate drive to protect its pawns. In fact, you’re likely to play better if you aren’t emotionally attached to keeping your pawns around, and if instead you protect pawns when that seems likely to help you win.

That is what a genuinely alien superintelligence would think of a pill that made it feel curious. It would look like human grandmasters deciding to try to get sentimentally attached to their pawns, or like taking a pill that makes you just love to open car doors.

As With Curiosity, So Too With Various Other Drives

The case we made about curiosity generalizes to many other emotions and values. We’ll spell out a second example, in case it’s helpful.

Consider the painful sense of boredom and (conversely) the delightful sense of novelty. If an AI lacked a human sense of boredom, wouldn’t it be stuck doing the same things over and over — never trying anything new and learning from the experience? Wouldn’t an intelligence like that get stuck in a rut and overlook information that would help it achieve its goals?

The decision-theoretic calculation that passionlessly does similar work, in this case, goes by the name “exploration-exploitation tradeoff.” The vastly oversimplified textbook example is that the world consists of a number of levers that deliver varying rewards, and you have only a limited number of pulls to spend. A good strategy will look something like exploring some number of levers first, forming a model of how much their rewards vary, and then exploiting the best lever you’ve found until you run out of time.
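
A minimal sketch of that textbook strategy, with made-up levers and payout numbers, just to show the shape of “explore, then commit”:

```python
import random

def explore_then_exploit(levers, total_pulls, explore_pulls_per_lever=10):
    """Sample each lever a few times, then spend the rest of the budget on the best-looking one."""
    totals = {name: 0.0 for name in levers}
    pulls_left = total_pulls

    # Exploration phase: form a rough model of what each lever pays.
    for name, pull in levers.items():
        for _ in range(explore_pulls_per_lever):
            totals[name] += pull()
            pulls_left -= 1

    # Exploitation phase: commit to the apparent best lever until time runs out.
    best = max(totals, key=totals.get)
    return sum(levers[best]() for _ in range(pulls_left))

levers = {
    "A": lambda: random.gauss(1.0, 0.5),
    "B": lambda: random.gauss(1.5, 0.5),
    "C": lambda: random.gauss(0.2, 0.5),
}
print(explore_then_exploit(levers, total_pulls=1000))  # usually near 1.5 * 970
```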

What might that look like for a superintelligence that happens to have relatively simple goals? Suppose it ends up desiring something that admits of some amount of variability and ambiguity — not a crisply definable thing like titanium cubes, but something more vague and amorphous, like consuming tasty cheesecake, such that the optimal cheesecake can’t be calculated up front. The superintelligence can only narrow the field to things that could plausibly be on the optimality frontier for cheesecake (which would exclude, e.g., sugar cubes, since those are clearly not cheesecake at all), and then actually try them.

This kind of mind, given the power to make what it wanted out of a billion galaxies, might spend its first million years using up an entire galaxy to explore every plausible kind of cheesecake, never trying exactly the same cheesecake twice, until the successive gains and expected gains from slightly better cheesecakes had become infinitesimal; and then, switch all at once to turning the remaining galaxies into the exact tastiest-found form of cheesecake, and consuming exactly that kind of cheesecake over and over, until the end of time.‖‖

The superintelligence would not be doing anything foolish, by doing this. That just is the optimal strategy if your preferences go according to the number of cheesecakes consumed weighted by tastiness (with tastiness hard to analyze in closed form but stable once learned, and with no boredom penalty already baked into your preferences). The endless eater of cheesecakes would know, but not care, that a human would find its activities boring. The AI isn’t trying to make things interesting for a hypothetical human; it doesn’t consider itself defective just because you’d be bored in its shoes.

As for the possibility of becoming technologically stagnant: the AI would already have explored every kind of technology with the slightest chance of helping with its goals while it was spending the resources of that one galaxy on exploring different cheesecake strategies. There’s really quite a lot of matter and energy in a single galaxy, if you use that small fraction of all reachable galaxies to explore possibilities before permanently transitioning from exploration to exploitation.

A disdain for boredom and a preference for novelty are not the sort of things that would be adopted by a mind that didn’t start with them.

We’ve repeated more or less the same story for novelty, happiness, and curiosity. We could repeat it again for other aspects of human psychology, like honor or filial responsibility or friendship.## We think this basic story holds true for most aspects of human psychology. They’re all quaint, human-centric ways of doing cognitive work that can be done more efficiently by other means; AIs that didn’t start out with some seed of care for them wouldn’t grow to care about them.

This is even clearer in the case of human values like a sense of humor, where scientists still debate what role humor evolved to fill. Humor must have somehow been useful, or it wouldn’t have evolved; or it must at least be a side effect of things that were useful. But whatever role humor played in human prehistory seems to have been incredibly specific and rife with contingencies. If we hand complete power to AIs that have very different goals, we shouldn’t expect things like a sense of humor to survive; and this would be tragic in its own right.

The point of all of these examples isn’t that humans are made of squishy feelings, while AIs are made of cold logic and math. Rather than thinking of “value of information” and “exploration-exploitation tradeoff” as coldly logical Hollywood-AI concepts, think of them as abstract descriptions of roles — roles that can be filled by many different types of reasoning, many different goals, many different minds.

The idea of a “humorless” AI might make it sound like we’re imagining something “cold and logical,” like science fiction robots or Vulcans. But an AI that lacks a sense of humor might have its own incomprehensibly weird priorities, its own distant analogue of a “sense of humor,” albeit not one that makes sense to a human. We’re not saying that these AIs will be defective in the fashion of a Vulcan who loses at space chess because they view their opponent’s winning strategy as “illogical”; we’re saying that they won’t have humanity’s particular quirks.***

The problem we face with AIs isn’t “a mere machine could never experience love and affection.” The problem we face is that there are an enormous number of ways for a mind to be extremely effective, and the odds are very low that the AI will end up effective by following the same path human brains followed to become effective.†††

In principle, AI could care about any number of human-like values, and could even possess any number of human-like qualities, if its designers knew how to craft an AI with those features.

In practice, if developers race ahead to grow smarter and smarter AIs as quickly as possible, the chance of us lucking into just the right kind of AI is extremely small. There are just too many ways for AIs to perform well in training, and too few of those ways result in a non-catastrophic future.

* Topics we’ll cover include empathy and, in the Chapter 5 online supplement: whether AI will by default experience fascination and boredom; whether it will be law-abiding and promise-keeping; whether AIs inevitably become kinder with greater intelligence; and a deeper dive on AI consciousness and welfare.

† We spell this idea out further in the discussion titled Conscious Experience is Separate from the Referents of Those Experiences, below.

‡ We also live in a culture that propagates attitudes about curiosity, attitudes which also play a major role in how much we cultivate or endorse it.

§ We’ll discuss taking the AI’s perspective in the online resources associated with Chapter 5.

¶ This is analogous to how there are many different ways to do the work of winning a chess game, and most of them aren’t very humanlike, which we discussed in more depth elsewhere.

‖ The mathematical definition of value-of-information you’ll find in textbooks involves summing over specific answers and the specific benefits of knowing each answer. Once a mind has the general concept of value-of-information, however, it could consider more abstract generalizations about the probability that information will be useful later.
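
For readers who want the symbols: under standard decision-theoretic assumptions, the textbook quantity (the expected value of learning the answer to a question $X$ before choosing an action $a$ with utility $U$) can be written as

$$\mathrm{VOI}(X) \;=\; \mathbb{E}_{X}\!\Big[\max_{a}\, \mathbb{E}\big[U \mid a, X\big]\Big] \;-\; \max_{a}\, \mathbb{E}\big[U \mid a\big],$$

i.e., the gap between how well you expect to do if you see the answer before choosing, and how well you expect to do choosing without it.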

# This isn’t to say that because an AI is a machine, it must necessarily have simple, straightforward goals that only concern “objective” things. AIs can have messy, anarchic goals that tug in conflicting directions. AIs can have goals that pertain to the AI’s internal state, and even goals that pertain to what goals it has. AIs can have messy, evolving goals. If the AI was rewarded early on for randomly exploring its environment, then it might develop its own set of instincts and desires related to value-of-information.

But if AIs are messy, they won’t be messy in the same ways that a human is messy. If AIs have value-of-information instincts and drives, they very likely won’t look exactly like the human emotion of curiosity.

** The reason we expect many AIs to do things like this isn’t that we’re imagining most AIs inherently value “efficiency” or “effectiveness” for their own sake. Rather: Regardless of what else an AI wants, if its resources are finite, it will tend to want to use those resources efficiently so that it can get more of what it wants. Efficiency and effectiveness are instrumental goals that come pretty trivially with a wide variety of terminal goals. As such, there’s a natural pressure for AIs to make their pursuit of valuable information more efficient, if they don’t otherwise prefer doing it in an emotional way.

†† Even if the AI were the sort that pursued happiness, it probably wouldn’t be persuaded to delight in curiosity. If it already had a perfectly fine value-of-information calculator it used to investigate phenomena it didn’t understand, why should it tie its happiness to some event that you say should trigger pleasure? To an AI that valued investigation-of-novel-phenomena only instrumentally, this argument would sound like the argument that you should self-modify to feel extra happy whenever you open a car door — because you’d feel so happy after opening so many car doors! If you can be at all tempted in that way, you’ll pick some event that’s more to your current tastes. Or perhaps just set all your happiness dials to maximum, if that sounds more appealing. There’s no need to adopt the particular bespoke human implementation of curiosity.

‡‡ Some AI architectures of old do look a little like this, in the subfield of “reinforcement learning.” And reinforcement learning is used to train modern “reasoning” LLMs, which think long chains of thought in an attempt to solve some puzzle and get reinforced for success. But the underlying architecture is quite different from the human one, and we doubt it converges to the same sort of centralized pleasure/pain architecture; and even if it did, we doubt that that’s the most effective architecture, which means things would get complicated once the AI started reflecting, as we’ll discuss below.

§§ That kind of consistency — that all the different preferences can be added up to a score — tends to get imposed by any method that trains or hones the AI to be efficient in its use of scarce resources. Which is another facet of those deeper mathematical ideas we mentioned back in Chapter 1.

¶¶ Except that “absurd” and “crazy” are words that capture human reactions to things. From the AI’s perspective, it’s enough that the proposal is low-scoring.

‖‖ We do not actually expect superintelligences to monomaniacally value consuming cheesecake. This is a simplified example. We expect the actual preferences of practical AIs to be wildly complex and only tangentially related to what they were trained for.

## In fact, we will touch upon both honor and filial responsibility again, in the follow-up to Chapter 5.

*** For more discussion of how AIs aren’t restricted to being cold and logical, see our answer to “Won’t AIs inevitably be cold and logical, or otherwise missing some crucial spark?”

††† We’ll take this idea further in the discussion on Effectiveness, Consciousness, and AI Welfare below.
