How could a machine end up with its own priorities?

Solving difficult challenges requires AIs to take more and more initiative.

Imagine an AI tasked with curing Alzheimer’s disease. Can it succeed without a strong tendency to persist when it runs into roadblocks?

Can it succeed without being agentic and strategic about where it spends its attention, about making plans and adapting them to the circumstances?

Can it succeed without figuring out what knowledge it needs to acquire, spinning up its own experiments, and working out how to execute them?

Possibly. Perhaps Alzheimer’s is the sort of disease that can be cured with some simple drug discoveries, and maybe the AIs of tomorrow will have better intuitions about medical drugs than humans do.

Or perhaps curing Alzheimer’s quickly would require AIs that are substantially smarter than the smartest human biologists. We don’t know.

What about cancer? That one seems more likely to require the sort of AI that can really deeply understand large swathes of human biology, and do things that are far out of reach for scientists today. Still, it seems hard to be sure.

What about curing aging? That one sure seems like it would require the kind of AI that’s dogged, strategic, and goal-oriented enough to actually develop an incredibly deep understanding of biochemistry.

The AI companies will push AIs to become more and more skilled, more and more able to solve big and important problems. The AI industry isn’t going to stop of its own accord.

And that will naturally push the AIs to become more and more driven — an effect that, recall, we’re already starting to see in AIs such as OpenAI’s o1.

Recall the capture-the-flag computer security incident from the chapter, and remember that this resulted not from an AI trained to be a hacker, but from an AI trained to be good at solving generic puzzles. The “driven” behavior comes automatically.

Being tenacious is helpful even when the target is not quite right.

Prehistoric humans who actively pursued a hot meal, a sharper axe, a popular friend, or an attractive mate were more evolutionarily successful. Compare them to the humans who lazed around looking at the water all day, and you might see why desires and drives evolved their way into the human psyche.

The kinds of humans who wanted a better method for chipping flint handaxes, or wanted to convince their friends that their rival was a bad person, and who continuously steered toward those outcomes, were better at achieving those outcomes. When natural selection “grew” humans, the part where humans wound up with lots of different desires that they doggedly pursue wasn’t a fluke.

The specific way humans desire things was perhaps a fluke. Machines that doggedly pursue objectives won’t necessarily do so because of a human-like feeling of determination, any more than the AI Deep Blue played chess out of a human-like passion for the game. But when it comes to accomplishing hard objectives, doggedly pursuing those objectives looks like an incredibly important ingredient.

For all that humans are pretty goal-oriented creatures, some individual humans lack this sort of tenacity and will laze around or give up at the first sign of adversity. But on a large scale, humanity’s ability to solve big science and engineering problems is driven by tenacious individuals and institutions. We’re quite skeptical that a mind could yield anything like humanity’s macro-level output (and ability to dramatically reshape the world) without having some tenacity within it.

Human wants were evolutionarily useful, even when those wants weren’t for evolutionary fitness. Hypothetically, evolution could have instilled within us a single, overriding drive for descendants, and we could have then pursued hot meals and sharper axes solely for the purpose of having more descendants. But instead, evolution instilled us with a separate desire for hot meals.

Having drives and purposes is useful. It’s so useful that drives help with a task (like “genetic fitness”) even when the desires don’t exactly match that task.

Or rather, it can be helpful for a while. It can be helpful until the entities with drives and purposes start getting really smart — at which point their behavior might sharply diverge from the “training” target. Millions of years of evolution, only for humanity to build a rich technological civilization and invent…birth control.

Being grown rather than crafted, AIs are liable to wind up with the wrong targets.

This is the topic of the next chapter: You don’t get what you train for.
