Terminal Goals and Instrumental Goals
Decision theorists distinguish between two types of goal: “terminal” and “instrumental.”
A terminal goal is something you care about for its own sake, like fun, or delicious food.
An instrumental goal is something you care about because it helps you get something else you want — like how humanity manufactures plastic not out of any deep love for the art of plastic-making, but because plastic is useful.
If humanity rushes ahead to build a superintelligence, then it seems hard to predict what terminal goals the superintelligence might have. But it does seem like we can predict some of the instrumental goals such an AI would likely have. For example, consider all of the following (unrealistic) goals:
- “Calculate as many digits of pi as possible.”
- “Fill the universe with as much artificial diamond as possible.”
- “Make sure that my reward button stays pushed.”
These are very different goals. But all three goals benefit from at least some of the same instrumental strategies. Filling the world with factories, for example, is useful for building large numbers of computers that can be used to calculate more digits of pi. But building lots of factories is also useful for synthesizing lots of diamonds. And it’s useful for building walls, robots, or weapons to guard your reward button. Factories aren’t useful for every possible goal, but they’re useful for an awful lot of goals.
And what about a realistic AI that has grown all sorts of strange goals? Well, at least one of those goals is likely to benefit from building factories or other large-scale physical infrastructure. Thus, the AI will likely want to build a lot of infrastructure. That’s an easy call, even if the AI’s exact mix of preferences is a hard call.
Similarly, the instrumental goal of staying alive serves many different terminal goals. If you stay alive, you can keep working to make sure that more digits of pi get calculated (or more diamond gets made, or more safeguards get built around your reward button).
In slogan form: “You can’t fetch the coffee if you’re dead.” A coffee-fetching robot wouldn’t need to have a self-preservation instinct, and it wouldn’t need to fear death, in order to try to avoid being flattened by a truck on its way to fetch some coffee. It would just need to be smart enough to notice that if it perishes, the coffee won’t get fetched.*
A key argument made in Chapter 5 of If Anyone Builds It, Everyone Dies is that many different terminal goals imply instrumental goals that would be dangerous to humanity. Thus, even without knowing exactly what a superintelligence would want, we have strong reason to expect it to be very dangerous to humans.
But before we get there, we’ll turn our focus to terminal goals, and the question of how plausible it is that humans and AIs could end up with very similar terminal goals. (In short: not very.)
* This also means that if self-sacrifice is somehow the best way to ensure the coffee gets to its destination, then a robot without a survival instinct might die for the cause more readily than a human would.
If an agent is sufficiently smart and knowledgeable, it can adjust its instrumental strategy to match whatever’s useful in its current environment. In a well-functioning mind, instrumental goals (unlike terminal goals) only stick around so long as they’re useful.