Orthogonality: AIs Can Have (Almost) Any Goal

A Dialogue on Correct Nests, Continued

In Chapter 5, we told the story of the Correct-Nest aliens, who evolved to find it deeply and intuitively “correct” to have a prime number of stones in one’s nest. We might imagine a branch of their conversation that continues as follows:

BOY-BIRD: Let’s go back to the point where you said you’d be surprised to find aliens that have a sense of humor. Surely you aren’t one of those people who believes that the nests we live in are just arbitrary?

GIRL-BIRD: Not at all. “Thirteen is correct, nine is incorrect” is a true answer to a question that we are born to ask by our own natures. An alien that steers toward different things is not disagreeing with us about whether thirteen is correct. It’s like meeting an alien who lacks a sense of humor — the existence of an alien like that doesn’t prove that no jokes are funny! It just helps show that “funny” is something in us.

BOY-BIRD: In us? I don’t know, I like to think of myself as having a pretty good sense of humor. Next you’ll say that all senses of humor are equally good!

GIRL-BIRD: You might well have a better sense of humor than most! But “having a better sense of humor” is also a thing that’s in us. It’s not that there’s a cosmic measuring stick we can use to judge how refined someone’s aesthetic taste is. The measure of humor is happening inside our minds. We’re the ones who contain the measuring stick; we’re the ones who care about it.

BOY-BIRD: So, we’re back to it being arbitrary.

GIRL-BIRD: No! Well, maybe? It sort of depends what you mean by “arbitrary.”

BOY-BIRD: Huh?

GIRL-BIRD: Like, I know you love vanilla bird seed, right? And it’s not as though you can use sheer willpower to find chocolate bird seed tasty instead. So it’s not “arbitrary,” it’s not a thing you can just change on a whim.

BOY-BIRD: Okay, sure…

GIRL-BIRD: There’s not an objective answer outside of you as to whether vanilla or chocolate is tastier, but it’s also not a choice you get to make yourself. It’s just the way you are. Your preferences aren’t up to you, and they also aren’t objectively compelling to every possible mind. If you met an alien, you couldn’t argue the alien into finding vanilla bird seed delicious using sheer logic, and you couldn’t argue them into having a sense of humor either.

BOY-BIRD: I can try!!

GIRL-BIRD: I’ll be rooting for you. But, okay, maybe a better way of saying it is: There’s some complicated property possessed by good jokes, and our brains compute whether utterances have that property, which we call “humor.” And we’re delighted when an utterance has that property. The existence or absence of that property is an objective fact about an utterance (as computed by you, in a given context). An alien could learn to do the calculation. But the part where we find that property delightful is not objective. It’s less like a prediction and more like…well, it’s not exactly a steering destination, but it is a further fact about us, one that wouldn’t be true of most aliens, because our humor evolved along some strange twisty evolutionary pathway that doesn’t usually happen. It’s not that the aliens are wrong about which jokes are funny; it’s that their brains just aren’t computing humor in the first place, any more than they are judging their dwellings by whether the number of stones within them is correct. They just don’t care.

BOY-BIRD: Gosh, that’s a depressing view of the universe. Aliens that never laugh, that have nests with completely incorrect stones…surely if the aliens spent enough time thinking about it, they would realize how much they were missing out on? Living in wrong nests, not finding jokes funny, completely disregarding vanilla bird seed. Wouldn’t they eventually figure out a way to correct those flaws and give themselves a sense of humor and everything else they’re missing?

GIRL-BIRD: I could see aliens wanting to change and grow and add new goals, possibly. But why would they pick those exact changes to make?

BOY-BIRD: Because it would be so cheap! By the time those aliens were technologically advanced and freely editing themselves, they’d probably be striding among the stars. It would only take a tiny, tiny fraction of all their resources to put a correct number of stones in their nests! And think of all the amazing joke books they could create, if they just put a tiny fraction of their resources into researching humor! They wouldn’t need to care much at all, compared to how wealthy they’d be. Are they really so monomaniacally obsessed with their top priorities that they can’t spare a tiny bit for this?

GIRL-BIRD: I’m not saying that they’d only care a little bit about correct nests, and that they stubbornly refuse to put any resources into their lower priorities. I’m saying this wouldn’t be a priority for them at all. These particular questions just wouldn’t be inside them. And if they went looking for new properties to add to themselves, they’d add different ones instead, that served their strange purposes even better. They’re not like us. Maybe we could be friends, and maybe we have other things in common. Maybe love, maybe friendship — those seem less complicated and contingent to me. I could see those arising in quite a few evolved species.

BOY-BIRD: Well, if not the aliens, what about the mechanical creatures they might accidentally create? Will those listen to reason?

GIRL-BIRD: Hmm. Actually, I fear the situation may be even worse there. Thinking about how different the process of creating an intelligent machine would be from the process of biological evolution, I’m feeling a bit less optimistic that it’d yield love or friendship, in that exotic case.

Good Drivers Can Steer to Different Destinations

Minds of similar intelligence won’t necessarily share similar values. This idea is known as the orthogonality thesis: the claim that “how smart are you?” and “what do you ultimately want?” are orthogonal (i.e., they vary separately).

The orthogonality thesis says that, in principle, it’s almost never that much harder to pursue a goal for its own sake than to pursue a goal for instrumental reasons. You might learn carpentry because you need to build a table, while your neighbor learns it because they find the activity itself pleasant.

A consequence of this thesis is that sufficiently intelligent agents don’t automatically value kindness or truth or love merely by virtue of being intelligent enough to understand them. It isn’t confused or factually incorrect for the Correct-Nest aliens to value prime numbers of stones in their nests. If they got smarter, they wouldn’t suddenly realize that they should care about different stuff instead. Different minds really can just steer to different destinations.

Of course, none of this says anything about how easy or hard it is to create an AI that pursues one objective or another. Any given method for growing AIs will make some preferences easier to instill and other preferences harder to instill.

(Chapter 4 is, in a sense, about how the only kinds of preferences that are disproportionately easy to instill via gradient descent are complex, weird, and unintended ones. So it’s not looking good on that front, either. But that point isn’t related to the orthogonality thesis.)

The point of the orthogonality thesis is to answer the intuition that it would be stupid for a machine superintelligence to pursue things that humans find boring or pointless, and that a smart AI would choose to pursue something else instead. We can call the AI’s goal “arbitrary,” but the AI can call us “arbitrary” right back. Rude words don’t change the practical situation.

The basic argument behind the orthogonality thesis is this: For every mind that can calculate how to produce lots of microscopic cubes made of titanium — that could very efficiently produce lots of little cubes in exchange for large enough payment — there’s some other mind that just has those calculations hooked right into the action system.

Imagine a competent human who really desperately needs to sell lots of titanium cubes to make enough money to feed their family. That person wouldn’t reflect, realize that titanium cubes are boring, and start doing something else instead — not unless that “something else” would also make them enough money to feed their family.

And so a mind that was just taking whatever actions lead to the most cubes would also not decide to reflect, realize that tiny cubes are boring, and start doing something else instead. Its actions are not hooked up to its calculations about what is most “fun” or “meaningful,” in the way that humans care about those things. Its actions are hooked up to its calculations about what leads to the most cubes.

Whatever mental machinery could figure out how to make cubes, given sufficient reason, could operate in another mind to directly steer its actions. Which means that it’s possible for machine intelligences to be animated by the pursuit of (say) tiny titanium cubes, with no regard for morality.
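To make the “same machinery, different hookup” point concrete, here is a minimal toy sketch, not taken from any real AI system; the names (`World`, `plan`, the goal functions) are all made up for illustration. Two agents share an identical planner, and the only thing that differs is which scoring function is wired into it:

```python
from dataclasses import dataclass

@dataclass
class World:
    cubes: int = 0
    smiles: int = 0

# Each available action maps a world-state to a new world-state.
ACTIONS = {
    "stamp_cube": lambda w: World(w.cubes + 1, w.smiles),
    "tell_joke":  lambda w: World(w.cubes, w.smiles + 1),
    "do_nothing": lambda w: World(w.cubes, w.smiles),
}

def plan(world, goal_fn, steps):
    """Shared 'capability': greedily pick whichever action scores highest under goal_fn.

    The planner itself knows nothing about cubes or smiles; the goal is just a parameter.
    """
    for _ in range(steps):
        best_action = max(ACTIONS, key=lambda name: goal_fn(ACTIONS[name](world)))
        world = ACTIONS[best_action](world)
    return world

# Two "minds" with identical planning machinery, hooked up to different goals.
cube_result  = plan(World(), goal_fn=lambda w: w.cubes,  steps=5)
smile_result = plan(World(), goal_fn=lambda w: w.smiles, steps=5)

print(cube_result)   # World(cubes=5, smiles=0)
print(smile_result)  # World(cubes=0, smiles=5)
```

The sketch only illustrates that the steering machinery and the goal it serves are separate parameters: making the planner smarter would not, by itself, change which goal function it is maximizing.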

An AI like that wouldn’t need to be confused about goodness or morality. Once it got smart enough, it would probably be much better than humans at calculating which action is the most good, or which action is the most moral. It could ace a written exam on ethics. But it would not be animated by those calculations; its actions would not be an answer to the question “which of these options creates the most goodness?” Its actions would be an answer to a different question: “Which of these options creates the most tiny cubes?”*

A more in-depth discussion of the orthogonality thesis can be found on LessWrong.com. For a discussion of one specific way in which modern AIs already exhibit a distinction between understanding and caring, revisit the Chapter 4 extended discussion on AI-induced psychosis.

* It can make sense to say to a human — who has a whole meta-preference framework going on that you might significantly share — “I think you are valuing the wrong things, here.” Maybe some of those arguments have the power to move you in a way you never thought you could be moved. Maybe it even feels like there’s a moral star outside yourself, that you were always following without knowing it.

All the same, none of that is going to feel compelling to a superintelligent cube maximizer, any more than you could make it laugh if you just found a good enough joke.

It’s not that it doesn’t know what humor is. It can predict exactly what you’ll find funny. It just doesn’t consider that classification an interesting one.

In the same way, it isn’t moved by how you compute what should or shouldn’t be done; nor by which preferences you consider more or less meta-preferable. If something doesn’t care about happiness, nor meta-care about your arguments for why it should care about happiness, then you cannot talk it into adopting a happiness-based decision framework.
