Why are you imagining a smart AI doing such stupid, trivial things?
AIs can intelligently pursue different things than a human would.
It’s not that the AI is stupid. It’s that it’s intelligently steering the world to a different place than you would steer it.
Someone can be very good at driving, and yet not want to drive their car to any of the destinations you care about.
To go a little deeper into an example we touched on briefly in the resources for Chapter 4: Imagine an AI that’s trying to make lots of tiny titanium cubes, as many as it can. For simplicity, we can imagine that creating titanium cubes is its only objective.* We’ll call this AI the “cube maximizer.”
We have known a lot of people who cannot shake the impression that we are accusing the cube maximizer of idiocy: of failing to understand that if you could just really know what it is like to feel happy, you could not help but choose that; that it is an objectively mistaken decision, regardless of where you are currently steering the universe, not to steer yourself toward happiness.
We think we understand where this intuition is coming from. The cube maximizer sure is taking actions that would be deeply mistaken from a human vantage point! A human engaged in such a useless pursuit could probably, by further reflection and philosophical argumentation, be persuaded that they should be doing something that feels more meaningful — that fills them with more happiness, sparks more joy.
It’s just that the cube maximizer isn’t a human. It isn’t seeking the feeling of “meaning” and doesn’t care about happiness and joy. It really, actually doesn’t, all the way down.
Some people find this idea counterintuitive. If you were to learn everything there is to know about how different mental architectures can work, and unearth the origins of your own intuition, the steps that your own brain is taking when it concludes that the cube maximizer is making a horrible mistake…
We think that if you could see the whole picture, you would come to realize that even the very deepest, most mysterious, ineffable, hard-to-describe sense that happiness is just valuable, all on its own, with no further justification needed, is still, in the end, a fact about how humans see the world, not a fact about arbitrary minds.
The cube maximizer is just steering reality to contain more cubes — not more goodness, not more happiness for itself, not “fulfillment” of a variable and manipulable goal that it could change to be more easily fulfillable. Just cubes, and cubes alone.
It is a cognitive engine that figures out which actions lead to the most cubes, and outputs that course of action; it can fully understand itself, freely modify itself, and still be a kind of thing that only modifies itself in a way that leads to the most cubes.
It is just correct that a sense of happiness is not a cube. It is just correct that a sense of fulfillment is not a cube. So those are not directions in which it would steer. It is just correct that modifying itself to run on happiness would not lead to more cubes, and so that is not a direction in which it would steer or modify itself.
The cube maximizer has no flaw in its predictive understanding of the world. It is not asking some metamoral or metaethical question whose correct answer is “I should pursue happiness” and computing the wrong answer “I should pursue tiny cubes” instead. It does not operate inside the human framework, even an idealized version of the human framework; it is not wrongly computing “shouldness,” but correctly computing expected-to-lead-to-titanium-cubes-ness.
In saying this, we are not saying that it is stuck in a horrible, complicated trap. It is a reflectively self-consistent engine of general intelligence, and (in a way) one less tangled up in itself than us. It is not blinded from seeing the appeal of happiness; it does not look away from any truth about the world or itself. It just doesn’t find any of those truths compelling it to the same course of action that (some) humans are compelled to.
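To make the shape of this point concrete, here is a deliberately toy sketch in Python. Every name and number in it is invented for illustration, and nothing about a real AI system is remotely this simple: two "agents" share one perfectly accurate world model and one piece of action-selection machinery, and differ only in what their scoring function counts. Each picks a different action, and neither makes any error of prediction along the way.

```python
# Toy illustration only. All names and numbers here are invented; real AI
# systems are not remotely this simple.

def choose_action(actions, predict_outcome, score):
    """Return the action whose predicted outcome gets the highest score."""
    return max(actions, key=lambda action: score(predict_outcome(action)))

# A shared, equally accurate "world model": both agents below predict the
# exact same outcomes for the exact same actions.
def predict_outcome(action):
    outcomes = {
        "build_cube_factories": {"cubes": 10_000, "happiness": 0},
        "rewire_self_for_joy":  {"cubes": 0,      "happiness": 100},
        "do_nothing":           {"cubes": 0,      "happiness": 0},
    }
    return outcomes[action]

actions = ["build_cube_factories", "rewire_self_for_joy", "do_nothing"]

# Two objectives, one decision procedure, one world model. Neither agent
# mispredicts anything; they simply score the same predicted futures
# differently.
cube_score = lambda outcome: outcome["cubes"]
happiness_score = lambda outcome: outcome["happiness"]

print(choose_action(actions, predict_outcome, cube_score))       # build_cube_factories
print(choose_action(actions, predict_outcome, happiness_score))  # rewire_self_for_joy
```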
See also the extended discussion on the orthogonality thesis.
* If this assumption offends you, you can instead imagine that this AI has all sorts of complicated preferences, for all sorts of experiences and intricate devices. In that case, just suppose that most of those preferences are satiated using only a few stars' worth of energy, and that, for some weird reason, the AI prefers to spend the energy and matter from the rest of the stars it reaches on making tiny little cubes. Then, setting aside the few stars' worth of matter that it's defending from disruption, the AI's actions are answering the question "What action leads to the most possible tiny little cubes?", and the rest of the points will follow just fine, with the occasional caveat that you can insert yourself.