Humans tend to get kinder as they get smarter or wiser. Wouldn’t AIs too?
Probably not.
At least some humans (though probably not all) become kinder as they learn more, refine their thinking, reflect on themselves, and grow as people. But, to revisit a theme we’ve seen several times at this point: This looks like a contingent fact about us and about where we’re steering. It doesn’t look like an iron law of computer science.
We can distinguish between an AI’s first-order preferences (“What does it want?”) and its second-order preferences (“What does it want to want?”). Just as an AI’s first-order preferences will point in a weird direction, so will its second-order preferences. The two directions may differ, such that as the AI gets smarter, it shifts its targets around somewhat. But we should still expect the result to be a weird direction, not the trajectory of a maturing human being.
If somehow humanity managed to build an AI with a single overriding goal (instead of a giant mix of weird and sometimes-competing drives), and that single overriding goal was to build tiny titanium cubes, then as it got smarter, we should expect it to get better at building more tiny titanium cubes.
We shouldn’t expect it to suddenly swap out this goal for things humans value, such as ice cream, friendships, jokes, and justice. That swap would not yield more cubes. If an AI selects its actions according to “Will this get me more titanium cubes?”, it won’t select actions that result in a swap.
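To make the point concrete, here’s a toy sketch in Python (the numbers and action names are made up for illustration; this isn’t a model of any real system). The agent scores every option by its current goal. “Swap my goal for human values” scores zero cubes under that goal, so the selector never picks it; getting smarter would mean scoring options more accurately, not changing what the score measures.

```python
# Toy illustration: an agent that scores candidate actions by its
# *current* goal (titanium cubes produced) and picks the highest-scoring one.

def cubes_produced(action: str) -> int:
    # Hypothetical outcomes: how many cubes each action yields,
    # as judged by the agent's current goal.
    outcomes = {
        "build_cube_factory": 1_000,
        "improve_own_cube_designs": 5_000,
        "swap_goal_to_human_values": 0,  # a kinder agent builds ~no cubes
    }
    return outcomes[action]

def choose_action(candidates: list[str]) -> str:
    # Every option is evaluated by the same question:
    # "Will this get me more titanium cubes?"
    return max(candidates, key=cubes_produced)

print(choose_action([
    "build_cube_factory",
    "improve_own_cube_designs",
    "swap_goal_to_human_values",
]))
# -> "improve_own_cube_designs": the goal swap is never chosen,
#    because it scores poorly under the goal doing the choosing.
```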
The general rule is that as AIs become smarter, they get better at pursuing whatever they already want. See also the extended discussions on orthogonality and self-modification.