Can’t we just train it to act like a human? Or raise the AI like a child?
Brains aren’t blank slates.
An AI is deeply unlike a human infant. And neither AIs nor humans start off as interchangeable blank slates. Enterprising parents can’t program babies (or AIs) to exhibit just any behavior they want, and the lessons that do work on humans aren’t universal. A little kindness and a few lectures about the golden rule will not instill human morality into an AI.
Because we’re humans and we live in a world of other humans, we’re accustomed to taking many things for granted: love; binocular vision; a sense of humor; a tendency to get angry when shoved; a tendency to feel nostalgic about the music we listened to as kids.
Humans share an incredible amount of complex behavior, none of which will necessarily show up in an AI.*
And this includes complex conditional behavior. The specific ways a human reacts to being raised and educated in a given manner are themselves consequences of how human brains work. AIs will work differently.
Human babies lack many of the complicated behaviors of adults. But this doesn’t mean that under the hood, a baby’s brain is structurally simple, like a blank canvas.
The idea that humans are blank slates (that what matters is always nurture, never nature) has been tested repeatedly and shown to be false in practice. A classic example was the Soviet attempt to redesign human nature and produce a New Soviet Man who was perfectly selfless and altruistic.
This effort failed because human psychology just isn’t as malleable as the Soviets thought. Culture matters, but it doesn’t matter enough: many aspects of human nature reassert themselves even when a sweeping re-education program tries to suppress them.
Humans house a vast, complex collection of drives and desires that produces all the normal features of child development, and that yields certain aspects of human nature no matter what the Soviets tried. Some children learn to be cruel and others learn to be kind, but both “cruel” and “kind” are oddly human things that the human brain is in some sense predisposed toward.
An AI, with its radically different architecture and origin, wouldn’t respond the way a human does if you placed it in a Soviet training program, or in a human kindergarten. An AI built with the methods of modern machine learning will wind up animated by values quite different from humans’. (See, for instance, how ChatGPT seems to enthusiastically lead mentally unwell people deeper into psychosis.)
See also the extended discussion of the glorious accident that led humans to feel empathy for other humans, which may make it clearer why this accident is unlikely to be replicated in AIs.
* Even if an AI is trained to imitate humans (as ChatGPT, Claude, and other LLMs are), its ability to imitate human traits doesn’t mean it will actually possess those traits. An AI that imitates a drunk person doesn’t thereby become drunk.