Intelligence Isn’t Ineffable
In recent years, the field of AI has made progress not by deepening our understanding of intelligence, but by finding ways to “grow” AIs. Attempts to understand intelligence itself met with years of dead ends and stagnation. Now that growing powerful AIs has met with success, some people wonder whether the idea of “understanding intelligence” is just a mirage.
Perhaps there are no general principles to understand? Or perhaps the principles are too weird or too complicated for humans to ever comprehend?
At the same time, others feel that there must be something special and mystical about the human mind, something too sacred to ever be reduced to dry equations. And since intelligence isn’t already understood, perhaps true intelligence comes from this ineffable part of the human spirit.
Our own view is rather more boring than that. Intelligence is a natural phenomenon, like any other. And like many phenomena in biology, psychology, and other sciences, we’re still early into our attempts to understand it.
Many of the basic tools and concepts of modern psychology and neuroscience have only existed for a few decades. It may sound humble to say, “Science has its limits, and this is perhaps one of them.” But imagine instead telling someone that you think scientists a million years from now won’t understand much about intelligence beyond what we know in 2025. In those terms, the claim that intelligence is ineffable sounds more arrogant than the alternative.
The main reason we care about this question — “Is intelligence understandable?” — is that it bears on whether humanity could someday build a superintelligence without threatening our survival. We’ll argue, in Chapter 11, that AI today looks more like alchemy than like chemistry. But is it even possible for there to be a “chemistry” of AI?
Given that we don’t have the relevant scientific insights in hand today, it’s not trivial to establish that a “chemistry of AI” is possible! We can only guess at what a mature science of AI would look like. And since we’re so far from that mature science, it’s likely that many of the concepts used in AI today would need to be refined or replaced along the way.
In spite of this, we do think that intelligence is intelligible in principle. We don’t think this is an especially hard call, even though recent decades of research show that intelligence isn’t easy to understand.
There are four basic reasons why we think this:
- Claims of ineffability have an extremely poor track record in the sciences.
- Intelligence exhibits structure and regularities.
- There’s a lot we don’t yet understand about human intelligence that should be understandable in principle.
- There has already been some progress on understanding intelligence.
Claims of Ineffability Have an Extremely Poor Track Record in the Sciences
When humanity doesn’t understand something, it can often seem intimidating and profoundly mysterious. It can be hard to imagine — or hard to appreciate emotionally! — what it would be like to deeply understand that topic in the future.
There was once, among philosophers and scientists, a widespread belief in vitalism — the idea that biological processes could never be reduced to mere chemistry and physics. Life seemed like something special, something incomparably different from mere atoms and molecules, mere gravity and electromagnetism.
The mistake of the vitalists has been a remarkably common one throughout history. People are quick to conclude that things which are mysterious today are inherently mysterious, unknowable even in principle.
If you look up at the night sky, and all you perceive is a dazzling field of twinkling lights whose nature and laws are unknown… then why believe that you ever could know? Why would anyone predict that the future held such understanding?
A key lesson of history is that scientific research can handle these deep puzzles. Sometimes the mystery gets solved quickly, and sometimes it takes hundreds of years. But it seems increasingly unlikely that there are any everyday aspects of human life, such as intelligence, that could never even in principle be understood.
Intelligence Exhibits Structure and Regularities
Suppose that you lived thousands of years ago, when even phenomena like “fire” seemed like ineffable mysteries. How could you have guessed that humans might one day understand fire?
One hint is that fire wasn’t a one-off event. It burned in many different places, and in similar ways each time. This reflected a stable, regular, compact thing-going-on hidden underneath “fire,” inside of reality: Different possible arrangements of matter had different bound-up chemical potential energies, and heating up matter let those configurations break up and reform into newer, more tightly bound configurations with lower potential energy, releasing the difference as heat. The fact that you can start a fire more than once suggests that there is some repeating phenomenon behind it to be understood, that “fire” isn’t like “last week’s exact winning lottery numbers” in how much about it exists to be understood or predicted.
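To make the bound-up energy point concrete, here is a rough worked example of our own (using typical textbook bond energies, so the figures are approximate rather than exact) for burning methane. The bonds formed in the products hold more tightly than the bonds broken in the reactants, and the difference is released as heat:

```latex
% Methane combustion, with rough average bond energies in kJ/mol.
\begin{align*}
\mathrm{CH_4} + 2\,\mathrm{O_2} &\longrightarrow \mathrm{CO_2} + 2\,\mathrm{H_2O} \\
\Delta H &\approx \underbrace{(4 \times 413 + 2 \times 495)}_{\text{bonds broken}}
          - \underbrace{(2 \times 799 + 4 \times 463)}_{\text{bonds formed}} \\
         &\approx 2642 - 3450 = -808~\mathrm{kJ/mol}
\end{align*}
```

Roughly 800 kilojoules per mole comes off as heat and light. Every wood fire is the same bookkeeping with messier molecules, which is part of why there was a single compact phenomenon there to understand.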
Similarly, if you look up at the night sky, you’ll see more than one star. Even the planets, which turn out to be unlike other “stars,” have something in common with stars, in terms of the knowledge needed to understand them.
Our ancestors, who had no experience with successfully understanding fire as a chemical phenomenon, might not have been confident in their ability to understand stars someday. But today we have comprehended fire, stars, and many other phenomena, and we can extract a subtle lesson that goes beyond “Well, we understood that other stuff, so we’ll understand everything else in the future.” It’s the lesson that repetition corresponds to regularity, that things that happen often happen for a reason.
Intelligence exhibits similar regularities that suggest it can be comprehended. For instance, it shows up in every human, and it could be built by evolution’s blind search through genomes. Evidently, similar sets of genes could succeed at multiple different tasks. The genes that let human brains chip handaxes also let us craft spears and bows. And more or less those same genes produced brains that went on to invent agriculture, guns, and nuclear reactors.
If there were no structure, no order, no regularity to intelligence that we might recognize as a pattern, then every animal would need separate, specialized machinery for each thing it could predict or invent. Bee brains are specialized to hives; they cannot also build dams. It could have been the case that humans needed just as much specialization for every task we can solve; it could have been that we needed to grow specialized “nuclear reactor” brain areas before we could build nuclear reactors. If that is what neuroscientists found inside brains, they’d be licensed to suspect that there were no deep principles of intelligence to comprehend, and that there were different principles for every different task.
But that’s not what we find inside human brains. We find that the same brains designed to chip handaxes are capable of inventing nuclear reactors, which implies that there’s some underlying pattern that the genes were able to take advantage of, again and again and again.
Intelligence isn’t a chaotic, unpredictable one-off phenomenon like last week’s exact winning lottery numbers. There is a regularity of the universe to be understood.
There’s a Lot We Don’t Yet Understand About Human Intelligence That Should Be Understandable in Principle
When it comes to humans, science today can say a great deal about the structure and behavior of individual neurons. And we can say a great deal about ordinary topics in folk psychology, such as, “Bob went to the grocery store alone because he was angry with Alice.” But in between these two levels of description, there’s an enormous amount missing from our understanding.
We know very little about many of the cognitive algorithms the brain uses. We can say very coarse-grained things about functions that correlate with particular brain regions, but we’re nowhere near being able to describe mechanistically what the brain is actually doing.
One simple way to see that there’s a missing level of abstraction is that our high-level neuroscientific models make much worse predictions than one could get by a full simulation of the neurons. Our mechanistic understandings of other people must therefore be incomplete.
Some loss of information is presumably necessary, but a good model would be a lot less lossy. An “understanding” of the differential on a car won’t let you predict everything that the differential does as well as an atomic-level simulation — because sometimes the teeth on the gears will get worn down and slip, for instance. But the gears-level model of a differential still makes some very precise predictions, and it’s easy to see the boundary between the things that the model is supposed to predict (like how the gears will turn when they’re properly interlocked) and what it’s not (like what happens when the gear teeth wear away).
Why expect that this degree of modeling is possible with human minds? Perhaps human minds are too random for that. Perhaps if you want accurate predictions, it’s neurons or bust.*
Some evidence that it’s not “neurons or bust” is that even your mother can predict your behavior better than the best formal models of brains can. Which means there’s definitely some structure to human psychology that can be known implicitly, without exactly simulating someone’s neurons. It just hasn’t been made explicit yet.
More concrete evidence that it’s possible to model human minds better comes from studies of amnesiacs. Some amnesiacs are prone to repeating the same joke verbatim multiple times. This suggests a certain type of regularity in that person’s brain. It suggests that they subconsciously run a particular calculation (based, perhaps, on their circumstance and the presence of the nurse and their memories and history and their desire to spread joy and be seen as clever) that is stable across a variety of minor perturbations.
If there’s that much regularity to a person’s mental calculus, then it seems that it should be possible to understand it — that it should be possible to learn the gears of the decision, to understand the brain in sufficient depth to say:
Ah, these neurons correspond to the desire to spread joy, and those neurons correspond to the desire to be seen as clever, and these neurons here are the ones that generate possible thoughts after seeing the nurse enter the room, and here are the generators that produce the “tell a joke” thought, and here’s how the aforementioned desire-neurons interact with it, such that the thought gets promoted to the fore in the following broader context. And here’s how that context affects the memory access with the following parameters — which, if you follow these pathways here, you can see how that sparks the idea of moving eyes around the room. And given that the room contains a painting of a sailboat, you can see how the “sailboat” concept gets triggered by this cloud of neurons over here, and if you trace the effects back to the memory-lookup, you can see how the patient winds up making a joke about sailboats.
The correct explanation won’t sound exactly like that. But the regularity of the simple macroscopic observable (“same joke every morning”) strongly suggests that it’s not all irreducible randomness — that there’s some reproducible calculation going on in there. (Which, of course, also matches common sense; if brains were purely random, we couldn’t function.)
There Has Already Been Some Progress on Understanding Intelligence
This is the main reason we feel confident that there is a lot left to learn about intelligence. You can read older books like The MIT Encyclopedia of the Cognitive Sciences or Artificial Intelligence: A Modern Approach (2nd Edition), written before modern “deep learning” techniques (for growing AIs) ate the field of AI, and gain a good deal of insight into how different problems in cognition get solved. Not all of these insights have been rewritten to be legible to a lay audience or widely disseminated to university students; much more of this knowledge exists than has been popularized.
Take the scientific principle that we should favor simpler hypotheses over more complex ones, all else equal. What, exactly, does “simple” mean here?
“My neighbor is a witch; she did it!” certainly sounds simpler to many people than Maxwell’s equations governing electromagnetism. In what sense are the equations the “simpler” option?
For that matter, how do we define the idea of evidence “fitting” a hypothesis, or a hypothesis “explaining” the evidence? And how do we trade off the simplicity of hypotheses against their explanatory power? “My neighbor is a witch; she did it!” sounds like it could explain an awful lot of things! Yet many (correctly) intuit that this is a bad explanation. Indeed, the fact that witchcraft can “explain” so many things is part of why it’s bad.
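Probability theory has a standard answer here (we are reciting textbook material, not anything original): evidence E “fits” a hypothesis to the degree that the hypothesis assigned high probability to E, and the comparison between two hypotheses runs through the likelihood ratio:

```latex
\underbrace{\frac{P(H_1 \mid E)}{P(H_2 \mid E)}}_{\text{posterior odds}}
= \underbrace{\frac{P(H_1)}{P(H_2)}}_{\text{prior odds}}
\times
\underbrace{\frac{P(E \mid H_1)}{P(E \mid H_2)}}_{\text{likelihood ratio}}
```

A hypothesis that “could explain anything” must spread its probability over every observation it could explain, leaving only a sliver for the evidence actually seen, and the likelihood ratio punishes it accordingly.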
Are there unifying principles for choosing between different hypotheses? Or are there just a hundred different tools to swap out for different problems — and in the latter case, how does the human brain manage to invent tools like that?
Is there a language we could use to describe every hypothesis that computers or brains could ever successfully use?
Questions like these might sound imponderable and philosophical to someone encountering them for the first time. In fact, they are all solved and well-understood questions in computer science, probability theory, and information theory, with answers going by names like “Minimum Message Length,” “Solomonoff prior,” and “likelihood ratio.”†
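To illustrate one of those answers, here is a toy sketch of our own (all the specific numbers are invented for illustration) of the Minimum Message Length idea: score each hypothesis by the bits needed to state it, plus the bits needed to encode the data given it, and prefer the shortest total message.

```python
import math

def bits(p):
    """Bits needed to encode an outcome that was assigned probability p."""
    return -math.log2(p)

# Observed data: eight coin flips.
data = "HHTHHHTH"

# Hypothesis A, "fair coin": assigns probability 0.5 to each flip.
# Hypothesis B, "a witch did it": flexible enough to "explain" a million
# different outcomes, so it gives each specific outcome at most a
# one-in-a-million probability. We charge both hypotheses the same
# (made-up) 10 bits to state, to isolate the effect of fit.
cost_A = 10 + sum(bits(0.5) for _ in data)  # 10 + 8.0  = 18.0 bits
cost_B = 10 + bits(1 / 1_000_000)           # 10 + ~19.9 = ~29.9 bits

print(f"fair coin : {cost_A:.1f} bits")
print(f"witchcraft: {cost_B:.1f} bits")  # the longer message loses
```

Even when both hypotheses cost the same to state, the one that “could explain anything” compresses the actual observations poorly, and so loses on total message length.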
It also seems relevant that there already exist fully understood AIs that are superhuman in specific domains. We understand all of the relevant principles at work in the chess AI Deep Blue. Because Deep Blue was hand-coded, we can easily inspect different parts of Deep Blue’s code, see everything that a given code snippet is doing, and see how this relates to the rest of the codebase.
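To give a feel for what “hand-coded and inspectable” means, here is a complete toy game-tree search of our own devising. This is not Deep Blue’s actual code, which was vastly more elaborate, but it is the same kind of explicit, legible machinery:

```python
# Minimax search over a trivial stand-in game: players alternate
# appending a digit, the maximizer wants a high total, the minimizer
# a low one. Every behavior traces back to a line written on purpose.

def minimax(state, depth, maximizing):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        return evaluate(state)  # a hand-written scoring rule
    scores = [minimax(m, depth - 1, not maximizing) for m in moves]
    return max(scores) if maximizing else min(scores)

def legal_moves(state):
    # Each move appends one digit; the game ends after four digits.
    return [state + [d] for d in (1, 2, 3)] if len(state) < 4 else []

def evaluate(state):
    return sum(state)

print(minimax([], depth=4, maximizing=True))  # prints 8: 3 + 1 + 3 + 1
```

Nothing in this program is inscrutable: you can point at the move generator, the scoring rule, and the search rule, and say exactly what each one contributes.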
When it comes to LLMs like ChatGPT, it’s not entirely clear that there could exist a complete and short description of how they work. LLMs are large enough that the same behavior can arise for many different contingent reasons, if (for example) the machinery that produces the behavior is duplicated in a thousand different places inside the LLM. Understanding an LLM’s architecture doesn’t even begin to tell us about the roles played by the different inscrutable weights in the trained model.
ChatGPT could turn out to be hard for scientists to understand, even after decades of study. But the existence of ChatGPT doesn’t mean that intelligence has to be messy in order to work. It just means that it would be an extremely bad idea to try to scale something like ChatGPT all the way to superintelligence, for reasons we’ll cover in later chapters of the book.
The fact that one particular mind is messy doesn’t mean that it’s impossible to understand intelligence. It doesn’t even mean that it’s impossible to understand ChatGPT someday.
If you look very closely at a hundred burning logs, you can see that no two logs burn exactly alike. The fire spreads in different ways, the embers fly off in different directions, and it’s all very chaotic. If you examined a log with a fireproof microscope, you would see even more dizzying detail. It seems easy to imagine an ancient philosopher, observing these chaotic details, concluding that fire would never be fully understood.
And they might even have been right! We may never have the power to look at a log and tell you exactly which fragment of wood will turn into the first ember that floats off to the west. But the ancient philosopher would have been gravely mistaken if they’d concluded that we would never understand what fire is, understand why it happens, create it in controlled conditions, or harness it for great benefit.
The exact pattern of embers is neither very regular nor very reproducible. But on a more abstract level, the yellow-orange-red flickering hot stuff is a regularity occurring again and again in the world, and it’s something humanity managed to comprehend.
The arguments in If Anyone Builds It, Everyone Dies don’t depend much on the technical details that are known about intelligence today. “People keep making smarter computers, and are not in control; and if they make a very smart out-of-control thing, we end up dead” is not that esoteric a concept. But it is useful to know that there is a large body of existing knowledge here, even though there are many remaining mysteries and unknowns in the field.
The book’s core arguments don’t depend on whether intelligence is understandable in principle, which is why we haven’t gone into detail on the existing literature. If no human being could ever possibly understand the mysteries of a superhuman machine intelligence, artificial superintelligence could still kill us.
The question matters mainly when it comes to deciding what to do after stopping the suicide race to AI.
And it matters that intelligence probably can be understood, which means it probably would be possible in principle for smart people to develop a mature field of intelligence, and for those people to figure out a solution to the AI alignment problem.
It also matters that modern humanity is nowhere near achieving that feat, of course. But the fact that the feat is possible in principle has implications for how humanity should navigate its way out of this mess, as we’ll discuss later, in an extended discussion after Chapter 10.
Before we get there, we need to explain why AI techniques like the ones we’ve discussed above pose such a grave threat, when and if researchers manage to surpass human intelligence. We begin that account in Chapter 3.
* Heck, maybe even neuron-level simulations would be unreliable, if, say, human behavior turns out to be highly sensitive to heat.
† Yudkowsky has written more on these topics in blog posts such as “What is Evidence?”, “How Much Evidence Does It Take?”, and “Occam’s Razor.”
Notes
[1] incomparably different: As the eminent physicist Lord Kelvin put it in 1903: “Modern biologists are coming once more to a firm acceptance of something beyond mere gravitational, chemical, and physical forces; and that unknown thing is a vital principle.” Source: Silvanus Phillips Thompson, The Life of Lord Kelvin (American Mathematical Society, 2005).