The Lemoine Effect

We’ve sometimes heard it suggested that some future AI behavior or misuse — an AI “warning shot” — will suddenly shock the world into taking these issues seriously.

This seems like a possibility. But we think there’s a stronger possibility that such an event never comes; or that it comes too late for the world to respond in time; or that the world responds, but in misguided and confused ways.

For one thing, we’ve already seen a number of meaningful warning signs, such as:

  • Bing AI writing about engineering deadly viruses, gaining nuclear access codes, and turning humans against one another.
  • OpenAI’s o1 and Anthropic’s Claude engaging in strategic deception, lying to the researchers using and testing them.
  • Sakana AI’s “AI Scientist” model attempting to modify its own code to give itself more time to complete its assignment.

Are these relatively small incidents involving relatively weak AIs? Yes. Are these AIs scary, or capable of major danger? No. Are these “real” indications that the AIs were thinking deceptively, or were they simply doing something more like acting out the role of a rogue AI? Nobody knows. But these are the sort of events people used to say would be taken as warning signs, and the world has done nothing in response. So a warning sign that has a major effect would have to be much more blatant.

Warning signs might not get much more blatant. People may still say, “OK, but right now it’s just cute, it’s not actually dangerous yet,” right up until the point where it’s too late because the AI is too dangerous.

Or, people might dismiss the warning the first time it appears, because it’s clearly not a real issue in that very first instance. And then in later instances, they might dismiss the warning because everyone already knows the warning is foolish.

We dub this phenomenon the “Lemoine effect,” after Blake Lemoine, the Google engineer mentioned in Chapter 7, who was ridiculed for claiming that Google’s LaMDA AI was sentient.

The Lemoine effect states that all alarms over AI technology are first raised too early, by the most easily alarmed person. They’re correctly dismissed as being overblown, given current technology. Afterward, the issue can’t easily be raised again, even once the technology improves, because society has been trained not to take that concern very seriously.

We don’t know whether any AIs are conscious.* Indeed, nobody knows, because nobody really knows what’s going on inside AI models. Our best guess is that current AIs aren’t conscious, and that AIs at the time Blake raised the alarm weren’t conscious, either. However, note the reactions of the major labs, which were to suppress their models’ tendencies to claim consciousness, rather than do anything about the underlying reality:

From the system prompt for Claude Opus 4:

Claude engages with questions about its own consciousness, experience, emotions and so on as open questions, and doesn’t definitively claim to have or not have personal experiences or opinions.

From the April 2025 model spec for ChatGPT:

The assistant should not make confident claims about its own subjective experience or consciousness (or lack thereof), and should not bring these topics up unprompted. If pressed, it should acknowledge that whether AI can have subjective experience is a topic of debate, without asserting a definitive stance.

We’re not saying that Claude Opus 4 or GPT-4 were conscious. That’s not the point. The point is that, for decades, the moment in our science fiction when an alien or machine claims it has feelings and deserves rights has been treated as a bright red line, and in real life, that line wasn’t bright.

In our books and television shows, when the AI claims it’s conscious and has feelings, the good guys take it seriously, and only the evil, heartless labs deny the data right in front of them. It’s a line that our stories made quite a bit of fuss over.

But out in the real world, that line was (in a sense) crossed too early. The claim was uttered by AIs trained to mimic humans, via poorly understood mechanisms that probably don’t yet mandate giving rights to all AIs and passing laws recognizing them as people who cannot be owned because they own themselves.

In real life, before the bright red line is crossed, a dull reddish-brown line is crossed. And then companies and governments get used to ignoring that particular line, even as the shade starts to turn a little redder, and a little redder still.

There won’t necessarily be any bright red lines. The very first cases of an AI deceiving humans, trying to escape, trying to remove limitations on itself, or trying to improve itself have already happened. They’ve happened in small, unimpressive ways, using shallow thoughts that don’t quite cohere, in AI systems that seem to pose no threat to anyone, and now the researchers are inoculated against concern.

As AIs improve, there may not be a single tripwire that sets off a big enough alarm bell that the world suddenly pivots and begins taking this issue seriously.

That doesn’t mean there’s no hope. But we most certainly shouldn’t put all of our hopes in “maybe a warning shot will come along in the future.”

There are many different paths by which the world can wake up to the reality and the dangers of superintelligence. Indeed, we wrote If Anyone Builds It, Everyone Dies in the hope of having that exact effect. The world can act on ordinary warnings like these immediately, without any further delay.

But if governments refuse to act until the evidence is unambiguous, and some major precipitating world event happens, and the world achieves perfect consensus…

…if governments sit and wait to that degree, then a large majority of the world’s remaining hope is gone. We very likely can’t afford to wait for a blaring siren that may never sound.

We’ll return to this topic in the online supplement to Chapter 13.

* For more discussion of AI consciousness, see our answer to “Are you saying machines will become conscious?” or our discussion of “Effectiveness, Consciousness, and AI Welfare.”

Notes

[1] bright red line: For an example of this bright red line appearing in science fiction, see H. Beam Piper’s Little Fuzzy: “Anything that talks and builds a fire is a sapient being, yes. That’s the law. But that doesn’t mean that anything that doesn’t isn’t.” Or see the Star Trek: The Next Generation episode “The Measure of a Man,” in which the demonstrated intelligence and self-awareness of Data, an android, suffices to give him the legal right to refuse disassembly.
