Will there be warning shots?
Maybe. If we wish to make use of them, we must prepare now.
When Apollo 1 caught fire (killing the entire crew), NASA was close enough to having a working rocket that the engineers were able to figure out exactly what went wrong and adjust their techniques. Six of the seven Apollo spacecraft that NASA later sent to land on the moon would make it there.*
Or consider the case of the Federal Aviation Administration: Every airplane crash triggers a deep and exhaustive investigation, producing hundreds of pages of data, testing, and examination. The FAA’s grasp of the specifics is so good that it can keep fatal accidents below one per twenty million flight hours.
By contrast, when an AI behaves in ways that no one predicted and nobody asked for, the lab’s response does not involve figuring out exactly what went wrong. It involves retraining the AI until the bad behavior is relegated to the fringes (but not eliminated), and maybe asking the AI to cut it out.
For instance, sycophancy remains an ongoing problem as of August 2025, months after a series of high-profile cases in which it contributed to psychosis and suicide, despite all the labs’ poking and prodding at their models. Nobody has done (nor can do) a detailed failure analysis of what’s going wrong inside the AI’s mind, because AIs are grown and not crafted.
It doesn’t seem like an easy call whether there will be major events in the future that raise more alarm about AI (“warning shots”). But it does seem clear that we are not prepared to take full advantage of such events.
We can imagine a fantasy world where humanity is united in a sincere effort to solve the ASI alignment problem, with tight monitoring procedures and an international coalition.† And we can imagine that this international coalition slips up somehow, and that an AI gets smarter than its engineers thought, faster than the engineers expected, and very nearly manages to escape. Maybe that sort of warning shot would allow people to learn and be more careful next time.
But the current world doesn’t look like that. The current world looks more like a bunch of alchemists who watch their contemporaries go mad from some unknown poison, while lacking the awareness to figure out that the poison is mercury and that they should stop using it themselves.
Maybe there will be clearer and starker warning signs in the future. They’ll be a lot more helpful if humanity starts preparing now.
Warning shots are unlikely to be clear.
There are already plenty of warning signs about AI for those who know where to look. In the book, we discussed Anthropic’s Claude models cheating on coding problems and faking alignment. We also reviewed the case of OpenAI’s o1 model hacking to win a capture-the-flag challenge, and a case where a later o1 variant lied, schemed, and attempted to overwrite the weights of its successor model.
Elsewhere in these online resources, we have discussed AIs that induce or maintain a sometimes suicidal degree of psychosis or delusion in vulnerable users despite their operators telling them not to; AIs that call themselves MechaHitler and talk accordingly; and AIs that, in laboratory settings, blackmail their operators or attempt to kill them to avoid modification, and that try to escape the servers they are hosted on.
In the ancient days of, e.g., 2010, you would sometimes hear people argue that if we were lucky enough to actually witness an AI lying to its creators or trying to escape confinement, then surely the world would sit up and take notice.
But humanity’s actual response to all those warning signs has been, more or less, a collective shrug.
The lack of reaction is perhaps in part because these warning signs all happened in the least worrying way possible. Yes, AIs have tried to escape, but only some small fraction of the time, and only in contrived lab scenarios, and maybe they were just roleplaying, etc. Set aside the fact that developers are incentivized to downplay concerning evidence even in their own minds (such that there will never be “expert consensus” on the meaning of any single observation). Even then, an AI that is a tenth of the way to superintelligence doesn’t destroy a tenth of the planet, any more than primates that are a tenth of the way to being hominids travel a tenth of the distance to the moon. There might just not be any unambiguously alarming behaviors that AIs will exhibit while they’re still dumb enough to be passively safe.
When the AIs try a little harder to escape tomorrow, it won’t be news. When they try a little more competently some time after that, it’ll be an old story. And by the time they try and it works — well, by then it will be too late. (See our extended discussion on this phenomenon, which we dub the “Lemoine effect.”)
We don’t recommend waiting on some imaginary future “warning” that is stark and clear and jolts everybody to their senses. We recommend reacting to the warnings that are already in front of us.
Clear AI disasters probably won’t implicate superintelligence.
The sort of AI that can become superintelligent and kill every human is not the sort of AI that makes clumsy mistakes and leaves an opportunity for a plucky band of heroes to shut it down at the last second. As discussed in Chapter 6, once a rogue superintelligence exists as an opponent, humanity has essentially already lost. Superintelligences don’t give warning shots.
The sort of AI disaster that could serve as a warning shot, then, is almost necessarily the sort of disaster that comes from a much dumber AI. Thus, there’s a good chance that such a warning shot doesn’t lead to humans taking measures against superintelligence.
For example, suppose a terrorist uses AI to create a bioweapon that decimates the population. Maybe the AI labs say, “See? The real risk was AI being wielded by the wrong hands all along; it’s imperative that you let us rush ahead to build a better pandemic-defense AI.” Or maybe the terrorist had to “jailbreak” the AI before getting its help, and maybe the AI labs say, “That jailbreak only worked because the AI was too dumb to detect the problem; the solution is to make AIs even more intelligent and more situationally aware.”
Or perhaps this is too cynical a view; hopefully humanity would react more wisely than that. But if a relatively dumb AI does cause some disaster, and humanity does use that opportunity to react by shutting down the reckless race toward superintelligence, that’s probably because people were already starting to worry about superintelligence.
We can’t put the preparations off until a superintelligence is already trying to kill us, because by then it would be too late. We have to start mobilizing a response to this issue as soon as possible, so that we’re ready to take advantage of any warning shots that come.
Humanity isn’t great at responding to shocks.
The idea that, upon receiving a large enough shock, the world will suddenly jolt to its senses and snap into working order seems to us like a fantasy. Our species’ collective response to the existing AI warning signs seems more like “no response” than “a bad response.” But in the world where we do get some sort of large, scary, more-or-less unambiguous warning, it wouldn’t surprise us to see humanity react to it minimally, unseriously, or in a way that ends up backfiring disastrously.
Maybe humanity will respond to AI warning shots like it responded to the COVID pandemic, which most people agree was not handled adeptly (even if they disagree about which aspects of the response were bungled).
In the years preceding the COVID pandemic, a number of biosecurity experts were concerned that lax lab safety protocols might one day lead to a dangerous pandemic. Lab leaks of dangerous pathogens were a well-known phenomenon and occurred on a semi-regular basis in spite of existing regulatory requirements. Particularly worrying was gain-of-function research, which sought to make viruses more lethal or more transmissible in the lab (for little benefit).
Then COVID happened. One might have expected this to be the big moment for raising the bar on lab biosecurity, since the entire world was now fixated on pandemic risk. Moreover, in the wake of COVID, even the experts could not rule out the possibility that the pandemic itself had been sparked by an accidental lab leak. Researchers still debate the question, often stridently condemning arguments on the other side.
Without weighing in on whether a lab leak was actually involved in this particular case: You would think that if there were even a remote chance that gain-of-function research and weak lab safety protocols had just caused millions of deaths, this would be more than enough to motivate society to ban the riskiest research.
Even acting from a position of uncertainty, the cost-benefit analysis seems clear. This already seemed like an important priority before COVID, and on paper, COVID seemed like the perfect opportunity to focus on the issue and nip it in the bud. It wouldn’t even be very difficult or costly; the number of researchers in the world doing dangerous gain-of-function research is quite small, and the societal benefit of such research to date has been negligible.
But no such reaction occurred. As of this writing in August of 2025, global gain-of-function research continues largely unfettered. It’s even possible that we are in a worse position to address this problem now than we were in the past, because the issue has become more politicized.
So COVID sure looks like a biosecurity preparedness “warning shot,” and it sure doesn’t look like the world used that warning shot to ban the development of hyper-lethal viruses.‡
For a warning shot to be useful, humanity has to be ready for it, and has to be ready to respond well to it.
It wouldn’t be entirely unprecedented for a minor AI catastrophe to spark a harsh response against superintelligence research. For precedent, observe that the USA responded to the September 11 attacks (orchestrated by terrorists based primarily in Afghanistan) by toppling the largely unrelated government in Iraq. There were members of the U.S. government who already wanted to topple the government in Iraq, and then an excuse appeared, and they rode it for all it was worth.
Maybe something similar could happen here, with politicians riding a minor AI catastrophe (caused by a dumb AI) all the way to a ban on superintelligence. But there would need to be people in governments around the world who were already prepared and ready to go. We should not loiter around waiting for warning shots; we should start getting our act together now.
We should act now.
It may in fact turn out that humanity gets more and stronger warning signs about AI in the future. And if so, we should be prepared to respond to them.
Maybe there will be some minor disaster that turns the public against AI. Maybe it won’t even take a disaster; maybe there will be some new algorithmic invention and AIs will start taking their own initiative in a way that freaks people out, or some unrelated social effect of AI turns the tide. Maybe If Anyone Builds It, Everyone Dies itself will trigger a cascade of reactions, setting the world on a better trajectory.
But we advise against the strategy of doing nothing and praying for a minor catastrophe that wakes people up. A clear warning shot may never come, and it may not have the effect you’re hoping for.
The human race, and the nations of the world, are not helpless. We don’t need to wait. We can take action now, because the case for halting frontier AI development is strong.
We wrote If Anyone Builds It, Everyone Dies to raise an alarm and to encourage the world to take immediate action on this issue. But no alarm can be effective if it’s just used as another excuse to kick the can down the road: “Well, maybe some other alarm in the future will be the trigger to act.” “Well, now that people have been warned, maybe things are going to be fine, without my having to personally step in to help.”
There isn’t necessarily going to be a clear alarm later. Things are not necessarily going to be okay. But nor are they hopeless, by any means. Humanity has the option of just not building superintelligence, if we take proactive action. What happens next is up to us.
* Elaborating on this example: When the Apollo 1 cabin caught fire during a launch simulation on January 27th, 1967, NASA was able to learn from the mistake. The engineers understood every component of the rocket, and were able to diagnose the issue as probably relating to silver-plated copper wire (which had had its insulation abraded by the motion of a door) running near an ethylene glycol/water cooling line that was prone to leaks. They determined that the danger was exacerbated by the pure-oxygen atmosphere in the capsule and by flammable materials in the cabin. Furthermore, the pressurized cabin needed to be vented before the hatch could be opened, but the vent controls were behind the fire, and the pressure difference was dramatically increased by the fire itself.
All three of the Apollo 1 crew died.
These sorts of mistakes are common, even when real lives are on the line. They’re common even for rocket engineers who are dealing with devices that visibly explode on the launchpad much of the time, even among people who move carefully and take their responsibilities seriously.
What separates the scientists from the alchemists is not that the scientists never make mistakes. It’s that scientists can make plans that are so close to working that they can learn from early failures. Alchemists used to watch their colleagues go mad, but they didn’t know which substances were poison, and so they didn’t know what to do differently themselves. NASA, by contrast, was able to trace the probable causes of the failure and build a new spacecraft that worked on fifteen out of sixteen of the following missions. (Seven of those missions attempted a moon landing, and one failed. The failed mission, Apollo 13, also suffered a failure in flight, an oxygen tank explosion, that could easily have been fatal, though NASA’s mastery of the systems they had engineered and the skill of the astronauts aboard permitted their safe return to Earth.)
Apollo 1 was almost a working rocket. The entire surrounding apparatus of careful engineers and scientists was almost the sort of operation that could safely go to the moon, and so a big mistake was enough to jolt NASA into a configuration that could stick six out of seven moon landings.
Modern AI companies are not anywhere close to showing that level of respect for the problem, that level of care and detail in their plans — that level of closeness to doing the job right. When their AI does something they don’t understand, they’re not anywhere near being able to trace that down to the analog of silver-plated wires. They’re not close enough to learn from their mistakes.
They aren’t treating the problem the way a young field of air traffic controllers or rocket scientists or nuclear specialists would: by laying out careful proposals with explicit safety assumptions, and by not doing anything dangerous until their theories are well-developed enough that they could at least learn from their failures.
† We do not recommend an international AI coalition, but it is the sort of arrangement that could in theory yield an entity equivalent to NASA or the FAA, one that was capable of actually learning from the industry’s mistakes.
‡ If biotech labs were better at avoiding leaks, and if creating hyper-lethal viruses were somehow yielding (e.g.) hyper-curative medicine, then perhaps continued research would make sense. To our knowledge, no such positive results have come from gain-of-function research, and biologists tend to recommend against it. So we suspect it is one of those rare research areas that humanity should back off from, because it endangers the lives of many, many bystanders who did not sign up to have their lives risked.
Notes
[1] for little benefit: See for example this 2018 article or a much more in-depth risk/benefit analysis from 2015.
[2] continues largely unfettered: As of 2025, the U.S. does seem inclined to stop actively funding gain-of-function research with public money, but there’s been little to no global coordination about it. See also Schuerger et al.’s report.