Article II: Definitions

For the purposes of this Treaty:

  1. Artificial intelligence (AI) means a computational system that performs tasks requiring cognition, planning, or learning, or that takes actions in physical, social, or cyber domains. This includes systems that perform tasks under varying and unpredictable conditions, or that can learn from experience and improve performance.
  2. Artificial superintelligence (ASI) is operationally defined as any AI with sufficiently superhuman cognitive performance that it could plan and successfully execute the destruction of humanity.
    1. For the purposes of this Treaty, AI development which is not explicitly authorized by the ISIA (Article III) and is in violation of the limits described in Article IV shall be assumed to have the aim of creating artificial superintelligence.
  3. Dangerous AI activities are those activities that substantially increase the risk of an artificial superintelligence being created; they are not limited to the final step of developing an ASI but also include precursor steps as laid out in this Treaty. The full scope of dangerous AI activities is concretized by Articles IV through IX and may be elaborated and modified through the operation of the Treaty and the activities of the ISIA.
  4. Floating-point operations (FLOP) means the computational measure used to quantify the scale of training and post-training, based on the number of mathematical operations performed. FLOP shall be counted as either the FP16-equivalent (half-precision floating-point) operation count or the total operations in the format used, whichever is higher.
  5. Training run means any computational process that optimizes an AI’s parameters (the numerical values, such as weights and biases, that specify how information propagates through a neural network) using gradient-based or other search or learning methods, including pre-training, fine-tuning, reinforcement learning, large-scale hyperparameter searches that update parameters, and iterative self-play or curriculum training.
  6. Pre-training means the training run by which an AI’s parameters are initially optimized using large-scale datasets to learn generalizable patterns or representations prior to any task- or domain-specific adaptation. It includes supervised, unsupervised, self-supervised, and reinforcement-based optimization when performed before such adaptation.
  7. Post-training means a training run executed after a model’s pre-training. In addition, any training performed on an AI created before this Treaty entered into force is considered post-training.
  8. Advanced computer chips are integrated circuits fabricated on process nodes at least as advanced as the 28 nanometer node.
  9. AI chips means specialized integrated circuits designed primarily for AI computations, including but not limited to training and inference operations for machine learning models [this would need to be defined more precisely in an Annex]. This includes GPUs, TPUs, NPUs, and other AI accelerators. This may also include hardware that was not originally designed for AI uses but can be effectively repurposed. AI chips are a subset of advanced computer chips.
  10. AI hardware means all computer hardware for training and running AIs. This includes AI chips, as well as networking equipment, power supplies, and cooling equipment.
  11. AI chip manufacturing equipment means equipment used to fabricate, test, assemble, or package AI chips, including but not limited to lithography, deposition, etch, metrology, test, and advanced-packaging equipment [a more complete list would need to be defined in an Annex].
  12. H100-equivalent means the unit of computing capacity (FLOP per second) equal to that of one NVIDIA H100 SXM accelerator: 990 TFLOP/s in FP16, or a Total Processing Performance (TPP) of 15,840, where TPP is calculated as TPP = 2 × non-sparse MacTOPS × (bit length of the multiply input).
  13. Covered chip cluster (CCC) means any set of AI chips or networked cluster with aggregate effective computing capacity greater than 16 H100-equivalents.* A networked cluster refers to chips that are physically co-located, that have inter-node aggregate bandwidth (defined as the sum of bandwidth between distinct hosts/chassis) greater than 25 Gbit/s, or that are networked to perform workloads together. The aggregate effective computing capacity of 16 H100 chips is 15,840 TFLOP/s, or 253,440 TPP, and is based on the sum of per-chip TPP (the arithmetic is illustrated in the sketch following these definitions). Examples of CCCs include: the GB200 NVL72 server, three eight-way H100 HGX servers residing in the same building, CloudMatrix 384, a pod with 32 TPUv6e chips, and essentially every supercomputer.
  14. National Technical Means (NTM) includes satellite, aerial, cyber, signals, imagery (including thermal), and other remote-sensing capabilities employed by Parties for verification consistent with this Treaty.
  15. Chip-use verification means methods that provide insight into what activities are being run on particular computer chips in order to distinguish acceptable from prohibited activities.
  16. Methods used to create frontier models refers to the broad set of methods used in AI development. It includes but is not limited to AI architectures, optimizers, tokenizer methods, data curation, data generation, parallelism strategies, training algorithms (e.g., RL algorithms), and other training methods. This includes post-training but does not include methods that do not change the parameters of a trained model, such as prompting. New methods of this kind may be created in the future and fall within this definition.

* This is twice the limit described in the book as clearly safe. It is likely still safe for some time yet, and evaluating where the limits should be (and changing them over time) is the subject of Article III, Article V, and Article XIII.
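
The arithmetic behind definitions 12 and 13 can be sketched as follows. This is an illustrative, non-binding example rather than Treaty text: the function and variable names are hypothetical, and it assumes one reading of definition 13 in which a set of chips counts as a CCC when it both satisfies at least one of the networking conditions and exceeds the aggregate TPP threshold.

```python
# Illustrative sketch only (not Treaty text): the H100-equivalent and covered
# chip cluster (CCC) arithmetic from definitions 12 and 13 of this Article.

H100_FP16_TFLOPS = 990            # dense FP16 throughput of one NVIDIA H100 SXM
H100_TPP = 2 * 495 * 16           # TPP = 2 x non-sparse MacTOPS x multiply-input bit
                                  # length; 990 TFLOP/s = 495 tera-MAC/s, so TPP = 15,840
CCC_TPP_THRESHOLD = 16 * H100_TPP               # 253,440 TPP (16 H100-equivalents)
CCC_TFLOPS_THRESHOLD = 16 * H100_FP16_TFLOPS    # 15,840 TFLOP/s


def chip_tpp(non_sparse_mac_tops, multiply_input_bits):
    """Total Processing Performance of one chip, per the formula in definition 12."""
    return 2 * non_sparse_mac_tops * multiply_input_bits


def is_covered_chip_cluster(per_chip_tpp, co_located, inter_node_bandwidth_gbit_s,
                            jointly_scheduled):
    """Whether a set of AI chips would constitute a CCC, under the assumed reading:
    the chips form a networked cluster if any of the three conditions in definition 13
    holds, and the cluster is covered if its summed TPP exceeds 16 H100-equivalents."""
    networked = (co_located
                 or inter_node_bandwidth_gbit_s > 25
                 or jointly_scheduled)
    return networked and sum(per_chip_tpp) > CCC_TPP_THRESHOLD


# Example: three eight-way H100 HGX servers residing in the same building (24 chips).
cluster = [chip_tpp(495, 16)] * 24        # 24 x 15,840 = 380,160 TPP
print(is_covered_chip_cluster(cluster, co_located=True,
                              inter_node_bandwidth_gbit_s=0.0,
                              jointly_scheduled=False))    # True: 380,160 > 253,440
```

On this reading, a single eight-way H100 server (126,720 TPP) falls below the threshold, while each of the examples listed in definition 13 exceeds it.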