Civil society

AI governance through global red lines can help prevent unacceptable risks

Stuart Russell , Charbel-Raphael Segerie , Niki Iliadis , Tereza Zoumpalova

September 22, 2025 — 12 min read

As AI systems become increasingly capable and more deeply integrated into our lives, the risks and harms they pose also increase. Recent examples illustrate the urgency: powerful multimodal systems have fueled large-scale scams and fraud; increasingly human-like AI agents are enabling manipulation and dependency, with particularly severe consequences for children; and models have demonstrated deceptive behaviour and even resisted shutdown or modification.

Without clear and enforceable red lines that prohibit specific unacceptable uses and behaviours of AI systems, the resulting harms could become widespread, irreversible, and destabilising.

An international consensus on this challenge is growing, with leading AI scientists at the International Dialogue on AI Safety (IDAIS) calling for “red lines” and thousands of citizens and experts in the AI Action Summit consultation and a civil society poll prioritising the need for “clear and enforceable red lines for advanced AI.” Even major tech companies and international forums, such as the Seoul AI Safety Summit, recognise the urgency around common thresholds for intolerable risks. The recent Singapore Consensus on Global AI Safety Research Priorities likewise emphasises the importance of “technical ‘red lines’ or ‘risk thresholds’”.

In response, the “Global Call for AI Red Lines” launched on September 22, 2025. The call urges governments to reach an international political agreement on “red lines” for AI by the end of 2026, and was signed by an unprecedented coalition of more than 50+ organisations and over 200+ eminent voices, including former heads of state and ministers, Nobel and Turing Prize winners, AI pioneers, leading scientists, and human rights experts.

As a background to the concept of AI red lines, the Global Red Lines for AI three-part series explores what red lines are and why they’re essential, where they are beginning to take form, and how they could be enforced at the global level.

This complementary blog synthesises the key insights across the series, focusing on two essential building blocks for the operationalisation of AI red lines:

Defining precise, verifiable red lines for AI systems;
Establishing effective mechanisms for compliance and oversight.

How AI red lines would work

What we mean by AI red lines

The three-part explainer series defines red lines in AI governance as specific, non-negotiable prohibitions on certain AI behaviours and uses deemed too dangerous, high-risk, or unethical to permit. These boundaries are intended to protect the survival, security, and liberty of humankind.

Red lines function as a tool for AI governance: they define unacceptable risks and should require proof in advance that AI systems will not cross them. Behavioural red lines are intended to be preventive, requiring rigorous safety engineering, testing, and verification before deployment, rather than simply punishing violations after the fact. Liability, fines, or even criminal penalties may address some harms, but they cannot deter behaviours that lead to irreversible loss of human control.

While red lines are not a complete solution for all AI risks, international cooperation to implement them is critical for ensuring that humanity is protected from potential AI-related harms and retains meaningful control over increasingly powerful AI systems.

Two categories: unacceptable AI uses and unacceptable AI behaviours

Understanding AI red lines requires distinguishing between unacceptable ways humans might use AI systems and unacceptable behaviours AI systems might exhibit as a result of design flaws. Both are vital, but this distinction is crucial for developing precise and testable safeguards.

Red Lines for unacceptable AI uses: These prohibit harmful or unethical applications of AI, limiting deployers and users of AI systems.

Example 1: Prohibiting the use of AI for mass surveillance in public spaces. This means that while the AI system may technically be capable of conducting surveillance, its deployment for that purpose is not allowed.
Example 2: Prohibiting AI-generated Child Sexual Abuse Material (CSAM). This means that while the AI system may be technically capable of generating such inappropriate or sensitive content, its creation and use for this purpose is strictly forbidden.
Example 3: Prohibiting AI impersonation. This means that while an AI system may technically be capable of generating highly realistic voice, video, or text that mimics a person, directing it to impersonate someone is not permitted, as it enables fraud, manipulation, and reputational harm.
Other critical examples (see Part 1 of the explainer series):
- Lethal autonomous weapon systems (LAWS)
- Operating critical infrastructure without human oversight
- Social scoring

Red Lines for unacceptable AI behaviours: These apply directly to AI system design, dictating certain actions that the AI must never engage in. The system must be intrinsically constrained by design, meaning it should never cross these lines even if explicitly instructed to do so.

Example 1: Requiring AI systems, by design, to be unable to conduct public surveillance, even if a user asks them to carry out a legitimate task that could be facilitated by conducting public surveillance. For example, an AI agent must never track the movements of neighbours or monitor public streets for security purposes, even if it is instructed to do so by a user. This means the system must also be robust against “jailbreaking” or unauthorised instructions meant to bypass relevant safety guardrails.
Example 2: Requiring AI systems by design to refuse all requests for biological weaponisation assistance, even if a user tries to instruct it. This means it will never provide an unauthorised user with detailed, actionable assistance that is associated with real-world bioweapons processes.
Example 3: Requiring AI systems, by design, to never impersonate being human. For example, an AI system must not claim to be a real person, suggest that it has a physical presence, or present itself as existing in the user’s environment. This means the system must be intrinsically constrained from generating statements, images, or video streams that mislead users into believing it is a human individual rather than an AI system.
Other critical examples (see Part 1 of the explainer series):
- Power seeking
- Autonomous cyberattacks
- Deception and manipulation
- Self-improvement and self-replication of an AI system
- Uncontrolled self-improvement through unsupervised Chain of Thought (CoT) reasoning

Behavioural red lines are similar to safeguards that avoid nuclear plant meltdowns and plane crashes due to mechanical failures. These limits reflect specific unacceptable behaviours that must be prohibited by design and require developers to demonstrate with high confidence that their systems will comply. They address AI’s potential for unexpected actions and embed core ethical and safety values within the AI’s design, rather than relying solely on human compliance.

AI red lines in practice

AI red lines are no longer a theoretical construct. They are beginning to take shape through legal bans, soft law, and voluntary commitments by states and companies. These early efforts reflect a growing international recognition of the urgent need to establish clear boundaries for the development and deployment of advanced AI systems. (For further analysis, see Part 2 of the explainer post series.)

Red lines in government initiatives

Governments and regional blocs are increasingly embedding explicit prohibitions within their AI governance frameworks. Some examples include:

The EU AI Act and its recent General-Purpose AI Code of Practice establish red lines for AI usages, defining an “unacceptable risk tier” and banning the use of AI for subliminal manipulation, exploiting vulnerabilities, indiscriminate biometric categorisation, public social scoring, and certain real-time remote biometric identification.
In the United States, state-level legislation such as the TRAIGA in Texas, RAISE in New York, and SB243 in California is being introduced to establish specific prohibitions on high-risk AI uses and behaviours. These include manipulating children, behavioural manipulation more broadly, the creation or use of unlawful deepfakes, and infringements of constitutional rights.
Brazil’s AI Bill (not yet enacted) proposes bans on AI for exploitation, social scoring, and autonomous weapons systems, reflecting a proactive stance in the Global South.
While not yet a binding regulation, China’s AI Safety Governance Framework identifies high-risk application categories and recommends avoiding or strictly controlling them, signalling a growing concern at the policy level. Some of China’s rules already essentially act as prohibitions. For instance, providers of generative AI systems must demonstrate that their models will not produce prohibited content on sensitive topics.

Multilateral red lines

International bodies are also moving beyond principles to identify specific areas where AI must be constrained.

The Council of Europe Framework Convention on AI emphasises human rights compatibility and includes a moratorium on AI incompatible with these rights.
The Seoul AI Safety Summit produced a ministerial statement signed by 27 countries and the EU, committing to identifying risk thresholds for frontier AI systems that must not be crossed without effective mitigation. It also led to 20 tech organisations signing the voluntary Frontier AI Safety Commitments.
The G7 Hiroshima AI Process reinforces the need for specific prohibitions, calling on organisations not to develop or deploy AI systems that undermine democratic values, harm individuals, facilitate terrorism, or enable criminal activity.
The UNESCO Recommendation on the Ethics of AI, a globally adopted standard, explicitly calls for prohibiting social scoring and mass surveillance systems.

Three reasons we need international AI red lines

These efforts represent important progress, even if they remain fragmented. Primarily national or regional in scope, they are frequently voluntary and not always supported by clear definitions and robust enforcement mechanisms. Few fully anticipate the emerging risks posed by the most advanced AI systems.

The global community can still build on existing foundations to shape a coherent and enforceable international agreement on AI red lines. Such an agreement could take the form of a framework convention. Efforts to do so will be complex, but three dynamics would make a global standard invaluable:

AI risks are systemic and transnational: AI’s impact transcends borders. Misuse could trigger cascading global failures such as financial crises or pandemics, rendering a patchwork of national rules ineffective.
The current global landscape risks a “race to the bottom”: Competition for AI leadership incentivises speed over safety. Former Google CEO Eric Schmidt emphasised the need for collaboration among global powers on AI issues, stating that there is “a vested interest to keep the world stable, keep the world not at war, to keep things peaceful, to make sure we have human control of these tools”. Countries and companies need global coordination to ensure standards are high. Global red lines provide a crucial minimum baseline.
Success requires public trust: For AI to benefit society, it needs public trust. Binding, enforced red lines would constitute meaningful constraints and a legitimate foundation for AI governance, fostering the trust needed for AI adoption. Such a foundation could prevent disasters that would block further progress for generations to come.

Effective global governance requires a multi-pathway approach, combining “soft law” and “hard law.” This includes:

Legally binding treaties, including framework conventions
International norms and declarations
Bilateral and multilateral agreements

These ideas are explored in more detail in Part 3 of the explainer series.

Two core challenges to operationalising red lines globally

To institute red lines as a practical basis for AI governance, we need clear, technically grounded definitions as well as robust mechanisms for compliance and oversight.

Challenge 1: Precise definitions

For a red line to be effective, it must be defined precisely enough to test whether a system will cross it. Often, this involves two clearly defined boundaries: one where the behaviour is definitely acceptable, and another where it is definitely unacceptable, with a grey area in between. Through analysis and specific case studies, the grey area can be narrowed until the remaining ambiguity is negligible. The responsibility for drawing these boundaries lies with policymakers, regulators, and experts. Once a red line is sufficiently defined, the burden shifts to developers to prove their systems meet safety requirements. If they cannot, they must go back to the drawing board. Without this process, testing and enforcing red lines will be ineffective.

To illustrate the need for clear definitions, consider a behavioural red line against “power-seeking” by an AI system. How do we define an AI system that is “unduly increasing its power and influence” in a way that is specific enough to test and evaluate? Does responding persuasively to a prompt count? What if it optimises for resources (e.g., more compute, faster access to information) to achieve a benign, user-defined goal? Distinguishing benign optimisation from malicious power-seeking is difficult. However, even in this case, work can be done to find sufficiently specific definitions. In this case, it might be along the lines of “taking an action that results in the AI system having a substantially greater capability to achieve goals in future”. Alternatively, a list of special cases could be made to clarify the definition.

In contrast, a red line such as “autonomous self-replication” (an AI system copying or improving itself without human approval) might be more amenable to a clear, technical definition, as it involves identifiable actions and outcomes. The more ambiguous the definition, the harder it is to build verifiable safeguards.

Challenge 2: Compliance and oversight

Even with perfectly defined red lines, a formidable challenge remains: can developers actually prove that their highly complex, often “black-box” AI systems will not cross a given red line? The obligation here lies squarely on the developer. And for many current large language model (LLM)-based systems, the answer, unfortunately, is “probably they cannot.”

There are two reasons for this impasse. First, the internal principles of operation for these “black-box” models are often opaque, making it incredibly difficult to trace why an AI system makes a particular decision or generates a certain output. This opacity makes it challenging to provide quantitative, high-confidence statements of compliance. Second, even if it were possible to analyse and predict LLM outputs in a general way, the analysis would probably show that the LLM-based AI system will violate the red line. The current paradigm of AI development, particularly for LLMs, relies heavily on imitation learning from vast datasets. Accurate imitation of human behaviour does not equate to safety; rather, it can inadvertently lead to undesirable and unsafe behaviours, including self-replication driven by an apparent urge for self-preservation in LLMs.

Ensuring AI safety also requires vigilance regarding the malicious intentions of some humans. The frequent emergence of “jailbreaks” (prompts designed to circumvent an AI’s safety filters and elicit harmful outputs) demonstrates this. These are not random errors; they are often found through intensive, directed search by security researchers, malicious actors, or even by the AI systems themselves exploring their own capabilities. This means that merely testing for statistical error bounds is insufficient. Developers need to ensure robust behaviour for essentially all human inputs, even those designed to trick or subvert the system.

If developers cannot provide a rigorous safety case that their systems will reliably stay within defined red lines, then there needs to be a fundamental shift. Developers will have to rethink their approach, prioritising safety by design. This means investing in new architectures and methodologies that enable verifiable safety as an intrinsic property of the system, not as an afterthought or dependent on fragile safeguards.

Effective global oversight depends on regulators requiring developers to demonstrate that their systems meet agreed safety thresholds, even if the methods to achieve those thresholds are not yet known. This is how safety regulation works in domains like nuclear power, where operators must prove extraordinarily low failure probabilities before operation, even if the prospective nuclear operator does not yet know how to design a plant with such a level of reliability. Setting these safety case requirements in advance is essential for operationalising red lines and protecting against the most dangerous outcomes. The appropriate level of safety depends primarily on what humanity needs to protect itself, and not on the developers’ ability to comply. Conversely, humanity should not have to subject itself to a substantial risk of extinction simply because developers do not fully understand the technology they are creating.

AI red lines as the groundwork for broader cooperation

AI red lines have moved from theory to global necessity, serving as critical safeguards against dangerous AI behaviours and uses. Foundational policies and a growing global consensus are emerging, along with the challenge to scale these efforts to match AI’s rapid pace.

Clearly defining and reaching international agreement on unacceptable limits would be a significant achievement towards building trust, preventing severe harm, and creating the conditions for innovation rooted in safety. Red lines would lay the groundwork for broader international cooperation, including interoperable standards, incident reporting and monitoring, and frameworks for emergency preparedness.

The question is no longer if we need AI red lines, but how quickly we can transform this momentum into concrete, enforceable international standards. This will require sustained and collaborative efforts, including building inclusive coalitions, charting diplomatic paths for international frameworks, and developing blueprints to monitor and verify that AI remains aligned with human values.

Read our three-part series about redlines for AI and join the global campaign

If you are interested in joining the global call for AI red lines, please visit red-lines.ai.

If you are interested in learning more about red lines for AI, read our three-part explainer series:

Part 1: What are red lines for AI, and why are they important?
Part 2: Who is already using AI red lines in practice?
Part 3: How can we create global AI red lines?

Disclaimer: The opinions expressed and arguments employed herein are solely those of the authors and do not necessarily reflect the official views of the OECD, the GPAI or their member countries. The Organisation cannot be held responsible for possible violations of copyright resulting from the posting of any written material on this website/blog.