Deadline extended to 1 October: Public consultation on risk thresholds for advanced AI systems
From the OECD Secretariat on behalf of the OECD Expert Group on AI Futures Co-Chairs: Francesca Rossi, Stuart Russell, and Michael Schönstein.
The OECD is joining forces with a diverse array of stakeholders to explore the potential approaches, opportunities, and limitations of establishing risk thresholds for advanced AI systems (sometimes referred to as frontier AI or general-purpose AI systems). To kick off this effort, the OECD and collaborating organisations are launching a public consultation to build a foundation of knowledge and perspectives (comments due 1 October).
OECD work on the classification of AI systems explains the need to assess the benefits and risks of different types of AI systems. Policymakers and researchers alike are focusing on developing processes to ensure effective governance and accountability of advanced foundation AI models and systems across their entire lifecycle.
However, determining which systems may pose a significant or systemic risk, and how to mitigate that risk, is difficult. Risk thresholds can help. Such thresholds are values that establish concrete decision points and operational limits that trigger a response, action, or escalation (NIST). They can involve technical factors (e.g., error rates, scale) and human values (e.g., social or legal norms) in determining when AI systems present unacceptable risks or risks that demand enhanced scrutiny and mitigation measures.
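To make the notion of a decision point concrete, the minimal sketch below shows one way such threshold logic could be operationalised. The threshold names, numeric limits, and escalation actions are purely illustrative assumptions, not drawn from NIST or any existing policy instrument.

```python
# Illustrative only: hypothetical threshold values and actions, not an
# actual policy instrument. Shows how a risk threshold maps a measured
# risk score to a concrete response, action, or escalation.
from dataclasses import dataclass

@dataclass
class RiskThreshold:
    name: str
    limit: float   # measured risk score at which the response triggers
    response: str  # action taken once the limit is crossed

# Hypothetical escalation ladder, ordered from most to least severe.
THRESHOLDS = [
    RiskThreshold("unacceptable", 0.9, "halt deployment and notify regulator"),
    RiskThreshold("high",         0.6, "require enhanced scrutiny and mitigations"),
    RiskThreshold("elevated",     0.3, "increase monitoring and reporting"),
]

def triggered_response(risk_score: float) -> str:
    """Return the response for the most severe threshold crossed, if any."""
    for t in THRESHOLDS:  # most severe first
        if risk_score >= t.limit:
            return f"{t.name}: {t.response}"
    return "below all thresholds: routine monitoring"

print(triggered_response(0.72))  # -> high: require enhanced scrutiny and mitigations
```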
In recent months, the concept of setting risk thresholds has become a key area of focus in policy and technical communities alike. Such thresholds have been introduced in both voluntary commitments and concrete policy instruments, and they are often based on the capabilities of certain models or systems or, as a proxy measure, the level of computational power (“compute”) needed to train an advanced AI model.
For instance:
- Under the auspices of the May 2024 AI Seoul Summit, 27 countries and 16 AI companies committed to setting risk thresholds, associated evaluation criteria, and mitigation approaches.
- Voluntary commitments put forth by the United States and agreed to by 15 leading AI companies pledge public reporting on system capabilities and discussion of societal risks, which could be relevant to establishing risk thresholds.
- The European Union AI Act and the US Executive Order on the Safe, Secure, and Trustworthy Development and Use of AI have introduced reporting and oversight requirements for models that are trained on or leverage compute power above certain thresholds (10²⁵ FLOPs and 10²⁶ FLOPs, respectively; see the worked sketch after this list). A draft bill in California has similar aims.
- Responsible scaling policies by companies such as Anthropic and OpenAI commit to certain actions or abiding by certain rules based on risk levels informed by model capabilities.
- A variety of academics and researchers promote risk thresholds, including compute thresholds, as a useful tool in taking a risk-based approach to managing advanced AI systems (see examples here and here).
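To illustrate how compute-based thresholds like those above can be applied in practice, the sketch below estimates a model's training compute with the widely used approximation of roughly 6 FLOPs per parameter per training token, then checks the estimate against the 10²⁵ and 10²⁶ FLOP thresholds. The model sizes are hypothetical examples, and the approximation is a rough heuristic rather than a legally prescribed measurement method.

```python
# Sketch: checking an estimated training-compute figure against the
# EU AI Act (1e25 FLOPs) and US Executive Order (1e26 FLOPs) thresholds.
# Uses the common approximation compute ≈ 6 * parameters * training tokens;
# the model sizes below are hypothetical, not real systems.

EU_AI_ACT_THRESHOLD = 1e25  # FLOPs
US_EO_THRESHOLD = 1e26      # FLOPs

def estimated_training_flops(n_parameters: float, n_tokens: float) -> float:
    """Rough training-compute estimate: ~6 FLOPs per parameter per token."""
    return 6 * n_parameters * n_tokens

for params, tokens in [(70e9, 2e12), (400e9, 15e12)]:
    flops = estimated_training_flops(params, tokens)
    print(
        f"{params:.0e} params, {tokens:.0e} tokens -> {flops:.2e} FLOPs | "
        f"EU reporting: {flops >= EU_AI_ACT_THRESHOLD} | "
        f"US reporting: {flops >= US_EO_THRESHOLD}"
    )
```

On these illustrative numbers, the smaller run (about 8.4 × 10²³ FLOPs) falls below both thresholds, while the larger one (about 3.6 × 10²⁵ FLOPs) crosses the EU threshold but not the US one, showing how the two instruments capture different slices of the frontier.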
Such efforts are well aligned with the work of the OECD Expert Group on AI Futures, which has found the controlled development and deployment of high-risk systems to be one of the most important actions needed to ensure economies and societies can reap the future benefits of advanced AI systems while mitigating the potentially significant downsides.
However, some argue that current risk thresholds, often expressed as a specific level of compute power, are somewhat arbitrary and thus could result in unintended consequences. The text of some policy documents also suggests that such thresholds may serve as temporary proxy measures for AI system capabilities until more specific capability-oriented thresholds can be identified and measured. Voluntary corporate commitments to set out risk thresholds, such as the Frontier AI Safety Commitments, are useful. Still, they are not enforceable, and some believe they may not be sufficient to prevent serious harm to people, society, and the planet.
While consensus is still developing on an optimal approach to determining risk thresholds and the capability thresholds that follow from them, countries and companies are firmly committed to better understanding and defining risk thresholds and to using them to implement mitigation strategies and safeguards.
To advance this field of study, the OECD and collaborating organisations are launching a public consultation. It takes the form of a public discussion organised around several key questions:
- What publications and/or other resources have you found useful on the topic of AI risk thresholds?
- To what extent do you believe AI risk thresholds based on compute power are appropriate for mitigating risks from advanced AI systems?
- To what extent do you believe that other types of AI risk thresholds (i.e., thresholds not explicitly tied to compute) would be valuable, and what are they?
- What strategies and approaches can governments or companies use to identify and set out specific thresholds and measure real-world systems against those thresholds?
- What requirements should be imposed for systems that exceed any given threshold?
- What else should the OECD and collaborating organisations keep in mind with regard to designing and/or implementing AI risk thresholds?
The results of this consultation will feed into further research, analysis, and products to help inform policymakers and industry. Please add your thoughts by the deadline of 1 October 2024.