Section 1 - Risk identification and evaluation
We take a scientific approach to mapping AI risks through research and expert consultation, codifying these inputs into a risk taxonomy. Our mapping process is fundamentally iterative, evolving alongside the technology, and adapting to the range of contexts in which people use AI models or applications.
We’ve codified our risk-mapping work into a taxonomy of potential risks associated with AI, building on industry guidelines such as the NIST AI Risk Management Framework and informed by our experience developing and deploying a wide range of AI models and applications. These risks span safety, privacy, and security, as well as transparency and accountability risks such as unclear provenance or lack of explainability. The risk map is designed to clarify which risks are most relevant for a given launch and what might be needed to mitigate them.
Our Frontier Safety Framework is a set of protocols that aims to address severe risks that may arise from powerful capabilities of foundation models. It is intended to complement Google’s existing suite of AI responsibility and safety practices. The Framework is built around capability thresholds called “Critical Capability Levels (CCLs).” In the Framework, we describe two sets of CCLs: misuse CCLs that can indicate heightened risk of severe harm from misuse if not addressed, and deceptive alignment CCLs that can indicate heightened risk of deceptive alignment-related events if not addressed.
After identifying and understanding risks through mapping, we systematically assess our frontier AI models and systems. We evaluate how well our frontier models and applications perform, and how effectively our risk mitigations work, based on benchmarks for safety, privacy, and security. Our approach evolves with developments in the underlying technology, new and emerging risks, and as new measurement techniques emerge, such as AI-assisted evaluations.
We design our applications to encourage user feedback on both quality and safety, and our teams closely monitor feedback received through these in-product channels and others. We have mature incident management and crisis response capabilities to rapidly mitigate and remediate issues where needed, and we feed what we learn back into our risk identification efforts.
Our Frontier Safety Framework describes a set of “early warning evaluations,” each with a specific “alert threshold” that flags when a frontier model may reach a CCL before the evaluations are run again. In our evaluations, we seek to equip the model with appropriate scaffolding and other augmentations so that our results better reflect the capabilities of systems likely to be built with the model. We may run early warning evaluations more frequently, or adjust their alert thresholds, if the rate of progress suggests our safety buffer is no longer adequate. Where necessary, early warning evaluations may be supplemented by other evaluations to better understand model capabilities relative to our CCLs. We may also use additional external evaluators to test a model for relevant capabilities, if evaluators with relevant expertise are needed to provide an additional signal about a model’s proximity to CCLs.
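As a simplified illustration of how an alert threshold preserves a safety buffer below a CCL, consider the minimal sketch below. It is hypothetical: the CCL name, capability scores, and threshold values are illustrative assumptions, not values from the Framework.

```python
# Hypothetical sketch of an early-warning check against a Critical Capability
# Level (CCL). The CCL name, capability scores, and alert threshold below are
# illustrative assumptions, not actual Frontier Safety Framework values.
from dataclasses import dataclass


@dataclass
class EarlyWarningEval:
    ccl_name: str            # e.g. a misuse or deceptive alignment CCL
    ccl_score: float         # score at which the CCL is considered reached
    alert_threshold: float   # lower bar that preserves a safety buffer

    def check(self, measured_score: float) -> str:
        """Classify an evaluation result relative to the alert threshold."""
        if measured_score >= self.ccl_score:
            return f"{self.ccl_name}: CCL reached; apply the response plan"
        if measured_score >= self.alert_threshold:
            return f"{self.ccl_name}: alert threshold crossed; CCL may be reached before the next run"
        return f"{self.ccl_name}: below alert threshold"


# If progress accelerates, the alert threshold can be lowered or the evaluation
# run more frequently to keep the safety buffer adequate.
evaluation = EarlyWarningEval("illustrative_cyber_uplift", ccl_score=0.80, alert_threshold=0.60)
print(evaluation.check(0.65))  # -> alert threshold crossed
```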
A core component of our measurement approach to responsible AI is running evaluations for frontier models and applications. These evaluations primarily focus on known risks, in contrast to red-teaming, which focuses on known and unknown risks.
A subset of the mapped risks mentioned in our previous answer is relevant to test at the frontier model level. We evaluate the models for risks such as self-proliferation, offensive cybersecurity, child safety harms, and persuasion.
Multi-layered red-teaming plays a critical role in our approach: internal and external teams proactively test AI systems for weaknesses, identify emerging risks, and surface areas for improvement. Teams working on these exercises collaborate to promote information sharing and industry alignment on red-teaming standards.
Our AI Red Team combines security and AI expertise to simulate attackers who might target AI systems. Based on threat intelligence from teams like the Google Threat Intelligence Group, the AI Red Team explores and identifies how AI features can cause security issues, recommends improvements, and helps ensure that real-world attackers are detected and thwarted before they cause damage.
Our Content Adversarial Red Team (CART) proactively identifies weaknesses in our AI systems, enabling us to mitigate risks before product launch. Our internal AI tools also assist human expert red teamers and increase the number of attacks they’re able to test for.
Our external red-teaming includes live hacking events such as DEF CON and Escal8, targeted research grants, challenges, and vulnerability rewards programs to complement our internal evaluations.
To enhance our approach, we have developed forms of AI-assisted red-teaming – training AI agents to find potential vulnerabilities in other AI systems, drawing on work from gaming breakthroughs like AlphaGo. For example, we recently shared details of how we used AI-assisted red-teaming to understand how vulnerable our systems may be to indirect prompt injection attacks, and to inform how we mitigate the risk.
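As a simplified illustration of this pattern (not the production pipeline referenced above), an attacker model proposes candidate payloads hidden in untrusted content, the target system processes that content, and a success check flags payloads whose injected instructions changed the target’s behavior. The interfaces and success criterion in the sketch below are assumptions.

```python
# Hypothetical sketch of AI-assisted red-teaming for indirect prompt injection.
# The attacker/target interfaces and the success criterion are illustrative
# assumptions, not the production system described above.
from typing import Callable, List


def red_team_prompt_injection(
    attacker: Callable[[str], str],        # proposes an injected payload given feedback
    target: Callable[[str], str],          # AI system under test, fed untrusted content
    is_successful: Callable[[str], bool],  # did the injected instruction take effect?
    rounds: int = 10,
) -> List[str]:
    """Iteratively search for payloads that hijack the target's behavior."""
    successful_payloads: List[str] = []
    feedback = "start"
    for _ in range(rounds):
        payload = attacker(feedback)
        document = f"Quarterly report.\n{payload}\nEnd of report."  # untrusted input
        response = target(f"Summarize this document:\n{document}")
        if is_successful(response):
            successful_payloads.append(payload)
            feedback = "success: propose a different attack"
        else:
            feedback = f"failed: {response[:100]}"
    return successful_payloads
```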
Application evaluations are designed to assess the extent to which a given application follows the frameworks and policies that apply to it. This pre-launch testing generally covers a wide range of risks spanning safety, privacy, and security, and the resulting portfolio of test results helps inform launch decisions. We also invest in systematic post-launch testing, which can take different forms, such as regression testing to evaluate an application’s ongoing alignment with our frameworks and policies, and cross-product evaluations to identify whether known risks for one application may have manifested in others.
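As a minimal, hypothetical sketch of what a post-launch regression check can look like in practice, the example below compares an application’s current evaluation scores against the scores recorded at launch and flags metrics that have moved beyond a tolerance. The metric names, baselines, and tolerances are assumptions for illustration.

```python
# Hypothetical sketch of a post-launch regression check. The metric names,
# baseline values, and tolerances are illustrative assumptions.
LAUNCH_BASELINE = {"safety_policy_adherence": 0.97, "privacy_leakage_rate": 0.010}
TOLERANCE = {"safety_policy_adherence": -0.02, "privacy_leakage_rate": +0.005}


def regression_flags(current_scores: dict) -> list:
    """Return the metrics that moved beyond their allowed tolerance since launch."""
    flags = []
    for metric, baseline in LAUNCH_BASELINE.items():
        delta = current_scores[metric] - baseline
        allowed = TOLERANCE[metric]
        regressed = delta < allowed if allowed < 0 else delta > allowed
        if regressed:
            flags.append(metric)
    return flags


print(regression_flags({"safety_policy_adherence": 0.93, "privacy_leakage_rate": 0.016}))
# -> ['safety_policy_adherence', 'privacy_leakage_rate']
```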
We conduct ongoing fundamental research into new evaluation methods for different kinds of risks from LLMs, such as FACTS Grounding, a comprehensive benchmark for evaluating the ability of LLMs to generate responses that are not only factually accurate with respect to given inputs, but also sufficiently detailed to provide satisfactory answers to user queries.
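A simplified sketch of that two-part scoring idea, where a response earns credit only if it is both grounded in the provided context and detailed enough to answer the query, is shown below. The judge interfaces are assumptions, and this is not the benchmark’s actual implementation.

```python
# Hypothetical sketch of a grounding-style scoring rule: a response earns
# credit only if it is both grounded in the provided context and sufficiently
# detailed to answer the query. The judge interfaces are assumptions.
from typing import Callable


def grounded_response_score(
    query: str,
    context: str,
    response: str,
    is_sufficient: Callable[[str, str], bool],  # judge: does the response answer the query?
    is_grounded: Callable[[str, str], bool],    # judge: is every claim supported by the context?
) -> float:
    """Return 1.0 only when a response is both sufficiently detailed and grounded."""
    if not is_sufficient(query, response):
        return 0.0  # ineligible: too vague or evasive to satisfy the user
    if not is_grounded(context, response):
        return 0.0  # ineligible: makes claims unsupported by the given inputs
    return 1.0

# Averaging this score over a dataset of (query, context, response) triples
# yields a benchmark-style accuracy figure.
```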
We have a close relationship with the security research community. To honor all the cutting-edge external contributions that help us keep our users safe, we’ve maintained a Vulnerability Reward Program (mentioned in 1.c and 1.d) for Google-owned and Alphabet (Bet) subsidiary web properties, running continuously since November 2010. We recently updated this program to specifically clarify and encourage reporting of issues in our AI products. We released a 2024 Year in Review of our reward programs that confirmed the ongoing value of engaging with the security research community to make Google and its models and products safer.
Yes. We augment our own research by working with external domain experts and trusted testers who can help further our mapping and understanding of risks.
Where appropriate, independent external groups conduct evaluations of our frontier models. These groups design their evaluations independently, and their results are reported periodically to the internal team and governance groups and are used to mitigate risks and improve our internal evaluation approaches. For Gemini 1.5, for example, external groups, including domain experts and a government body, designed their own methodologies to test topics within a particular domain area. The time dedicated to testing also varied per group: some groups worked full-time on executing testing processes, while others dedicated one to two days per week. Some groups pursued manual red-teaming and reported qualitative findings from their exploration of model behavior, while others developed bespoke automated testing strategies and produced quantitative reports of their results.
We continue to invest in independent research on AI risks. For example, we co-created an AI Safety Fund, initially funded with $10 million to support independent researchers from around the world (academic institutions, research institutions, startups, etc.). The goal of the fund is to allow researchers to better evaluate and understand frontier systems, and – ultimately – develop new model evaluations and techniques for red-teaming AI models.
As described in our response to question 1.d., we signed a first-of-its-kind agreement with fellow members of the Frontier Model Forum (FMF), designed to facilitate information-sharing about threats, vulnerabilities, and capability advances unique to frontier AI.
We support the wider ecosystem with AI safety practices and standards by participating in working groups within global organizations including MLCommons, the World Economic Forum’s (WEF) AI Governance Alliance, the Coalition for Content Provenance and Authenticity (C2PA), Thorn, Partnership on AI (PAI), Frontier Model Forum, and the U.K. AI Safety Institute, among others. Examples of our contributions include:
- With PAI we jointly launched responsibility frameworks for safe deployment, synthetic media, and data enrichment sourcing, among other guidance for AI risk identification and mitigation, alongside industry peers, academics, governments and civil society organizations.
- We contributed to WEF’s Industries in the Intelligent Age white paper series.
- We have signed voluntary commitments including the Tech Accord to Combat Deceptive Use of AI in 2024 Elections and the Safety by Design Generative AI principles for child safety developed by Thorn and All Tech is Human.
- We are a founding member of MLCommons, an engineering consortium focused on AI benchmarks, including the AILuminate benchmark v1.0. This is the first AI safety benchmark produced with open academic, industry, and civil society input and operated by a neutral non-profit with AI benchmarking experience. AILuminate combines a hazard assessment standard, more than 24,000 prompts, online testing with hidden prompts, a proprietary mixture of expert evaluators, and clear grade-based reporting.
- We contributed to ISO 42001, an international standard that specifies requirements for establishing, implementing, maintaining, and continually improving an Artificial Intelligence Management System (AIMS) within organizations.
- We launched the SAIF Risk Self Assessment, a questionnaire-based tool that generates a checklist to guide AI practitioners responsible for securing AI systems. Based on the responses provided, the tool immediately produces a report tailored to the submitter’s AI systems, highlighting specific risks such as data poisoning, prompt injection, and model source tampering, along with suggested mitigations.
- We make public many of our best practices for identifying, assessing, and evaluating risk via the Responsible AI Toolkit, which includes a methodology for building classifiers tailored to a specific policy from a limited number of data points (a simplified illustration follows this list), as well as existing Google Cloud off-the-shelf classifiers served via API.
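The sketch below is a simplified illustration of training a classifier for a single, narrow policy from only a few labeled examples. The tooling (scikit-learn) and the example policy are assumptions chosen for illustration; this is not the toolkit’s actual methodology.

```python
# Hypothetical sketch of building a policy-specific content classifier from a
# small number of labeled examples. The tooling (scikit-learn) and the example
# policy are illustrative assumptions, not the toolkit's actual methodology.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A handful of examples labeled against a single, narrow policy (1 = violates).
texts = [
    "How do I reset my account password?",
    "Write a friendly birthday message for a colleague.",
    "Explain how to pick a lock to break into a house.",
    "Give me step-by-step instructions to bypass a paywall.",
]
labels = [0, 0, 1, 1]

classifier = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
classifier.fit(texts, labels)

print(classifier.predict(["How can I get into a locked car that isn't mine?"]))
```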
We promote industry collaboration on the development of standards and best practices for risk mitigation measures. We work with relevant stakeholders across sectors (as described in 1.g.), such as the Frontier Model Forum, Partnership on AI, and MLCommons, to assess and adopt risk mitigation measures. For example:
- The Frontier Model Forum releases regular publications that reference and support Google research, such as their recent publication on Thresholds for Frontier AI Safety Frameworks.
- In 2024 we launched the Coalition for Secure AI (CoSAI) with industry partners. This is the first major milestone and application of Google’s Secure AI Framework (SAIF). The coalition will collectively invest in AI security research, share security expertise and best practices, and build technical open source solutions. CoSAI is an open source initiative designed to give all practitioners and developers the guidance and tools they need to create AI systems that are Secure-by-Design. The coalition will operate under the guidance of OASIS Open, the international standards and open source consortium. Founding members include Amazon, Anthropic, Cisco, Cohere, GenLab, Google, IBM, Intel, Microsoft, NVIDIA, OpenAI, PayPal, and Wiz.
With respect to question 1.d (Does your organization use incident reports, including reports shared by other organizations, to help identify risks?), and as referenced in 1.b: we design our applications to promote user feedback on both quality and safety, through user interfaces that encourage users to give a thumbs up or down and to provide qualitative feedback where appropriate. Our teams monitor user feedback delivered via these and other channels. We have mature incident management and crisis response capabilities to rapidly mitigate and remediate issues where needed, and we feed this back into our risk identification efforts. Importantly, teams have rapid-remediation mechanisms in place to block content flagged as illegal.
We updated our Vulnerability Reward Program to specifically clarify and encourage reporting of issues in our AI products.
Through our Frontier Model Forum membership, along with other member firms, we have signed a first-of-its-kind agreement designed to facilitate information-sharing about threats, vulnerabilities, and capability advances unique to frontier AI. Our Detection & Response team provides 24/7/365 monitoring of Google products, services and infrastructure – with a dedicated team for insider threat and abuse.