Section 1 - Risk identification and evaluation
As described in OpenAI's Preparedness Framework, OpenAI tracks several risk categories and classifies the levels of capability and post-mitigation risk in a scorecard. Models with a “High” capability score require safeguards that sufficiently minimize associated risks before they can be deployed, and models with a “Critical” capability score require safeguards that sufficiently minimize associated risks before they can be developed further.
Additionally, OpenAI’s safety systems teams undertake risk modeling covering a broad range of potential harms and harmful content. Our organization classifies AI-related risks using our enterprise risk framework, which incorporates both general risk management principles (aligned with ISO 27001 and NIST) and AI-specific considerations drawn from our Preparedness Framework and its associated scorecards.
We categorize risks by type (e.g., compliance, security, reputational, operational) and evaluate them based on their likelihood and potential impact on users, the public, and OpenAI. These are assessed against defined risk thresholds (a simplified, illustrative sketch follows the list below) to determine:
- Acceptable risks (within tolerance),
- Risks requiring mitigation, and
- Unacceptable risks (above tolerance and requiring redesign, escalation, or avoidance).
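As a purely illustrative sketch of this thresholding step, the snippet below maps likelihood and impact scores onto the three dispositions above; the 1–5 scales and cutoffs are hypothetical and do not reflect OpenAI’s actual risk tolerances.

```python
# Hypothetical illustration only: the 1-5 scales and cutoffs below are invented
# for this sketch and do not reflect OpenAI's actual risk tolerances.

def classify_risk(likelihood: int, impact: int) -> str:
    """Map likelihood and impact scores (each 1-5) to a risk disposition."""
    score = likelihood * impact  # simple multiplicative risk score
    if score <= 6:
        return "acceptable (within tolerance)"
    if score <= 15:
        return "requires mitigation"
    return "unacceptable (redesign, escalate, or avoid)"

# Example: a likely, high-impact risk lands above tolerance.
print(classify_risk(likelihood=4, impact=5))  # -> unacceptable (redesign, escalate, or avoid)
```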
For AI systems, we pay particular attention to risks that may include, among others:
- Model behavior: e.g., potential for misuse, unsafe outputs, or generation of false/misleading content
- Public safety and national security: e.g., ability to assist with cyber or chemical and biological attacks
- System oversight and control: risks arising from inadequate human-in-the-loop processes or unclear escalation paths
- Privacy and data use: e.g., training data sensitivity, inference risks, or regulatory exposure (GDPR, HIPAA, etc.)
- Fairness and bias: including differential performance across user groups or harmful stereotypes
- Transparency and explainability: how understandable the system is to users, developers, and stakeholders
- Deployment context and safeguards: ensuring model outputs are appropriate for the product setting and mitigated for known abuse vectors
- Third-party use and integration: risk of downstream misuse or unclear responsibilities in shared accountability models
We use structured scorecards to assess these categories at various stages of development and deployment, which helps us prioritize mitigation efforts and ensure alignment with both internal expectations and emerging legal standards.
To identify risks across the lifecycle of advanced AI systems, including before deployment and placement on the market, our organization employs a multi-faceted approach:
Prior to deployment, OpenAI conducts a holistic assessment of potential risks that may stem from generative models. We use a combination of methods spanning all stages of development: pre-training, post-training, product development, and policy. For example, during post-training, we align the model to human preferences; we red-team the resulting models and add product-level mitigations such as monitoring and enforcement; and we provide moderation tools and transparency reports to our users. We also conduct evaluations to assess dual-use capabilities, including chemical, biological, cybersecurity, and model autonomy risks, as described in our Preparedness Framework. Evaluation methods include internal testing, red teaming, and external collaborations.
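As one concrete example of the moderation tools mentioned above, developers can screen content with our publicly documented Moderation API. The following is a minimal usage sketch via the official Python SDK; the sample input is illustrative, and the model name should be checked against the current API documentation.

```python
# Minimal sketch of screening text with OpenAI's Moderation API via the official
# Python SDK. The sample input is illustrative; check current docs for model names.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.moderations.create(
    model="omni-moderation-latest",
    input="Example user-submitted text to screen before further processing.",
)

result = response.results[0]
print("Flagged:", result.flagged)

# Per-category booleans (e.g., harassment, violence) indicate which policies, if
# any, the input appears to violate; category_scores hold the model's confidence.
flagged_categories = [name for name, hit in result.categories.model_dump().items() if hit]
print("Flagged categories:", flagged_categories)
```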
OpenAI continually invests in detecting misuse and emergent risks through the deployment of classifiers, rules, content review systems, and manual and automated analysis. We respond to patterns of abuse or misuse as they emerge, including by taking action within the platform to limit their impact and by incorporating lessons learned into model defenses and behavior to improve resiliency over time.
OpenAI employs rigorous red-teaming to evaluate the performance of models and systems before deployment, using both human and automated processes. Adversarial testing involves experts who attempt to exploit the model in ways that could cause harm or lead to undesirable outcomes, or who probe the model for unexpected or unwanted behavior across a variety of domains. This process is complemented by collaborations with external organizations, such as cybersecurity firms, which provide additional perspectives and expertise in identifying potential threats.
OpenAI’s red teaming efforts leverage an external Red Teaming Network, a community of trusted and experienced experts who help inform our risk assessment and mitigation efforts. The Red Teaming Network draws on experts in a wide variety of domains, with diverse perspectives and lived experiences. A detailed description is available here: https://openai.com/index/red-teaming-network/
As an example of how we use red teaming in practice, before launching GPT-4o we worked with more than 100 external red teamers, who collectively spoke 45 different languages and represented geographic backgrounds spanning 29 countries. Red teamers had access to various snapshots of the model at different stages of training and safety mitigation maturity for several months before deployment. A detailed description is available in the GPT-4o system card: https://openai.com/index/gpt-4o-system-card/.
Our model review process is also informed by results from testing carried out in collaboration with third-party assessors. For example, we’ve worked with the US and UK AI Safety Institutes, and with independent third-party labs such as METR and Apollo, to add an additional layer of validation for key risks. Where possible and relevant, we report on their findings in our system cards, such as the scheming tests conducted by Apollo and the autonomy and AI R&D tasks conducted by METR for o1 (https://openai.com/index/openai-o1-system-card/).
Red team assessments may also be carried out periodically, in response to underlying changes to infrastructure or application code, or in response to changing threat conditions. Responsible red teaming by good-faith security researchers is permitted and encouraged via our bug bounty program.
We use a combination of quantitative (e.g., frequency of flagged outputs, output quality scores, performance deltas across demographics) and qualitative (e.g., SME judgment, ethical review outcomes, stakeholder feedback) metrics to assess AI-related risk. These are embedded in our AI readiness scorecards and risk evaluations. We describe some of these metrics, methodologies, and limitations in system cards and in research we publish, some of which can be found here: https://openai.com/news/research/
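As a hedged illustration of the quantitative side of this (the counts and group labels below are fabricated for the example), a flagged-output rate and a performance delta across user groups could be computed along the following lines:

```python
# Illustrative only: fabricated counts showing how a flagged-output rate and a
# cross-group performance delta of the kind referenced above could be computed.
flagged_outputs = {"group_a": 12, "group_b": 30}        # flagged completions per group
total_outputs   = {"group_a": 1_000, "group_b": 1_000}  # completions sampled per group

rates = {group: flagged_outputs[group] / total_outputs[group] for group in flagged_outputs}
delta = max(rates.values()) - min(rates.values())

print({group: f"{rate:.1%}" for group, rate in rates.items()})  # {'group_a': '1.2%', 'group_b': '3.0%'}
print(f"Performance delta across groups: {delta:.1%}")          # 1.8%
```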
It is worth noting that some risks (e.g., reputational or ethical harm) resist easy quantification, so we lean on expert judgment in those cases. AI system behavior can change over time (e.g., through fine-tuning or model evolution), so risk scores may need ongoing validation. Additionally, risk evaluations are context-sensitive: what is acceptable in one product setting may be high-risk in another. Risk evaluations are continually re-evaluated based on changes to the threat landscape, and in response to adverse, anomalous, or malicious activity observed on our platform.
OpenAI has mechanisms in place to receive reports of incidents and vulnerabilities from third parties. These mechanisms are part of OpenAI’s commitment to safety and security, allowing external researchers, users, and other stakeholders to report issues that they may encounter.
For more detailed information, you can visit OpenAI's responsible disclosure page: https://openai.com/policies/coordinated-vulnerability-disclosure-policy/
OpenAI provides a model behavior feedback form (https://openai.com/form/model-behavior-feedback/) where users can submit reports when our models behave in unexpected or unwanted ways, and maintains a bug bounty program through Bugcrowd (https://bugcrowd.com/openai). In addition, OpenAI runs a Cybersecurity Grant Program to support research and development focused on protecting AI systems and infrastructure. This program encourages and funds initiatives that help identify and address vulnerabilities, helping to ensure the safe deployment of AI technologies.
Yes. Please see the response to 1c) (“Describe how your organization conducts testing (e.g., red-teaming) to evaluate the model’s/system’s fitness for moving beyond the development stage”).
OpenAI contributes to the work of the following standard development organizations:
- NIST's AISIC working groups and task forces on provenance, risk management, biorisk
- CSA's AI Standards
- CoSAI Security and Safety Standards
- C2PA provenance standards
- MLCommons safety evaluations
- ISO risk management
- Frontier Model Forum (FMF) on frontier risk
- Coalition for Health AI
OpenAI collaborates with relevant stakeholders across sectors to assess and adopt measures that mitigate risks, in particular systemic risks:
- Protecting children: A critical focus of our safety work is protecting children. We’ve built strong default guardrails and safety measures into our models to mitigate potential harms to children. We detect and remove child sexual abuse material (CSAM) from training data and report any confirmed CSAM to the relevant authorities, such as the National Center for Missing & Exploited Children (NCMEC) in the U.S. In 2024, we joined industry peers in committing to Thorn’s Safety by Design principles (https://www.thorn.org/blog/generative-ai-principles/), which seek to prioritize child safety at every stage of AI development. In 2025, we announced OpenAI as a founding partner of Robust Open Online Safety Tools (ROOST.tools), a community effort that will deliver free, open-source digital safety tools to public and private sectors globally, addressing critical gaps in online safety. We run Thorn’s CSAM classifier over all image uploads and generations to detect novel CSAM, and we run a hash filter over all image uploads to catch known CSAM.
- Election integrity: We currently do not allow users to use our tools, including ChatGPT, for political campaigning. For example, ChatGPT is trained to refuse requests for targeted political persuasion. We also universally disallow categorizing individuals based on their biometric data to deduce or infer sensitive attributes such as political opinions. To prevent abuse, we don’t allow users, builders on our API, or those creating shared GPTs to create tools that impersonate real people (e.g., candidates) or institutions (e.g., local governments). We’ve also disrupted covert influence operations that sought to use our models in support of deceptive activity across the internet, and we continue to monitor and mitigate such abuses. We also have an opt-out process for public figures who don’t want their likeness to be generated by our models. To improve transparency around AI-generated content, we have implemented C2PA content credentials.
- Investment in impact assessment and policy analysis: Our impact assessment efforts have been widely influential in research, industry norms, and policy, including our work on measuring the chemical, biological, radiological, and nuclear (CBRN) risks associated with AI systems, and our research estimating the extent to which different occupations and industries might be impacted by language models. We also publish pioneering work on how society can best manage associated risks – for example, by working with external experts to assess the implications of language models for influence operations. (https://openai.com/index/building-an-early-warning-system-for-llm-aided-biological-threat-creation/; https://arxiv.org/abs/2303.10130)
- Partnering with governments: We partner with governments around the world to inform the development of effective and adaptable AI safety policies. This includes showing our work and sharing our learnings, collaborating to pilot government and other third-party assurance, and informing the public debate over new standards and laws. (https://openai.com/global-affairs/our-approach-to-frontier-risk/) For example, in August 2024 we entered into voluntary agreements with the U.S. and UK AI Safety Institutes to enable formal collaboration on AI safety research, testing, and evaluation. As mentioned in the section on testing and red teaming, we partner with independent third-party labs, academics, experts, and more to add an additional layer of validation for key risks.
No answer provided