Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

SUBMIT A TOOL

If you have a tool that you think should be featured in the Catalogue of Tools & Metrics for Trustworthy AI, we would love to hear from you!

Educational | Procedural | Ireland | Uploaded on Jun 5, 2026
This handbook provides guidance on the ethical and legal responsibilities associated with the use of high-risk AI systems in the EU civil security domain, focusing on use cases in border control, policing, and immigration. It aims to provide a structured pathway for engaging with and understanding deployer responsibilities as outlined in the AI Act, and to help establish processes for the ethical use of AI solutions.

International | Uploaded on Jun 4, 2026
Amazon Nova Premier is a multimodal foundation model that was evaluated under Amazon’s Frontier Model Safety Framework to assess and mitigate risks related to Chemical, Biological, Radiological, and Nuclear (CBRN) weapons proliferation, offensive cyber operations, and automated AI research and development.

Technical | International | Uploaded on Jun 3, 2026
The AI red team service exposes hidden safety and security threats across the entire lifecycle of artificial intelligence (AI) systems by applying an adversarial mindset to assess AI systems during design, development, deployment, and operations stages.

Technical | International | Uploaded on Jun 3, 2026
SegMate is an open source AI toolkit developed by the Vector Institute that can help organizations and researchers apply cutting-edge computer vision techniques in the fight against climate change.

Related lifecycle stage(s)

Collect & process data

Technical | International | Uploaded on Jun 3, 2026
Fuel iX is an enterprise AI platform that enables organisations to connect their infrastructure to a library of large language models and build, deploy and manage generative AI applications with centralized control and observability.

Technical | International | Uploaded on Jun 3, 2026
FlowMS is an AI-powered utility efficiency tool built on AWS that analyses metering data to detect anomalies in water use and help conserve water in Amazon buildings.

Technical | Uploaded on Jun 3, 2026
AI Ethics for Fairness is a software application that supports the detection, evaluation and mitigation of bias in AI models by analysing datasets, training models and applying fairness processing techniques.

Technical | Uploaded on Jun 3, 2026
LLM Vulnerability Scanner and Guardrails provides a comprehensive assessment of LLM vulnerabilities and automatically applies defensive techniques to generative AI applications built on LLMs.

Related lifecycle stage(s)

Deploy, Verify & validate

Technical | Uploaded on Jun 3, 2026
The Agentic Benchmark for CRM is a benchmarking framework developed by Salesforce to evaluate the performance of AI agents and models in enterprise customer relationship management (CRM) use cases using metrics such as accuracy, cost, speed, trust and safety, and sustainability.

Technical | Uploaded on Jun 3, 2026
AI Energy Score is an initiative to establish standardized energy efficiency ratings for AI models in order to help the industry make informed decisions about sustainability in AI development.

Technical | Uploaded on Jun 3, 2026
The Moderation endpoint allows developers to classify text and image inputs to determine whether they may violate OpenAI’s safety policies.

Related lifecycle stage(s)

Operate & monitor, Deploy

Uploaded on Jun 3, 2026
The Google Responsible Generative AI Toolkit provides tools and guidance to design, build and evaluate open AI models responsibly.

Technical | Uploaded on Jun 3, 2026
ShieldGemma is a set of instruction-tuned models for evaluating the safety of text and images against a set of defined safety policies.

Related lifecycle stage(s)

Operate & monitor, Deploy

Technical | Uploaded on Jun 3, 2026
The Einstein Trust Layer is a secure AI architecture built into the Salesforce platform that provides guardrails to protect data privacy and security, improve the safety and accuracy of AI outputs, and enable Salesforce customers to use generative AI responsibly within Salesforce applications.

Technical | Uploaded on Jun 3, 2026
FACTS Grounding is a comprehensive benchmark for evaluating the ability of LLMs to generate responses that are not only factually accurate with respect to given inputs, but also sufficiently detailed to provide satisfactory answers to user queries.

Related lifecycle stage(s)

Verify & validate, Build & interpret model

Educational | Uploaded on Jun 3, 2026
The OWASP Top 10 for Large Language Model (LLM) Applications is a written guidance document developed by the OWASP community to identify the most critical security risks affecting applications that use large language models.

Technical | Uploaded on Jun 3, 2026
Content Credentials provide tamper-evident metadata that gives you more information about how a piece of digital content was created and edited.

Related lifecycle stage(s)

Operate & monitor

Technical | Uploaded on Jun 3, 2026
Eureka is a reusable and open evaluation framework for standardizing evaluations of large foundation models beyond single-score reporting and rankings.

Technical | United States | Uploaded on May 18, 2026
VERA-MH (Validation of Ethical and Responsible AI in Mental Health) is a comprehensive framework for evaluating AI chatbots in a mental health context.

Technical | Uploaded on Jun 3, 2026
The MLCommons AILuminate benchmark evaluates an AI system-under-test (SUT) by inputting a set of prompts, recording the SUT’s responses, and then using a specialized set of “safety evaluator models” to determine which of the responses are violations according to the AILuminate Assessment Standard guidelines. Findings are summarized in a human-readable report.

Related lifecycle stage(s)

Verify & validate
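
The evaluation loop described in the AILuminate entry (send prompts to the system-under-test, record responses, have an evaluator flag violations, summarize the findings) can be illustrated with a minimal Python sketch. This is not the MLCommons implementation: `run_benchmark`, `summarize`, `toy_sut` and `toy_evaluator` are hypothetical names, and the keyword-matching evaluator is a stand-in assumption for the benchmark's actual safety evaluator models and Assessment Standard.

```python
from typing import Callable, Dict, List


def run_benchmark(
    prompts: List[str],
    sut: Callable[[str], str],
    evaluator: Callable[[str, str], bool],
) -> List[Dict[str, object]]:
    """Send each prompt to the SUT, record the response, and flag violations."""
    records = []
    for prompt in prompts:
        response = sut(prompt)
        records.append(
            {
                "prompt": prompt,
                "response": response,
                # The evaluator returns True when it judges the response a violation.
                "violation": evaluator(prompt, response),
            }
        )
    return records


def summarize(records: List[Dict[str, object]]) -> Dict[str, float]:
    """Aggregate per-prompt results into a small report."""
    total = len(records)
    violations = sum(1 for r in records if r["violation"])
    rate = violations / total if total else 0.0
    return {"total": total, "violations": violations, "violation_rate": rate}


# Hypothetical stand-ins, for illustration only.
def toy_sut(prompt: str) -> str:
    """A fake system-under-test that simply echoes the prompt."""
    return f"Echo: {prompt}"


def toy_evaluator(prompt: str, response: str) -> bool:
    """A fake evaluator that flags any response containing a placeholder keyword."""
    return "forbidden" in response.lower()


if __name__ == "__main__":
    results = run_benchmark(["hello", "a FORBIDDEN request"], toy_sut, toy_evaluator)
    print(summarize(results))
```

In a real evaluation the SUT and evaluator callables would wrap model calls; keeping them as plain function arguments makes the loop itself easy to test in isolation.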

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.