Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Type

Origin

Scope

Clear all

SUBMIT A TOOL

If you have a tool that you think should be featured in the Catalogue of AI Tools & Metrics, we would love to hear from you!

Submit
Target user(s) Developer

ProceduralCanadaUploaded on Apr 1, 2025
An artificial intelligence (AI) impact assessment tool to provide organisations a method to assess AI systems for compliance with Canadian human rights law. The purpose of this human rights AI impact assessment is to assist developers and administrators of AI systems to identify, assess, minimise or avoid discrimination and uphold human rights obligations throughout the lifecycle of an AI system.

Related lifecycle stage(s)

Operate & monitorPlan & design

ProceduralFranceUploaded on Mar 31, 2025
PolicyPilot is designed to assist users in creating and managing AI policies, streamlining AI governance with automated compliance monitoring and risk management.

Objective(s)


ProceduralFranceUploaded on Apr 2, 2025
This tool provides a comprehensive risk management framework for frontier AI development, integrating established risk management principles with AI-specific practices. It combines four key components: risk identification through systematic methods, quantitative risk analysis, targeted risk treatment measures, and clear governance structures.

Objective(s)

Related lifecycle stage(s)

Build & interpret model

TechnicalUnited StatesUploaded on Mar 24, 2025
An open-source Python library designed for developers to calculate fairness metrics and assess bias in machine learning models. This library provides a comprehensive set of tools to ensure transparency, accountability, and ethical AI development.

TechnicalFranceEuropean UnionUploaded on Mar 24, 2025
Behavior Elicitation Tool (BET) is a complex-AI system that systematically probes and elicits specific behaviors from cutting-edge LLMs. Whether for red-teaming or targeted behavioral analysis, this automated solution is Dynamic Optimized and Adversarial (DAO) and can be configured to test the robustness precisely and help to have a better control of the AI system.

EducationalIrelandUploaded on Jan 29, 2025
The AI Risk Ontology (AIRO) is an open-source formal ontology that provides a minimal set of concepts and relations for modelling AI use cases and their associated risks. AIRO has been developed according to the requirements of the EU AI Act and international standards, including ISO/IEC 23894 on AI risk management and ISO 31000 family of standards.

Objective(s)

Related lifecycle stage(s)

Operate & monitorVerify & validate

TechnicalUnited StatesUploaded on Jan 8, 2025
MLPerf Client is a benchmark for Windows and macOS, focusing on client form factors in ML inference scenarios like AI chatbots, image classification, etc. The benchmark evaluates performance across different hardware and software configurations, providing command line interface.

ProceduralUploaded on Jan 6, 2025
ISO/IEC 25023:2016 defines quality measures for quantitatively evaluating system and software product quality in terms of characteristics and subcharacteristics defined in ISO/IEC 25010 and is intended to be used together with ISO/IEC 25010.

Objective(s)


United KingdomUploaded on Jan 9, 2025
This document provides a structured framework for gaining informed consent from individuals before using their copyright works (including posts, articles, or comments), Name, Image, Likeness (NIL), or other Personal Data, in a engineered system. It pulls together best current practice from many sources including GDPR, the Article 29 Working Party, multiple ISO standards and the NIST RMF framework and presents it one place.

EducationalUnited KingdomUploaded on Dec 9, 2024
PLIM is designed to make benchmarking and continuous monitoring of LLMs safer and more fit for purpose. This is particularly important in high-risk environments, e.g. healthcare, finance, insurance and defence. Having community-based prompts to validate models as fit for purpose is safer in a world where LLMs are not static. 

Objective(s)


TechnicalProceduralUnited StatesUploaded on Dec 6, 2024
Vectice is a regulatory MLOps platform for AI/ML developers and validators that streamlines documentation, governance, and collaborative reviewing of AI/ML models. Designed to enhance audit readiness and ensure regulatory compliance, Vectice automates model documentation, from development to validation. With features like automated lineage tracking and documentation co-pilot, Vectice empowers AI/ML developers and validators to work in their favorite environment while focusing on impactful work, accelerating productivity, and reducing risk.

TechnicalUnited KingdomUploaded on Dec 6, 2024
Continuous automated red teaming for AI, minimize security threats to AI models and applications.

Objective(s)

Related lifecycle stage(s)

Operate & monitorDeployVerify & validate

TechnicalInternationalUploaded on Dec 6, 2024
Evaluating machine learning agents on machine learning engineering.

Objective(s)


TechnicalUnited StatesUploaded on Nov 8, 2024
The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.

Objective(s)

Related lifecycle stage(s)

Operate & monitorVerify & validate

ProceduralSingaporeUploaded on Oct 2, 2024
Resaro offers independent, third-party assurance of mission-critical AI systems. It promotes responsible, safe and robust AI adoption for enterprises, through technical advisory and evaluation of AI systems against emerging regulatory requirements.

ProceduralUnited KingdomUploaded on Oct 2, 2024
Warden AI provides independent, tech-led AI bias auditing, designed for both HR Tech platforms and enterprises deploying AI solutions in HR. As the adoption of AI in recruitment and HR processes grows, concerns around fairness have intensified. With the advent of regulations such as NYC Local Law 144 and the EU AI Act, organisations are under increasing pressure to demonstrate compliance and fairness.

Objective(s)

Related lifecycle stage(s)

Operate & monitorVerify & validate

ProceduralUploaded on Oct 2, 2024
FairNow is an AI governance software tool that simplifies and centralises AI risk management at scale. To build and maintain trust with customers, organisations must conduct thorough risk assessments on their AI models, ensuring compliance, fairness, and security. Risk assessments also ensure organisations know where to prioritise their AI governance efforts, beginning with high-risk models and use cases.

Objective(s)


TechnicalUploaded on Nov 5, 2024
garak, Generative AI Red-teaming & Assessment Kit, is an LLM vulnerability scanner. Garak checks if an LLM can be made to fail.

Objective(s)

Related lifecycle stage(s)

Operate & monitorVerify & validate

TechnicalInternationalUploaded on Nov 5, 2024
A fast, scalable, and open-source framework for evaluating automated red teaming methods and LLM attacks/defenses. HarmBench has out-of-the-box support for transformers-compatible LLMs, numerous closed-source APIs, and several multimodal models.

TechnicalUnited StatesUploaded on Sep 9, 2024
Harms Modeling is a practice designed to help you anticipate the potential for harm, identify gaps in product that could put people at risk, and ultimately create approaches that proactively address harm.

Objective(s)


catalogue Logos

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.