Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Objective: Transparency

Technical | Uploaded on Mar 20, 2026
OpenEnv is a framework for evaluating AI agents against real systems rather than simulations. It provides a standardised way to connect agents to real tools and workflows while preserving the structure needed for consistent and reliable evaluation.

Procedural | Uploaded on Mar 20, 2026
Judgment Assurance is a decision-governance discipline that reframes human judgment as a governed institutional asset. It provides a structured framework and practical instruments, including the Underwriting Questionnaire (JA-UQ) and Maturity Model (JAMM-PS), to ensure that consequential AI-mediated decisions are reconstructible and defensible. By defining minimum governance controls for human oversight, it closes the "accountability gap," allowing institutions to define, record, own, and guard the reasoning behind consequential AI-supported outcomes.

Procedural | Uploaded on Jan 15, 2026
WasItAI is an image-checker designed to detect AI-generated photos.


Technical, Procedural | Uploaded on Mar 20, 2026
The Approved Intelligence Platform (AIP) provides modular, scenario-based testing workflows to evaluate mission-critical AI systems in defence, public safety, and critical civil use cases. It delivers a comprehensive, end-to-end testing environment based on a proprietary AI trust ontology with measurable AI Solutions Quality Indicators (ASQI) for the testing, evaluation, validation and verification of software solutions with different AI modalities.

Procedural | Uploaded on Nov 20, 2025
MISSION KI is developing a voluntary quality standard guideline for artificial intelligence (AI) that strengthens the reliability and trustworthiness of AI applications and systems. It sets a voluntary, evidence-based self-assessment framework for AI providers below the EU AI Act’s high-risk threshold. It defines six quality dimensions (data governance, non-discrimination, transparency, human oversight, reliability, AI-specific cybersecurity) and a stepwise procedure: describe the use case, analyse protection needs, rate requirements via a VCIO catalogue, document tests/evidence, validate findings, issue a report, and monitor validity.

Related lifecycle stage(s): Operate & monitor; Verify & validate

Procedural | Uploaded on Nov 20, 2025
MISSION KI's Compliance Monitor is a tool for monitoring compliance with legal frameworks to facilitate interoperability, data flows and benefit sharing.

Educational | Uploaded on Nov 7, 2025
As part of the MISSION KI project, the initiative has developed Daseen, an innovative dataset search engine that for the first time enables cross-source searches for datasets.

Technical | Uploaded on Oct 9, 2025
Inspect is an open-source framework for large language model evaluations. It can be used for a broad range of evaluations that measure coding, agentic tasks, reasoning, knowledge, behavior, and multi-modal understanding.

Related lifecycle stage(s): Operate & monitor; Verify & validate

Educational | Uploaded on Aug 27, 2025
Elements of AI is a free online course, offered in Slovakia by AIslovakIA and Comenius University. Created by the University of Helsinki and Reaktor with EU support, it introduces the basics of artificial intelligence through six interactive modules.

Related lifecycle stage(s): Operate & monitor

Procedural | Uploaded on Mar 20, 2026
The AI Governance Playbook from the Council on AI Governance helps organizations align people, processes and tools to achieve responsible AI outcomes.

Technical, Procedural | Poland | Uploaded on Jul 23, 2025
CAST is an open framework for responsible AI design and engineering. It offers design heuristics and patterns, and RAI recommendations through generative features and online content.

Procedural | Italy | Uploaded on Jun 19, 2025
ADMIT is a research tool within a broader methodological framework combining quantitative and qualitative strategies to identify, analyse, and mitigate social implications associated with automated decision-making systems while enhancing their potential benefits. It supports comprehensive assessments of sociotechnical impacts to inform responsible design, deployment, and governance of automation technologies.

Technical | Ireland | Uploaded on May 2, 2025
Risk Atlas Nexus provides tooling to connect fragmented AI governance resources through a community-driven approach to curation of linkages between risks, datasets, benchmarks, and mitigations. It transforms abstract risk definitions into actionable AI governance workflows.

Procedural | France | Uploaded on Mar 31, 2025
PolicyPilot is designed to assist users in creating and managing AI policies, streamlining AI governance with automated compliance monitoring and risk management.

Technical | United States | Uploaded on Mar 24, 2025
An open-source Python library designed for developers to calculate fairness metrics and assess bias in machine learning models. This library provides a comprehensive set of tools to ensure transparency, accountability, and ethical AI development.
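The entry does not name the library or its API, but the kind of group-fairness metric such libraries compute can be illustrated with a minimal, self-contained sketch. The function name and data below are hypothetical; this computes the demographic parity difference, i.e. the gap in positive-prediction rates between demographic groups.

```python
# Illustrative sketch only (not any specific library's API): the
# demographic parity difference, a standard group-fairness metric.

def demographic_parity_difference(y_pred, groups):
    """Absolute gap in positive-prediction rates across groups.

    y_pred: iterable of 0/1 model predictions
    groups: iterable of group labels of the same length
    """
    counts = {}  # group -> (positives, total)
    for pred, group in zip(y_pred, groups):
        n_pos, n = counts.get(group, (0, 0))
        counts[group] = (n_pos + pred, n + 1)
    rates = [n_pos / n for n_pos, n in counts.values()]
    return max(rates) - min(rates)

# Hypothetical example: group "A" gets positive predictions 75% of the
# time, group "B" only 25%, so the parity gap is 0.5.
preds  = [1, 1, 1, 0, 1, 0, 0, 0]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
print(demographic_parity_difference(preds, groups))  # 0.5
```

A value of 0 indicates equal positive-prediction rates across groups; libraries of this kind typically expose many such metrics (equalised odds, predictive parity, etc.) alongside reporting utilities.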

Educational | Ireland | Uploaded on Jan 29, 2025
The AI Risk Ontology (AIRO) is an open-source formal ontology that provides a minimal set of concepts and relations for modelling AI use cases and their associated risks. AIRO has been developed according to the requirements of the EU AI Act and international standards, including ISO/IEC 23894 on AI risk management and the ISO 31000 family of standards.

Technical | United States | Uploaded on Jan 8, 2025
MLPerf Client is a benchmark for Windows and macOS that focuses on client form factors in ML inference scenarios such as AI chatbots and image classification. The benchmark evaluates performance across different hardware and software configurations and provides a command-line interface.

Technical, Procedural | United States | Uploaded on Dec 6, 2024
Vectice is a regulatory MLOps platform for AI/ML developers and validators that streamlines documentation, governance, and collaborative reviewing of AI/ML models. Designed to enhance audit readiness and ensure regulatory compliance, Vectice automates model documentation, from development to validation. With features like automated lineage tracking and documentation co-pilot, Vectice empowers AI/ML developers and validators to work in their favorite environment while focusing on impactful work, accelerating productivity, and reducing risk.

Technical | International | Uploaded on Dec 6, 2024
A benchmark for evaluating machine learning agents on machine learning engineering tasks.

Technical | United States | Uploaded on Nov 8, 2024
The Python Risk Identification Tool for generative AI (PyRIT) is an open-access automation framework that empowers security professionals and machine learning engineers to proactively find risks in their generative AI systems.

Related lifecycle stage(s): Operate & monitor; Verify & validate

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.