Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Objective: Robustness

Technical · Uploaded on Mar 20, 2026
garak is an open-source LLM vulnerability scanner developed by NVIDIA that probes large language models for security weaknesses including prompt injection, jailbreaks, hallucination, toxicity, data leakage, and misinformation.

Technical · Uploaded on Mar 20, 2026
OpenEnv is a framework for evaluating AI agents against real systems rather than simulations. It provides a standardised way to connect agents to real tools and workflows while preserving the structure needed for consistent and reliable evaluation.

Procedural · Uploaded on Feb 16, 2026
The AI Inherent Risk Scale (AIIRS) is a task-based classification instrument that helps organisations assess the inherent risk of generative AI use. It evaluates tasks against three criteria—epistemic dependence, verifiability, and consequences of error—and assigns a LOW, MEDIUM, or HIGH risk rating using a max-dominant model. AIIRS supports proportionate safeguards, accountable oversight, and governance-aligned decision-making without determining whether AI use is permitted in a given context.

Related lifecycle stage(s)

Operate & monitor
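The max-dominant model described in the AIIRS entry can be sketched in a few lines. The three criterion names and the LOW/MEDIUM/HIGH scale come from the description above; the function name, argument order, and level encoding are illustrative assumptions, not part of the instrument itself.

```python
# Hypothetical sketch of AIIRS's max-dominant rating: each of the three
# criteria is rated LOW, MEDIUM, or HIGH, and the overall inherent risk
# is the most severe rating among them.
LEVELS = ["LOW", "MEDIUM", "HIGH"]  # ordered least to most severe

def aiirs_rating(epistemic_dependence, verifiability, consequences_of_error):
    """Return the max-dominant inherent-risk rating for a task.

    Each argument is one of "LOW", "MEDIUM", or "HIGH".
    """
    ratings = [epistemic_dependence, verifiability, consequences_of_error]
    # Max-dominant: the highest individual rating dominates the overall result.
    return max(ratings, key=LEVELS.index)

# Example: a single HIGH criterion dominates two LOW ones.
print(aiirs_rating("LOW", "HIGH", "LOW"))  # HIGH
```

Under this reading, the rating never averages away a severe criterion: a task that is trivially verifiable but has HIGH consequences of error is still rated HIGH.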

Technical · Procedural · Uploaded on Mar 20, 2026
The Approved Intelligence Platform (AIP) provides modular, scenario-based testing workflows to evaluate mission-critical AI systems in defence, public safety, and critical civil use cases. It delivers a comprehensive, end-to-end testing environment based on a proprietary AI trust ontology with measurable AI Solutions Quality Indicators (ASQI) for the testing, evaluation, validation and verification of software solutions with different AI modalities.

Technical · Educational · Uploaded on Aug 27, 2025
An AI screener to enable universal early screening for all children.

Related lifecycle stage(s)

Plan & design

Procedural · Uploaded on Aug 1, 2025
BeSpecial is an AI-driven platform designed to support university students with dyslexia by providing personalized digital tools and tailored learning strategies. Developed within the European VRAILEXIA project, BeSpecial combines clinical data, self-assessments, and psychometric tests to recommend customized resources like audiobooks and concept maps, as well as inclusive academic practices. The platform also raises awareness and trains educators to foster inclusive higher education environments.

Related lifecycle stage(s)

Operate & monitor · Deploy

Australia · Uploaded on May 22, 2025
FloodMapp is a technology company that specialises in rapid real-time flood forecasting and flood inundation mapping to provide greater warning time and situational awareness.


Technical · Educational · Mexico · United States · Israel · Uploaded on May 19, 2025
SeismicAI is a provider of innovative Earthquake Early Warning Systems (EEW) ensuring earthquake preparedness. SeismicAI's algorithms utilise local sensors to issue high-precision alerts for earthquake preparedness. The system covers the full early warning cycle - from monitoring and reporting, through alerts, to optionally triggering automated preventive actions.

Related lifecycle stage(s)

Operate & monitor

Technical · Europe · Uploaded on May 19, 2025
The AIFS is the first fully operational open model to use machine learning for weather forecasting.

Technical · United States · Uploaded on May 15, 2025
The GDA leverages aerial imagery, satellite data, and machine learning techniques to evaluate the damage in areas impacted by natural disasters. This tool greatly enhances the efficiency and precision of disaster response operations.

Technical · United States · Uploaded on May 2, 2025
ATLAS (Adversarial Threat Landscape for Artificial-Intelligence Systems) is a globally accessible, living knowledge base of adversary tactics and techniques against AI-enabled systems, based on real-world attack observations and realistic demonstrations from AI red teams and security groups.

Procedural · Canada · Uploaded on Mar 31, 2025
This program provides organisations with a comprehensive, independent review of their AI approaches, ensuring alignment with consensus standards and enhancing trust among stakeholders and the public in their AI practices.

Related lifecycle stage(s)

Verify & validate

Technical · United States · Uploaded on Jan 8, 2025
MLPerf Client is a benchmark for Windows and macOS focusing on client form factors in ML inference scenarios such as AI chatbots and image classification. The benchmark evaluates performance across different hardware and software configurations and provides a command-line interface.

Procedural · Uploaded on Jan 6, 2025
This document addresses how artificial intelligence and machine learning can impact the safety of machinery and machinery systems.


Procedural · Uploaded on Jan 6, 2025
ISO/IEC 25023:2016 defines quality measures for quantitatively evaluating system and software product quality in terms of characteristics and subcharacteristics defined in ISO/IEC 25010 and is intended to be used together with ISO/IEC 25010.

Educational · United Kingdom · Uploaded on Dec 9, 2024
PLIM is designed to make benchmarking and continuous monitoring of LLMs safer and more fit for purpose. This is particularly important in high-risk environments such as healthcare, finance, insurance, and defence. Having community-based prompts to validate models as fit for purpose is safer in a world where LLMs are not static.

Technical · France · Uploaded on Dec 6, 2024
AIxploit is a tool designed to evaluate and enhance the robustness of Large Language Models (LLMs) through adversarial testing. This tool simulates various attack scenarios to identify vulnerabilities and weaknesses in LLMs, ensuring they are more resilient and reliable in real-world applications.

Related lifecycle stage(s)

Operate & monitor · Verify & validate

Technical · Uploaded on Dec 6, 2024
A continuous, proactive AI red-teaming platform for AI and GenAI models, applications, and agents.

Technical · United Kingdom · Uploaded on Dec 6, 2024
Continuous automated red teaming for AI that minimises security threats to AI models and applications.

Technical · International · Uploaded on Dec 6, 2024
A tool for evaluating machine learning agents on machine learning engineering tasks.

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.