Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Type

Clear all

Origin

Scope

Clear all

SUBMIT A TOOL

If you have a tool that you think should be featured in the Catalogue of Tools & Metrics for Trustworthy AI, we would love to hear from you!

Submit
Approach Technical
Lifecycle stage(s) Operate & monitor

TechnicalUploaded on Mar 20, 2026
garak is an open-source LLM vulnerability scanner developed by NVIDIA that probes large language models for security weaknesses including prompt injection, jailbreaks, hallucination, toxicity, data leakage, and misinformation.

TechnicalUploaded on Mar 20, 2026
OpenEnv is a framework for evaluating AI agents against real systems rather than simulations. It provides a standardised way to connect agents to real tools and workflows while preserving the structure needed for consistent and reliable evaluation.

TechnicalProceduralUploaded on Mar 20, 2026
The Approved Intelligence Platform (AIP) provides modular, scenario-based testing workflows to evaluate mission-critical AI systems in defence, public safety, and critical civil use cases. It delivers a comprehensive, end-to-end testing environment based on a proprietary AI trust ontology with measurable AI Solutions Quality Indicators (ASQI) for the testing, evaluation, validation and verification of software solutions with different AI modalities.

TechnicalUploaded on Jan 19, 2026
ASQI Engineer is an open-source framework for testing and assuring AI systems. Built for scale and reliability, it uses containerised test packages, automated assessments, and repeatable workflows to make evaluation transparent and robust. With ASQI Engineer, organisations also run ASQIs that they have created themselves, giving teams full control and confidence in AI quality.

TechnicalProceduralUploaded on Mar 20, 2026
The Resaro AI Solutions Quality Index (ASQI) provides a transparent, use-case-specific measure of AI quality — for applications such as customer chat services, object recognition, deepfake detection, or x-ray anomaly identification.

TechnicalUploaded on Oct 9, 2025
An open-source framework for large language model evaluations. Inspect can be used for a broad range of evaluations that measure coding, agentic tasks, reasoning, knowledge, behavior, and multi-modal understanding.

Related lifecycle stage(s)

Operate & monitorVerify & validate

TechnicalEducationalUploaded on Oct 1, 2025
An AI-powered speech recognition app that adapts to users' unique speech patterns, facilitating communication for individuals with speech impairments.

Related lifecycle stage(s)

Operate & monitor

TechnicalUploaded on Aug 27, 2025
ReadSpeaker is a SaaS-based text-to-speech platform providing natural-sounding, multilingual voices for seamless integration in web, document, and application environments.

Related lifecycle stage(s)

Operate & monitor

TechnicalUploaded on Aug 27, 2025
aiD leverages artificial intelligence to improve communication accessibility for deaf and hard-of-hearing individuals through advanced speech-to-text and sign language technologies.

TechnicalEducationalUploaded on Aug 1, 2025
Dytective by Change Dyslexia is an innovative AI-powered tool designed to detect the risk of dyslexia in children quickly and reliably. Developed in collaboration with researchers, Dytective combines language exercises with machine learning to screen for dyslexia in just 15 minutes. Backed by scientific validation and used by schools and families worldwide, it empowers early intervention and promotes equal opportunities in education.

Related lifecycle stage(s)

Operate & monitorDeploy

TechnicalEducationalMexicoUnited StatesIsraelUploaded on May 19, 2025
SeismicAI is a provider of innovative Earthquake Early Warning Systems (EEW) ensuring earthquake preparedness. SeismicAI's algorithms utilise local sensors to issue high-precision alerts for earthquake preparedness. The system covers the full early warning cycle - from monitoring and reporting, through alerts, to optionally triggering automated preventive actions.

Objective(s)

Related lifecycle stage(s)

Operate & monitor

TechnicalIrelandUploaded on May 2, 2025
Risk Atlas Nexus provides tooling to connect fragmented AI governance resources through a community-driven approach to curation of linkages between risks, datasets, benchmarks, and mitigations. It transforms abstract risk definitions into actionable AI governance workflows.

TechnicalUnited StatesUploaded on Mar 24, 2025
An open-source Python library designed for developers to calculate fairness metrics and assess bias in machine learning models. This library provides a comprehensive set of tools to ensure transparency, accountability, and ethical AI development.

TechnicalSwitzerlandEuropean UnionUploaded on Jan 24, 2025
COMPL-AI is an open-source compliance-centered evaluation framework for Generative AI models

TechnicalUnited StatesUploaded on Jan 8, 2025
MLPerf Client is a benchmark for Windows and macOS, focusing on client form factors in ML inference scenarios like AI chatbots, image classification, etc. The benchmark evaluates performance across different hardware and software configurations, providing command line interface.

TechnicalFranceUploaded on Dec 6, 2024
AIxploit is a tool designed to evaluate and enhance the robustness of Large Language Models (LLMs) through adversarial testing. This tool simulates various attack scenarios to identify vulnerabilities and weaknesses in LLMs, ensuring they are more resilient and reliable in real-world applications.

Objective(s)

Related lifecycle stage(s)

Operate & monitorVerify & validate

TechnicalUnited KingdomUploaded on Dec 6, 2024
Continuous automated red teaming for AI, minimize security threats to AI models and applications.

TechnicalUnited StatesUploaded on Nov 8, 2024
The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.

Related lifecycle stage(s)

Operate & monitorVerify & validate

TechnicalUploaded on Nov 5, 2024
garak, Generative AI Red-teaming & Assessment Kit, is an LLM vulnerability scanner. Garak checks if an LLM can be made to fail.

Related lifecycle stage(s)

Operate & monitorVerify & validate

TechnicalUnited StatesUploaded on Sep 9, 2024
Dioptra is an open source software test platform for assessing the trustworthy characteristics of artificial intelligence (AI). It helps developers on determining which types of attacks may impact negatively their model's performance.

Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.