Tools for Trustworthy AI

Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Overview Tools Metrics About the catalogue

Show tools Show use cases

Approach Technical

Lifecycle stage(s) Operate & monitor

garak, LLM vulnerability scanner

TechnicalUploaded on Mar 20, 2026

garak is an open-source LLM vulnerability scanner developed by NVIDIA that probes large language models for security weaknesses including prompt injection, jailbreaks, hallucination, toxicity, data leakage, and misinformation.

Objective(s)

Related lifecycle stage(s)

Operate & monitor Deploy Verify & validate

OpenEnv

TechnicalUploaded on Mar 20, 2026

OpenEnv is a framework for evaluating AI agents against real systems rather than simulations. It provides a standardised way to connect agents to real tools and workflows while preserving the structure needed for consistent and reliable evaluation.

Objective(s)

Robustness Transparency

Related lifecycle stage(s)

Operate & monitor Deploy Verify & validate

Resaro's Approved Intelligence Platform (AIP)

TechnicalProceduralUploaded on Mar 20, 2026

The Approved Intelligence Platform (AIP) provides modular, scenario-based testing workflows to evaluate mission-critical AI systems in defence, public safety, and critical civil use cases. It delivers a comprehensive, end-to-end testing environment based on a proprietary AI trust ontology with measurable AI Solutions Quality Indicators (ASQI) for the testing, evaluation, validation and verification of software solutions with different AI modalities.

Objective(s)

Robustness Safety Transparency

Related lifecycle stage(s)

Operate & monitor Deploy Verify & validate

Resaro AI Solutions Quality Index Engineer (ASQI Engineer)

TechnicalUploaded on Jan 19, 2026

ASQI Engineer is an open-source framework for testing and assuring AI systems. Built for scale and reliability, it uses containerised test packages, automated assessments, and repeatable workflows to make evaluation transparent and robust. With ASQI Engineer, organisations also run ASQIs that they have created themselves, giving teams full control and confidence in AI quality.

Objective(s)

Explainability Digital Security

Related lifecycle stage(s)

Operate & monitor Deploy Verify & validate

Resaro AI Solutions Quality Index (ASQI)

TechnicalProceduralUploaded on Mar 20, 2026

The Resaro AI Solutions Quality Index (ASQI) provides a transparent, use-case-specific measure of AI quality — for applications such as customer chat services, object recognition, deepfake detection, or x-ray anomaly identification.

Objective(s)

Safety Digital Security

Related lifecycle stage(s)

Operate & monitor Deploy Verify & validate

Inspect

TechnicalUploaded on Oct 9, 2025

An open-source framework for large language model evaluations. Inspect can be used for a broad range of evaluations that measure coding, agentic tasks, reasoning, knowledge, behavior, and multi-modal understanding.

Objective(s)

Transparency Explainability

Related lifecycle stage(s)

Operate & monitor Verify & validate

Voiceitt

TechnicalEducationalUploaded on Oct 1, 2025

An AI-powered speech recognition app that adapts to users' unique speech patterns, facilitating communication for individuals with speech impairments.

Objective(s)

Fairness Human Agency & Control

Related lifecycle stage(s)

Operate & monitor

ReadSpeaker

TechnicalUploaded on Aug 27, 2025

ReadSpeaker is a SaaS-based text-to-speech platform providing natural-sounding, multilingual voices for seamless integration in web, document, and application environments.

Objective(s)

Privacy Human Agency & Control

Related lifecycle stage(s)

Operate & monitor

aRTIFICIAL iNTELLIGENCE for the Deaf

TechnicalUploaded on Aug 27, 2025

aiD leverages artificial intelligence to improve communication accessibility for deaf and hard-of-hearing individuals through advanced speech-to-text and sign language technologies.

Objective(s)

Privacy Human Agency & Control

Related lifecycle stage(s)

Operate & monitor Verify & validate

Dytective

TechnicalEducationalUploaded on Aug 1, 2025

Dytective by Change Dyslexia is an innovative AI-powered tool designed to detect the risk of dyslexia in children quickly and reliably. Developed in collaboration with researchers, Dytective combines language exercises with machine learning to screen for dyslexia in just 15 minutes. Backed by scientific validation and used by schools and families worldwide, it empowers early intervention and promotes equal opportunities in education.

Objective(s)

Human Agency & Control Safety

Related lifecycle stage(s)

Operate & monitor Deploy

SeismicAI's Earthquake Early Warning Systems

TechnicalEducationalMexicoUnited StatesIsraelUploaded on May 19, 2025

SeismicAI is a provider of innovative Earthquake Early Warning Systems (EEW) ensuring earthquake preparedness. SeismicAI's algorithms utilise local sensors to issue high-precision alerts for earthquake preparedness. The system covers the full early warning cycle - from monitoring and reporting, through alerts, to optionally triggering automated preventive actions.

Objective(s)

Robustness Safety

Related lifecycle stage(s)

Operate & monitor

Risk Atlas Nexus

TechnicalIrelandUploaded on May 2, 2025

Risk Atlas Nexus provides tooling to connect fragmented AI governance resources through a community-driven approach to curation of linkages between risks, datasets, benchmarks, and mitigations. It transforms abstract risk definitions into actionable AI governance workflows.

Objective(s)

Safety Transparency

Related lifecycle stage(s)

Operate & monitor Verify & validate Plan & design

Eticas Bias

TechnicalUnited StatesUploaded on Mar 24, 2025

An open-source Python library designed for developers to calculate fairness metrics and assess bias in machine learning models. This library provides a comprehensive set of tools to ensure transparency, accountability, and ethical AI development.

Objective(s)

Fairness Transparency

Related lifecycle stage(s)

Operate & monitor Build & interpret model Collect & process data

COMPL-AI

TechnicalSwitzerlandEuropean UnionUploaded on Jan 24, 2025

COMPL-AI is an open-source compliance-centered evaluation framework for Generative AI models

Objective(s)

Safety Data Governance & Traceability

Related lifecycle stage(s)

Operate & monitor Verify & validate

MLPerf Client

TechnicalUnited StatesUploaded on Jan 8, 2025

MLPerf Client is a benchmark for Windows and macOS, focusing on client form factors in ML inference scenarios like AI chatbots, image classification, etc. The benchmark evaluates performance across different hardware and software configurations, providing command line interface.

Objective(s)

Robustness Transparency

Related lifecycle stage(s)

Operate & monitor Deploy Build & interpret model

AIxploit

TechnicalFranceUploaded on Dec 6, 2024

AIxploit is a tool designed to evaluate and enhance the robustness of Large Language Models (LLMs) through adversarial testing. This tool simulates various attack scenarios to identify vulnerabilities and weaknesses in LLMs, ensuring they are more resilient and reliable in real-world applications.

Objective(s)

Robustness Safety

Related lifecycle stage(s)

Operate & monitor Verify & validate

Mindgard

TechnicalUnited KingdomUploaded on Dec 6, 2024

Continuous automated red teaming for AI, minimize security threats to AI models and applications.

Objective(s)

Robustness Digital Security

Related lifecycle stage(s)

Operate & monitor Deploy Verify & validate

PyRIT

TechnicalUnited StatesUploaded on Nov 8, 2024

The Python Risk Identification Tool for generative AI (PyRIT) is an open access automation framework to empower security professionals and machine learning engineers to proactively find risks in their generative AI systems.

Objective(s)

Transparency Explainability

Related lifecycle stage(s)

Operate & monitor Verify & validate

garak

TechnicalUploaded on Nov 5, 2024

garak, Generative AI Red-teaming & Assessment Kit, is an LLM vulnerability scanner. Garak checks if an LLM can be made to fail.

Objective(s)

Safety Digital Security

Related lifecycle stage(s)

Operate & monitor Verify & validate

Dioptra

TechnicalUnited StatesUploaded on Sep 9, 2024

Dioptra is an open source software test platform for assessing the trustworthy characteristics of artificial intelligence (AI). It helps developers on determining which types of attacks may impact negatively their model's performance.

Objective(s)

Human Agency & Control Safety

Related lifecycle stage(s)

Operate & monitor Verify & validate Build & interpret model

Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.

Type

Origin

Scope

SUBMIT A TOOL

garak, LLM vulnerability scanner

OpenEnv

Resaro's Approved Intelligence Platform (AIP)

Resaro AI Solutions Quality Index Engineer (ASQI Engineer)

Resaro AI Solutions Quality Index (ASQI)

Inspect

Voiceitt

ReadSpeaker

aRTIFICIAL iNTELLIGENCE for the Deaf

Dytective

SeismicAI's Earthquake Early Warning Systems

Risk Atlas Nexus

Eticas Bias

COMPL-AI

MLPerf Client

AIxploit

Mindgard

PyRIT

garak

Dioptra