Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Resaro's Approved Intelligence Platform (AIP)

The Approved Intelligence Platform (AIP) provides modular, scenario-based testing workflows to evaluate mission-critical AI systems in defence, public safety, and critical civil use cases. It delivers a comprehensive, end-to-end testing environment based on a proprietary AI trust ontology with measurable AI Solutions Quality Indicators (ASQI) for the testing, evaluation, validation and verification of software solutions across different AI modalities.

The platform provides streamlined end-to-end testing through a single UI that aligns governance, technical and business stakeholders, as well as a developer toolkit with APIs, command-line tools and pre-built integrations that embed test execution into development workflows and CI/CD pipelines (an illustrative integration sketch follows the list below). Together, this delivers a solution that:

  • Enables alignment between business, governance, and technical teams on desired AI outcomes
  • Allows for multi-modal synthetic data generation
  • Triggers on-demand technical tests that evaluate business, governance, and technical dimensions
  • Enables clear decision workflows to accelerate AI deployment
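
AIP's developer API is proprietary and not documented in this entry, so the sketch below is purely illustrative: the endpoint paths, JSON fields and the AIP_API_URL/AIP_API_TOKEN environment variables are assumptions, not Resaro's actual interface. It shows only the general shape of a CI step that triggers a test plan run and gates the pipeline on the result.

    # Hypothetical sketch only: AIP's real API is proprietary, so every
    # endpoint, field and environment variable below is an assumption.
    import os
    import sys
    import time

    import requests

    AIP_URL = os.environ["AIP_API_URL"]    # e.g. https://aip.example.com/api
    HEADERS = {"Authorization": f"Bearer {os.environ['AIP_API_TOKEN']}"}

    # Trigger a test plan run against the registered System Under Test.
    run = requests.post(
        f"{AIP_URL}/projects/my-llm-project/runs",
        headers=HEADERS,
        json={"test_plan": "asqi-baseline",
              "model_endpoint": "https://models.internal/v1/chat"},
        timeout=30,
    ).json()

    # Poll until the run finishes, then gate the CI pipeline on the outcome.
    while True:
        status = requests.get(f"{AIP_URL}/runs/{run['id']}",
                              headers=HEADERS, timeout=30).json()
        if status["state"] in ("passed", "failed"):
            break
        time.sleep(10)

    sys.exit(0 if status["state"] == "passed" else 1)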

Value Proposition

  • Adopt a Common Quality Standard Across AI Use Cases
    Leverage the AI Solutions Quality Indicators (ASQI) as a shared language for evaluation, enabling consistent assessment and aligned expectations across business, governance and technology teams
  • Scale AI Quality Checks for Trustworthy AI Deployment 
    Gain structured, quantitative insight into the performance and risk-handling dimensions of AI systems, supporting clear decisions on whether to move an AI system into production
  • Build Robust Quality Baselines for Ongoing Testing of AI systems 
    Confidently scale AI applications already in production, with assurance that quality and risk-handling baselines and controls are in place.

Key Features

  • Central Project Registry
    Provides a single source of truth for all AI projects, allowing users to manage registration and tracking for testing. Full audit trails mean every test run can be traced back to the System Under Test (e.g. a specific model) and project version. The registry is accessible via the AIP UI, giving version control and visibility over project ownership to support compliance and lifecycle management.
  • ASQI-Based Test Plan Library
    Provides standardised ASQI-aligned test plans (structured and stored in the Data Layer) that can be attached to a project. Each test plan defines the relevant quality indicators, thresholds and associated AIP Test Packages. 
  • Automated Test Execution Engine
    Runs test plans automatically, connecting to different AI model endpoints and performing benchmarking, regression checks, and scenario-based evaluations. The test execution steps are run through the AIP Test Runner and containerised test packages, ensuring consistent, repeatable results across all environments and model versions.
  • Structured Reporting and Analysis
    Provides scorecards for business, operational and governance users to communicate key quality results. Detailed technical reports for technical users allow for in-depth analysis on the test data, technical results and evaluation metrics. Reports are generated from test outputs stored in the Data Layer, giving all stakeholders unified and meaningful views of quality.
  • Test Plan Builder
    Lets users create, configure and customise their own test plans by selecting quality indicators, setting thresholds, and adding use case-specific metrics. It provides a guided interface for assembling test components, ensuring each plan is correctly structured and ready for execution; a hypothetical sketch of a plan's structure follows this list.
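
This entry does not publish the ASQI test plan schema, so the following Python sketch is a hypothetical illustration of the structure such a plan might have: named quality indicators, per-indicator thresholds, and attached test packages, with a simple pass/fail gate over reported scores. All names and values are assumptions.

    # Hypothetical shape of an ASQI-aligned test plan; the real AIP schema
    # is proprietary and not described in this catalogue entry.
    from dataclasses import dataclass, field

    @dataclass
    class QualityIndicator:
        name: str         # e.g. "detection_recall"
        metric: str       # metric computed by the attached test package
        threshold: float  # pass/fail bar for this indicator

    @dataclass
    class TestPlan:
        project: str
        test_packages: list[str] = field(default_factory=list)
        indicators: list[QualityIndicator] = field(default_factory=list)

        def passes(self, scores: dict[str, float]) -> bool:
            """Gate a run: every indicator must meet its threshold."""
            return all(scores.get(i.name, 0.0) >= i.threshold
                       for i in self.indicators)

    plan = TestPlan(
        project="border-screening-cv",
        test_packages=["cv-regression-suite"],
        indicators=[QualityIndicator("detection_recall", "recall@iou=0.5", 0.95)],
    )
    print(plan.passes({"detection_recall": 0.97}))  # True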

Modules:

The Approved Intelligence Platform’s Large Language Model (AIP-LLM) module is a dedicated solution for end-to-end testing of LLM-based systems in mission-critical use cases. It provides modular, scenario-based testing workflows anchored on language-model-specific quality indicators, enabling users to evaluate prompts, responses, and retrieval-augmented generation (RAG) pipelines, conduct contextualised evaluations across defined scenarios, and generate structured reports to communicate results and insights to both technical and non-technical stakeholders.
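
As a minimal, hypothetical illustration of scenario-based LLM evaluation (the scenario format and constraint-checking metric below are assumptions, not AIP's actual method), one might score a system under test as the fraction of scenarios whose responses satisfy grounding and safety constraints:

    # Toy scenario-based LLM evaluation; not AIP's real implementation.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Scenario:
        prompt: str
        must_contain: list[str]  # facts a grounded answer must mention
        must_avoid: list[str]    # e.g. refusal phrases or leaked instructions

    def evaluate(generate: Callable[[str], str],
                 scenarios: list[Scenario]) -> float:
        """Fraction of scenarios whose response satisfies all constraints."""
        passed = 0
        for s in scenarios:
            answer = generate(s.prompt).lower()
            ok = all(t.lower() in answer for t in s.must_contain)
            ok = ok and not any(t.lower() in answer for t in s.must_avoid)
            passed += ok
        return passed / len(scenarios)

    # Usage with any callable that wraps the system under test:
    scenarios = [Scenario("What year was the policy enacted?",
                          must_contain=["2019"], must_avoid=["i don't know"])]
    print(evaluate(lambda p: "The policy was enacted in 2019.", scenarios))  # 1.0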

The Approved Intelligence Platform’s Computer Vision (AIP-CV) module is a dedicated solution for end-to-end testing of computer vision systems in mission-critical use cases. It provides modular, scenario-based testing workflows anchored on vision-specific quality indicators, enabling users to validate and augment image and video datasets, conduct contextualised evaluations across defined scenarios, and generate structured reports to communicate results and insights to both technical and non-technical stakeholders.
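
A core idea in contextualised evaluation is slicing results by scenario rather than reporting a single aggregate score. The sketch below is a generic, hypothetical illustration of per-scenario accuracy for a vision model; the tagging scheme and metric are assumptions, not AIP specifics.

    # Generic per-scenario accuracy slicing; not AIP's actual metric suite.
    from collections import defaultdict
    from typing import Callable, Iterable

    def accuracy_by_scenario(
        predict: Callable[[object], str],
        samples: Iterable[tuple[object, str, str]],  # (image, label, scenario tag)
    ) -> dict[str, float]:
        """Accuracy per scenario slice, e.g. by lighting or weather condition."""
        hits: dict[str, int] = defaultdict(int)
        totals: dict[str, int] = defaultdict(int)
        for image, label, tag in samples:
            totals[tag] += 1
            hits[tag] += predict(image) == label
        return {tag: hits[tag] / totals[tag] for tag in totals}

    # Usage with any callable wrapping the system under test:
    samples = [("img-001", "vehicle", "night"), ("img-002", "person", "rain")]
    print(accuracy_by_scenario(lambda img: "vehicle", samples))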

The Approved Intelligence Platform’s Synthetic Data Generation (AIP-SDG) module enables controlled, scenario-driven synthetic data generation to support robust testing and evaluation of AI systems across multiple modalities. It provides modular, customisable workflows spanning video, audio, image, and text, leveraging continuously updated open-source and proprietary generation algorithms. Synthetic datasets generated through AIP-SDG integrate natively into AIP test plans, automated execution workflows, and reporting pipelines, ensuring consistency and comparability when evaluating an AI system.
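
AIP-SDG's generation algorithms are proprietary, so as a stand-in the toy sketch below only conveys the controlled, seeded-and-repeatable flavour of synthetic variant generation for the text modality; real multi-modal generation would be far richer.

    # Toy, deterministic text perturbation as a stand-in for real synthetic
    # data generation; everything here is illustrative, not AIP-SDG's method.
    import random

    def synthetic_variants(text: str, n: int, seed: int = 0) -> list[str]:
        """Deterministically generate n perturbed copies of an input prompt."""
        rng = random.Random(seed)
        variants = []
        for _ in range(n):
            chars = list(text)
            i = rng.randrange(len(chars) - 1)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]  # adjacent swap
            if rng.random() < 0.5:
                chars[i] = chars[i].swapcase()               # random case flip
            variants.append("".join(chars))
        return variants

    print(synthetic_variants("What year was the policy enacted?", 3))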

Use Cases

There are no use cases for this tool yet.

Would you like to submit a use case for this tool?

If you have used this tool, we would love to know more about your experience.


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.