Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Resaro's Approved Intelligence Platform (AIP)

The Approved Intelligence Platform (AIP) provides modular, scenario-based testing workflows to evaluate mission-critical AI systems in defence, public safety, and critical civil use cases. It delivers a comprehensive, end-to-end testing environment based on a proprietary AI trust ontology with measurable AI Solutions Quality Indicators (ASQI) for the testing, evaluation, validation and verification of software solutions across different AI modalities.

The platform provides streamlined end-to-end testing through a single UI that aligns governance, technical and business stakeholders, as well as a developer toolkit with APIs, command-line tools and pre-built integrations that embed test execution into development workflows and CI/CD pipelines (an illustrative integration sketch follows the list below). Together, this delivers a solution that:

  • Enables alignment between business, governance, and technical teams on desired AI outcomes
  • Allows for multi-modal synthetic data generation
  • Triggers on-demand technical tests that evaluate business, governance, and technical dimensions
  • Enables clear decision workflows to accelerate AI deployment
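
AIP's developer API is proprietary and not documented in this entry, so the sketch below is purely illustrative: the endpoint paths, JSON fields and the AIP_API_URL/AIP_API_TOKEN environment variables are assumptions, not Resaro's actual interface. It shows only the general shape of a CI step that triggers a test plan run and gates the pipeline on the result.

    # Hypothetical sketch only: AIP's real API is proprietary, so every
    # endpoint, field and environment variable below is an assumption.
    import os
    import sys
    import time

    import requests

    AIP_URL = os.environ["AIP_API_URL"]    # e.g. https://aip.example.com/api
    HEADERS = {"Authorization": f"Bearer {os.environ['AIP_API_TOKEN']}"}

    # Trigger a test plan run against the registered System Under Test.
    run = requests.post(
        f"{AIP_URL}/projects/my-llm-project/runs",
        headers=HEADERS,
        json={"test_plan": "asqi-baseline",
              "model_endpoint": "https://models.internal/v1/chat"},
        timeout=30,
    ).json()

    # Poll until the run finishes, then gate the CI pipeline on the outcome.
    while True:
        status = requests.get(f"{AIP_URL}/runs/{run['id']}",
                              headers=HEADERS, timeout=30).json()
        if status["state"] in ("passed", "failed"):
            break
        time.sleep(10)

    sys.exit(0 if status["state"] == "passed" else 1)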

Value Proposition

  • Adopt a Common Quality Standard Across AI Use Cases
    Leverage the AI Solutions Quality Indicators (ASQI) as a shared language for evaluation, enabling consistent assessment and aligned expectations across business, governance and technology teams
  • Scale AI Quality Checks for Trustworthy AI Deployment 
    Gain structured, quantitative insight into the performance and risk-handling dimensions of AI systems, supporting clear decisions on whether to move an AI system into production
  • Build Robust Quality Baselines for Ongoing Testing of AI systems 
    Confidently scale AI applications already in production, with assurance that quality and risk-handling baselines and controls are in place.

Key Features

  • Central Project Registry
    Provides a single source of truth for all AI projects, allowing users to manage registration and tracking for testing. Full audit trails mean every test run can be traced back to the System Under Test (e.g. a specific model) and project version. The registry is accessible via the AIP UI, giving version control and visibility over project ownership to support compliance and lifecycle management.
  • ASQI-Based Test Plan Library
    Provides standardised ASQI-aligned test plans (structured and stored in the Data Layer) that can be attached to a project. Each test plan defines the relevant quality indicators, thresholds and associated AIP Test Packages. 
  • Automated Test Execution Engine
    Runs test plans automatically, connecting to different AI model endpoints and performing benchmarking, regression checks, and scenario-based evaluations. The test execution steps are run through the AIP Test Runner and containerised test packages, ensuring consistent, repeatable results across all environments and model versions.
  • Structured Reporting and Analysis
    Provides scorecards for business, operational and governance users to communicate key quality results. Detailed technical reports for technical users allow for in-depth analysis on the test data, technical results and evaluation metrics. Reports are generated from test outputs stored in the Data Layer, giving all stakeholders unified and meaningful views of quality.
  • Test Plan Builder
    Lets users create, configure and customise their own test plans by selecting quality indicators, setting thresholds, and adding use case-specific metrics. It provides a guided interface for assembling test components, ensuring each plan is correctly structured and ready for execution; a hypothetical sketch of a plan's structure follows this list.
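
This entry does not publish the ASQI test plan schema, so the following Python sketch is a hypothetical illustration of the structure such a plan might have: named quality indicators, per-indicator thresholds, and attached test packages, with a simple pass/fail gate over reported scores. All names and values are assumptions.

    # Hypothetical shape of an ASQI-aligned test plan; the real AIP schema
    # is proprietary and not described in this catalogue entry.
    from dataclasses import dataclass, field

    @dataclass
    class QualityIndicator:
        name: str         # e.g. "detection_recall"
        metric: str       # metric computed by the attached test package
        threshold: float  # pass/fail bar for this indicator

    @dataclass
    class TestPlan:
        project: str
        test_packages: list[str] = field(default_factory=list)
        indicators: list[QualityIndicator] = field(default_factory=list)

        def passes(self, scores: dict[str, float]) -> bool:
            """Gate a run: every indicator must meet its threshold."""
            return all(scores.get(i.name, 0.0) >= i.threshold
                       for i in self.indicators)

    plan = TestPlan(
        project="border-screening-cv",
        test_packages=["cv-regression-suite"],
        indicators=[QualityIndicator("detection_recall", "recall@iou=0.5", 0.95)],
    )
    print(plan.passes({"detection_recall": 0.97}))  # True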

Modules:

The Approved Intelligence Platform’s Large Language Model (AIP-LLM) module is a dedicated solution for end-to-end testing of LLM-based systems in mission-critical use cases. It provides modular, scenario-based testing workflows anchored on language-model-specific quality indicators, enabling users to evaluate prompts, responses, and retrieval-augmented generation (RAG) pipelines, conduct contextualised evaluations across defined scenarios, and generate structured reports to communicate results and insights to both technical and non-technical stakeholders.
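
As a minimal, hypothetical illustration of scenario-based LLM evaluation (the scenario format and constraint-checking metric below are assumptions, not AIP's actual method), one might score a system under test as the fraction of scenarios whose responses satisfy grounding and safety constraints:

    # Toy scenario-based LLM evaluation; not AIP's real implementation.
    from dataclasses import dataclass
    from typing import Callable

    @dataclass
    class Scenario:
        prompt: str
        must_contain: list[str]  # facts a grounded answer must mention
        must_avoid: list[str]    # e.g. refusal phrases or leaked instructions

    def evaluate(generate: Callable[[str], str],
                 scenarios: list[Scenario]) -> float:
        """Fraction of scenarios whose response satisfies all constraints."""
        passed = 0
        for s in scenarios:
            answer = generate(s.prompt).lower()
            ok = all(t.lower() in answer for t in s.must_contain)
            ok = ok and not any(t.lower() in answer for t in s.must_avoid)
            passed += ok
        return passed / len(scenarios)

    # Usage with any callable that wraps the system under test:
    scenarios = [Scenario("What year was the policy enacted?",
                          must_contain=["2019"], must_avoid=["i don't know"])]
    print(evaluate(lambda p: "The policy was enacted in 2019.", scenarios))  # 1.0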

The Approved Intelligence Platform’s Computer Vision (AIP-CV) module is a dedicated solution for end-to-end testing of computer vision systems in mission-critical use cases. It provides modular, scenario-based testing workflows anchored on vision-specific quality indicators, enabling users to validate and augment image and video datasets, conduct contextualised evaluations across defined scenarios, and generate structured reports to communicate results and insights to both technical and non-technical stakeholders.
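
A core idea in contextualised evaluation is slicing results by scenario rather than reporting a single aggregate score. The sketch below is a generic, hypothetical illustration of per-scenario accuracy for a vision model; the tagging scheme and metric are assumptions, not AIP specifics.

    # Generic per-scenario accuracy slicing; not AIP's actual metric suite.
    from collections import defaultdict
    from typing import Callable, Iterable

    def accuracy_by_scenario(
        predict: Callable[[object], str],
        samples: Iterable[tuple[object, str, str]],  # (image, label, scenario tag)
    ) -> dict[str, float]:
        """Accuracy per scenario slice, e.g. by lighting or weather condition."""
        hits: dict[str, int] = defaultdict(int)
        totals: dict[str, int] = defaultdict(int)
        for image, label, tag in samples:
            totals[tag] += 1
            hits[tag] += predict(image) == label
        return {tag: hits[tag] / totals[tag] for tag in totals}

    # Usage with any callable wrapping the system under test:
    samples = [("img-001", "vehicle", "night"), ("img-002", "person", "rain")]
    print(accuracy_by_scenario(lambda img: "vehicle", samples))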

The Approved Intelligence Platform’s Synthetic Data Generation (AIP-SDG) module enables controlled, scenario-driven synthetic data generation to support robust testing and evaluation of AI systems across multiple modalities. It provides modular, customisable workflows spanning video, audio, image, and text, leveraging continuously updated open-source and proprietary generation algorithms. Synthetic datasets generated through AIP-SDG integrate natively into AIP test plans, automated execution workflows, and reporting pipelines, ensuring consistency and comparability when evaluating an AI system.
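
AIP-SDG's generation algorithms are proprietary, so as a stand-in the toy sketch below only conveys the controlled, seeded-and-repeatable flavour of synthetic variant generation for the text modality; real multi-modal generation would be far richer.

    # Toy, deterministic text perturbation as a stand-in for real synthetic
    # data generation; everything here is illustrative, not AIP-SDG's method.
    import random

    def synthetic_variants(text: str, n: int, seed: int = 0) -> list[str]:
        """Deterministically generate n perturbed copies of an input prompt."""
        rng = random.Random(seed)
        variants = []
        for _ in range(n):
            chars = list(text)
            i = rng.randrange(len(chars) - 1)
            chars[i], chars[i + 1] = chars[i + 1], chars[i]  # adjacent swap
            if rng.random() < 0.5:
                chars[i] = chars[i].swapcase()               # random case flip
            variants.append("".join(chars))
        return variants

    print(synthetic_variants("What year was the policy enacted?", 3))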

Use Cases

There are no use cases for this tool yet.

Would you like to submit a use case for this tool?

If you have used this tool, we would love to know more about your experience.


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.