Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Overview Tools Metrics About the catalogue

Holistic AI Governance, Risk and Compliance Platform

Website

Holistic AI is an AI Governance platform headquartered in the U.S., with offices in Palo Alto and London. Holistic AI offers end-to-end solutions to test and validate the safety, robustness, and trustworthiness of algorithms deployed across various sectors.

Holistic AI’s AI Governance, Risk Management and Compliance Platform provides a Software as a Service (SaaS) one-stop-shop to govern enterprise AI systems at scale. The platform utilises unique and proprietary solutions based on foundational research in trustworthy AI, such as AI robustness, bias, privacy, and transparency.

Each AI system is evaluated against a five-point criteria derived from Koshiyama et al (2021):

Bias: Risks of the AI system generating biased outputs due to improper training data (training bias), inappropriate context (transfer-context bias) and inadequate inferential capabilities (inference bias).
Efficacy: Risk of the AI system underperforming relative to its use case.
Robustness: Risks of the AI system failing in instances of adversarial attacks.
Privacy: Risks associated with the AI system leaking sensitive information or personal data.
Explainability: Risks of the AI system generating arbitrary decisions, with its outputs not understandable to developers, deployers and users.

Each of these risk verticals do not occur in siloes and can often be interrelated. A growing field of research strongly emphasizes the trade-offs and interactions that may occur between them. The Holistic AI Governance platform not only allows enterprises to review the performance of its AI systems against the criteria, but also streamlines the decision-making process around navigating trade-offs.

Holistic AI provides a proprietary solution for auditing Language Models through Safeguard, a specialized module dedicated to LM Governance. When specifically evaluating Large Language Models (LLMs), the Holistic AI team uses a robust combination of the following approaches:

Benchmarking: Involves evaluating LLMs against both, academic and internally developed datasets to gauge levels of model bias, hallucinations, personal information leakage, toxicity, explainability and robustness.
Red Teaming: Involves adversarially prompting LLMs to unearth unknown model vulnerabilities through mechanisms like Jailbreaking and debiasing.
Fine Tuning: Involves leveraging high-quality datasets to align models towards safety, helpfulness and harm reduction.
Human oversight: This involves assessment of LLM-generated content by reviewers for its relevance, accuracy, and appropriateness, to identify any discrepancies that need improvement.
Assurance: Benchmarking risk discovery, triage, assessment and mitigation processes to regulation (like the EU AI Act), and standards (such as the NIST’s AI Risk Management Framework (AI RMF) to aid with compliance readiness and assurance.

The platform is structured around the EU’s risk-based approach to AI governance, mapping high to low-risk systems in a single pane Red-Amber-Green dashboard. It is used for both internal development and deployment, and for procurement (third-party risk management).

Bringing AI law, policy and engineering together, the platform uniquely interconnects all risk verticals as needed to generate a complete picture of an enterprise’s AI risk exposure. This includes a mitigation function for when issues are identified.

About the tool

You can click on the links to see the associated tools

Tool type(s):

Toolkit/software
Technical validation
Trust/Quality mark

Objective(s):

Transparency
Data Governance & Traceability

Impacted stakeholders:

Employees

Purpose(s):

Event/anomaly detection
Forecasting/prediction
Goal-driven optimisation
Interaction support/chatbots
Personalisation/recommenders
Reasoning with knowledge structures/planning
Recognition/object detection

Target sector(s):

Industry & entrepreneurship
Health
Finance and insurance
Environment
Employment & labour
Education
Digital Economy
Transport

Lifecycle stage(s):

Operate & monitor
Deploy
Verify & validate
Build & interpret model
Collect & process data
Plan & design

Type of approach:

Technical
Procedural

Maturity:

Implemented in multiple projects

Usage rights:

Fee-based

Target groups:

Private sector
Public sector

Target users:

All employees

Stakeholder group:

Academia
Business

Validity:

Always up to date

Enforcement:

Trust/Quality mark

Benefits:

Reduction in risk of failure
Responsible implementation

Geographical scope:

All countries

People involved:

All employees

Required skills:

IT infrastructure

Technology platforms:

Multi-platform

Tags:

ai ethics
ai responsible
ai risks
digital ethics
trustworthy ai
ai assessment
ai governance
ai auditing

Modify this tool

Use Cases

There is no use cases for this tool yet.

Would you like to submit a use case for this tool?

If you have used this tool, we would love to know more about your experience.

Add use case

Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.