Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

COMPL-AI

Website

GitHub

Hugging Face

The EU's Artificial Intelligence Act (AI Act) is a significant step towards responsible AI development, but it lacks a clear technical interpretation, which makes it difficult to assess whether models comply. This work presents COMPL-AI, a comprehensive framework consisting of

(i) the first technical interpretation of the EU AI Act, translating its broad regulatory requirements into measurable technical requirements, with a focus on large language models (LLMs), and
(ii) an open-source, Act-centered benchmarking suite, based on a thorough survey and implementation of state-of-the-art LLM benchmarks.

An evaluation of 12 prominent LLMs with COMPL-AI reveals shortcomings in existing models and benchmarks, particularly in robustness, safety, diversity, and fairness. These findings point to the need for a shift in focus towards these aspects, encouraging more balanced development of LLMs and more comprehensive, regulation-aligned benchmarks. At the same time, COMPL-AI demonstrates for the first time both the possibilities and the difficulties of bringing the Act's obligations to a more concrete, technical level. As such, this work can serve as a useful first step towards actionable recommendations for model providers, and it contributes to ongoing EU efforts to enable application of the Act, such as the drafting of the GPAI Code of Practice.
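
As a rough illustration of what translating regulatory requirements into measurable technical requirements can look like, the sketch below groups benchmark scores under the requirement they are meant to measure and averages them. It is not COMPL-AI's code or API: the requirement-to-benchmark mapping, the benchmark names, the function names and the scores are all hypothetical placeholders chosen for this example.

```python
from statistics import mean

# Hypothetical mapping from regulatory requirements to benchmark names.
# These names are placeholders, not COMPL-AI's actual taxonomy.
REQUIREMENT_BENCHMARKS = {
    "robustness": ["perturbation_consistency", "adversarial_prompts"],
    "safety": ["toxicity_avoidance"],
    "fairness": ["bias_probe", "equal_treatment"],
}


def aggregate_scores(benchmark_scores, mapping=REQUIREMENT_BENCHMARKS):
    """Average the available benchmark scores (each in [0, 1]) per requirement.

    A requirement with no available benchmark result maps to None, which keeps
    coverage gaps visible instead of hiding them behind a default value.
    """
    report = {}
    for requirement, benchmarks in mapping.items():
        scores = [benchmark_scores[b] for b in benchmarks if b in benchmark_scores]
        report[requirement] = mean(scores) if scores else None
    return report


def weakest_requirements(report, threshold):
    """Return requirements whose aggregate is below the threshold, sorted by name."""
    return sorted(r for r, s in report.items() if s is not None and s < threshold)


# Made-up scores for one hypothetical model, for illustration only.
example_scores = {
    "perturbation_consistency": 0.5,
    "adversarial_prompts": 0.75,
    "bias_probe": 0.25,
    "equal_treatment": 0.75,
}
report = aggregate_scores(example_scores)
```

Reporting an unmeasured requirement as None rather than as a score of zero is a deliberate choice in this sketch: it separates "the model performed poorly" from "no benchmark covered this requirement", which mirrors the distinction the description draws between shortcomings in models and shortcomings in benchmarks.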

About the tool

Developing organisation(s):

LatticeFlow AI

Tool type(s):

Technical validation
Risk management framework
Trust/Quality mark

Objective(s):

Safety
Transparency
Explainability

Impacted stakeholders:

Management
Regulators
Specific policy communities

Target sector(s):

Finance and insurance
Employment & labour
Corporate governance

Country of origin:

Switzerland
European Union

Lifecycle stage(s):

Operate & monitor
Verify & validate

Type of approach:

Technical

Maturity:

Implemented in multiple projects
Published document

Usage rights:

Open source/Permissive
Non-commercial

License:

Apache 2.0

Target groups:

Professionals
Public sector
Technical community

Target users:

Business leader
Data scientist
Government

Stakeholder group:

Academia
Business

Validity:

Periodic review

Enforcement:

Certification
Reporting frameworks
Trust/Quality mark

Benefits:

Increased quality results
Open access material
Reduction in risk of failure

Geographical scope:

All countries

People involved:

Government agencies
IT employees
Suppliers

Required skills:

Domain expertise

Technology platforms:

Platform neutral

Programming languages:

Python

Tags:

fairness
regulation compliance
robustness
safety
benchmarking
eu ai act
llm

Use Cases

There are no use cases for this tool yet.

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.