Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

AI Vulnerability Database

As the first open-source, extensible knowledge base of AI failures, AVID aims to:

encompass coordinates of responsible ML such as security, ethics, and performance;
build out a taxonomy of potential harms across these coordinates;
house full-fidelity information (metadata, harm metrics, measurements, benchmarks, and mitigation techniques, if any) on evaluation use cases of a harm (sub)category;
evaluate models and datasets that are either open-source or accessible through APIs.

AVID has two components: a taxonomy that provides a landing place for instances of AI system, model, and dataset failures, and a database that stores information on such instances in a structured manner.

The AVID taxonomy is intended to serve as a common foundation for data science/ML, product, and policy teams to manage potential risks at different stages of an ML workflow. In spirit, this taxonomy is analogous to MITRE ATT&CK for cybersecurity vulnerabilities, and MITRE ATLAS for adversarial attacks on ML systems.

At a high level, the AVID taxonomy consists of two views, intended to facilitate the work of two different user personas.

Effect view: for the auditor persona that aims to assess risks for an ML system or components of it.

Lifecycle view: for the developer persona that aims to build an end-to-end ML system while being cognizant of potential risks.

Note that, depending on case-specific needs, people involved in building an ML system may need to operate as either of the above personas.

The database component of AVID stores instantiations of AI risks, categorized using the above taxonomy, in two base data classes: Vulnerability and Report. A vulnerability (vuln) is high-level evidence of an AI failure mode, in line with the CVEs catalogued in NIST's National Vulnerability Database. Vulnerabilities are linked to the taxonomy through multiple tags, denoting the AI risk domains (Security, Ethics, Performance) the vulnerability pertains to, the (sub)categories under each domain, and the relevant AI lifecycle stages.

A report is one example of a particular vulnerability occurring, and is potentially more granular and reproducible based on the references provided in that report.

As an example, the vuln AVID-2022-V001 is about gender bias in the large language model bert-base-uncased. This bias is measured through multiple reports, AVID-2022-R0001 and AVID-2022-R0002, which measure gender bias in two separate contexts, using different metrics and datasets, and record salient information and references on those measurements.
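The relationship between the two base data classes can be sketched in Python. This is a minimal illustration under stated assumptions, not AVID's actual schema: the class fields, the `parse_avid_id` helper, and the identifier pattern are hypothetical and inferred only from the description and the three example IDs above. Tagging the example vulnerability with the Ethics domain is likewise an assumption.

```python
import re
from dataclasses import dataclass, field
from enum import Enum


class RiskDomain(Enum):
    """The three AI risk domains named in the text."""
    SECURITY = "Security"
    ETHICS = "Ethics"
    PERFORMANCE = "Performance"


# Hypothetical pattern inferred from the example IDs in the text:
# AVID-<year>-V<digits> for vulnerabilities, AVID-<year>-R<digits> for reports.
ID_PATTERN = re.compile(r"^AVID-(\d{4})-([VR])(\d+)$")


def parse_avid_id(avid_id: str) -> tuple[int, str, int]:
    """Split an AVID-style ID into (year, kind, sequence number)."""
    match = ID_PATTERN.match(avid_id)
    if match is None:
        raise ValueError(f"not an AVID-style identifier: {avid_id!r}")
    year, kind, number = match.groups()
    return int(year), "vulnerability" if kind == "V" else "report", int(number)


@dataclass
class Report:
    """One concrete, potentially reproducible instance of a vulnerability."""
    report_id: str
    description: str
    references: list[str] = field(default_factory=list)


@dataclass
class Vulnerability:
    """High-level evidence of a failure mode, tagged against the taxonomy."""
    vuln_id: str
    description: str
    domains: set[RiskDomain] = field(default_factory=set)
    categories: list[str] = field(default_factory=list)
    lifecycle_stages: list[str] = field(default_factory=list)
    reports: list[Report] = field(default_factory=list)

    def add_report(self, report: Report) -> None:
        """Attach a report, rejecting IDs that are not report IDs."""
        if parse_avid_id(report.report_id)[1] != "report":
            raise ValueError(f"{report.report_id!r} is not a report ID")
        self.reports.append(report)


# The example from the text: one vuln evidenced by two reports.
vuln = Vulnerability(
    vuln_id="AVID-2022-V001",
    description="Gender bias in bert-base-uncased",
    domains={RiskDomain.ETHICS},  # assumption: the text does not name the domain
)
vuln.add_report(Report("AVID-2022-R0001", "Gender bias measurement, context 1"))
vuln.add_report(Report("AVID-2022-R0002", "Gender bias measurement, context 2"))
```

Keeping reports as a list on the vulnerability mirrors the one-to-many relationship described above: a single failure mode can be evidenced by several measurements, each with its own references.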

AVID is a project under active development. Our preliminary version (v0.1) is publicly available, and we are collaborating with multiple companies and nonprofits to release v1 by the end of summer 2023. More details are available in our planned roadmap.

About the tool

Tool type(s):

Audit Process
Toolkit/software
Awareness building
Documentation process

Purpose(s):

Event/anomaly detection
Forecasting/prediction
Goal-driven optimisation
Interaction support/chatbots
Personalisation/recommenders
Reasoning with knowledge structures/planning
Recognition/object detection

Country of origin:

United States

Lifecycle stage(s):

Operate & monitor
Deploy
Verify & validate
Build & interpret model
Collect & process data
Plan & design

Type of approach:

Technical

Maturity:

In development
Project stage

Usage rights:

Open source/Permissive
Free of charge

License:

Apache License 2.0
MIT License

Target groups:

Civil society/specific social groups
Technical community

Target users:

Data scientist
Developer
Project manager
Researcher

Stakeholder group:

Civil society
Technical community

Validity:

Periodic review

Enforcement:

Reporting frameworks
Trust/Quality mark

Benefits:

Faster implementation
Open access material
Reduction in risk of failure

Geographical scope:

All countries

People involved:

All employees

Required skills:

Data
Domain expertise
Programming skills

Technology platforms:

Platform neutral

Tags:

ai auditing
ai risk management
ai vulnerabilities

Use Cases

There are no use cases for this tool yet.

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.