Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Dyff

Dyff is a cloud platform for hosting high-integrity evaluations of AI system performance, designed for the use case of testing deployment-ready AI systems on private datasets. Dyff evaluates AI systems that are packaged as containerized web services. The systems under test run within Dyff, allowing evaluations on private datasets where the evaluation data never leaves the Dyff system. This maximizes the useful lifetime of the evaluation by preventing AI systems from being trained on the evaluation data. The combination of the AI system image, additional data volumes, and runtime configuration is stored together under a permanent ID, making the systems and evaluations fully reproducible.
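One way to picture the "permanent ID" idea is a content-derived identifier over the three ingredients named above. The sketch below is illustrative only: the `SystemSpec` record, its field names, and the `spec_id` function are hypothetical and are not Dyff's schema or API, and hashing is an assumption about how such an ID could be derived, not a description of how Dyff assigns IDs.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class SystemSpec:
    """Hypothetical record of what defines a system under test (not Dyff's schema)."""
    image_digest: str                       # e.g. the container image's sha256 digest
    data_volumes: tuple[str, ...] = ()      # identifiers of additional data volumes
    runtime_config: dict = field(default_factory=dict)


def spec_id(spec: SystemSpec) -> str:
    """Derive a stable ID by hashing a canonical JSON form of the spec.

    Sorting keys makes the ID independent of dict insertion order, so the
    same image, volumes, and configuration always map to the same ID.
    """
    canonical = json.dumps(asdict(spec), sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


a = SystemSpec("sha256:abc", ("vol1",), {"gpu": 1, "replicas": 2})
b = SystemSpec("sha256:abc", ("vol1",), {"replicas": 2, "gpu": 1})
print(spec_id(a) == spec_id(b))  # same content, same ID
```

Under this assumption, changing any one ingredient (a different image, an extra volume, a changed setting) yields a different ID, which is the property that lets an evaluation be rerun against exactly the system it originally measured.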

Evaluations are implemented as Jupyter notebooks. Dyff serves the rendered results of these notebooks and provides hooks for returning scores that can be viewed in summary “dashboards”. Dyff can also be used to host “challenges”, where participants compete to submit the systems that perform best on the challenge tasks.
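As a rough illustration of the kind of scalar scores an evaluation notebook might compute before handing them to a dashboard, consider the sketch below. It does not use Dyff's hooks, which are not described here; the `summarize` function and the `expected`/`predicted` record keys are hypothetical names chosen for the example.

```python
def summarize(records):
    """Reduce per-example results to summary scores.

    `records` is a list of dicts with hypothetical keys 'expected' and
    'predicted'; a real evaluation would define its own record format.
    """
    if not records:
        raise ValueError("no records to summarize")
    n_correct = sum(1 for r in records if r["expected"] == r["predicted"])
    return {
        "n_examples": len(records),
        "accuracy": n_correct / len(records),
    }


results = [
    {"expected": "cat", "predicted": "cat"},
    {"expected": "dog", "predicted": "cat"},
    {"expected": "dog", "predicted": "dog"},
    {"expected": "cat", "predicted": "cat"},
]
print(summarize(results))
```

Keeping the summary step as a small pure function makes it easy to check the scoring logic on a handful of hand-written records before running it against a private dataset.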

Dyff is free and open-source software with no mandatory proprietary dependencies. It runs on Kubernetes and is designed to be deployable on most Kubernetes clusters. Dyff is deployed using infrastructure-as-code technologies, and a free and open-source, production-ready deployment configuration is also available.

About the tool

Developing organisation(s):

UL Research Institutes

Tool type(s):

Toolkit/software

Objective(s):

Safety
Data Governance & Traceability

Impacted stakeholders:

Consumers
Regulators

Purpose(s):

Governance and compliance

Target sector(s):

Industry & entrepreneurship
Science & technology
Public governance

Country/Territory of origin:

United States

Lifecycle stage(s):

Verify & validate

Type of approach:

Technical

Maturity:

In development

Usage rights:

Open source/Permissive

License:

Apache 2.0

Target groups:

Academia/educators/students
Private sector
Public sector

Target users:

Data scientist
Developer
Policy makers

Stakeholder group:

Business

Benefits:

Reduction in risk of failure
Responsible implementation

Geographical scope:

All countries

Required skills:

Data
IT infrastructure
Programming skills

Technology platforms:

Multi-platform

Tags:

data catalogue
ai governance
ai auditing
large language models
ai evaluation
deepfakes

Use Cases

There are no use cases for this tool yet.

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.