Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

Evaluate Library and Evaluation on the Hub (Hugging Face)

🤗 Evaluate is a library that makes evaluating and comparing models and reporting their performance easier and more standardized.

It currently contains:

  • implementations of dozens of popular metrics: the existing metrics cover a variety of tasks spanning from NLP to Computer Vision, and include dataset-specific metrics. With a simple command like accuracy = load("accuracy"), get any of these metrics ready to use for evaluating an ML model in any framework (NumPy/Pandas/PyTorch/TensorFlow/JAX).
  • comparisons and measurements: comparisons are used to measure the difference between models, and measurements are tools to evaluate datasets (see the sketch after this list).
  • an easy way of adding new evaluation modules to the 🤗 Hub: you can create new evaluation modules and push them to a dedicated Space in the 🤗 Hub with evaluate-cli create [metric name], which allows you to easily compare different metrics and their outputs for the same sets of references and predictions.
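
To make the three module types concrete, here is a minimal sketch. The module names used here ("accuracy" as a metric, "exact_match" as a comparison, "text_duplicates" as a measurement) are examples of modules hosted on the Hub and are assumed to be available from your environment.

import evaluate

# Metric: scores model predictions against references
accuracy = evaluate.load("accuracy")

# Comparison: contrasts the predictions of two models on the same examples
exact_match = evaluate.load("exact_match", module_type="comparison")
print(exact_match.compute(predictions1=[1, 1, 0], predictions2=[1, 0, 0]))

# Measurement: describes properties of a dataset rather than of a model
text_duplicates = evaluate.load("text_duplicates", module_type="measurement")
print(text_duplicates.compute(data=["hello", "hello", "world"]))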

🎓 Documentation

🔎 Find a metric, comparison, or measurement on the Hub

🌟 Add a new evaluation module

🤗 Evaluate also has lots of useful features like:

  • Type checking: the input types are checked to make sure that you are using the right input format for each metric (illustrated in the sketch after this list).
  • Metric cards: each metric comes with a card that describes its values, limitations, and ranges, and provides examples of its usage and usefulness.
  • Community metrics: metrics live on the Hugging Face Hub and you can easily add your own metrics for your project or to collaborate with others.
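
As a quick illustration of the first two features, a loaded module exposes the input types it checks against and the descriptive information behind its metric card; a minimal sketch, again using the "accuracy" metric as an example:

import evaluate

accuracy = evaluate.load("accuracy")
print(accuracy.features)     # expected input columns and types used for type checking
print(accuracy.description)  # prose description that backs the metric card
print(accuracy.citation)     # citation for the underlying method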

Installation

With pip

🤗 Evaluate can be installed from PyPI and should be installed in a virtual environment (venv or conda, for instance):

pip install evaluate

Usage

🤗 Evaluate’s main methods, shown together in the sketch after this list, are:

  • evaluate.list_evaluation_modules() to list the available metrics, comparisons, and measurements
  • evaluate.load(module_name, **kwargs) to instantiate an evaluation module
  • results = module.compute(**kwargs) to compute the result of an evaluation module
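
Putting the three methods together, a minimal end-to-end sketch using the accuracy metric from the Hub:

import evaluate

# Discover available evaluation modules (metrics, comparisons, measurements)
print(evaluate.list_evaluation_modules()[:5])

# Instantiate a metric module by name
accuracy = evaluate.load("accuracy")

# Compute the result for a toy set of predictions and references
results = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(results)  # {'accuracy': 0.75}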

Adding a new evaluation module

First, install the necessary dependencies to create a new metric with the following command:

pip install evaluate[template]

Then you can get started with the following command, which will create a new folder for your metric and display the necessary steps:

evaluate-cli create "Awesome Metric"

See this step-by-step guide in the documentation for detailed instructions.

Credits

Thanks to @marella for letting us use the evaluate namespace on PyPI, previously used by his library.

Use Cases

There are no use cases for this tool yet.


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.