Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

This page includes technical metrics and methodologies for measuring and evaluating AI trustworthiness and AI risks. These metrics are often represented through mathematical formulas that assess the technical requirements for achieving trustworthy AI in a particular context. They can help to ensure that a system is fair, accurate, explainable, transparent, robust, safe, or secure.
Objective Explainability

Metric for Evaluation of Translation with Explicit ORdering (METEOR) is a machine translation evaluation metric computed from the harmonic mean of unigram precision and recall, with recall weighted more heavily than precision.

METEOR is based on a gen...
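As a rough illustration of the recall-weighted harmonic mean in the METEOR entry, the sketch below uses the 9:1 recall weighting of the original METEOR formulation, Fmean = 10PR / (R + 9P). The function name and the example counts are hypothetical, and the sketch assumes unigram matches have already been counted; the fragmentation penalty and the stemming and synonym matching stages are left out.

```python
def recall_weighted_fmean(matches, hyp_len, ref_len):
    """Harmonic mean of unigram precision and recall, recall weighted 9:1."""
    if matches == 0:
        return 0.0
    precision = matches / hyp_len   # matched unigrams / hypothesis length
    recall = matches / ref_len      # matched unigrams / reference length
    return 10 * precision * recall / (recall + 9 * precision)


# Hypothetical counts: 6 matched unigrams, 8-word hypothesis, 7-word reference.
score = recall_weighted_fmean(matches=6, hyp_len=8, ref_len=7)
```

Because recall carries most of the weight, the result sits much closer to the recall (6/7) than to the precision (6/8).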


In a cooperative game, there are n players D = {1,...,n}, and a score function v : 2^[n] → ℝ assigns a reward to each of the 2^n subsets of players: v(S) is the reward if the players in subset S ⊆ D cooperate. We view the supervised machine learning problem as a coo...
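One standard way to divide the total reward v(D) among the players of such a game is the Shapley value; whether this entry goes on to use exactly that is not visible in the truncated text, so treat the sketch below as an assumption. It enumerates all subsets, which is only feasible for small n. The function name and the toy score function are hypothetical, and players are numbered 0..n-1 rather than 1..n.

```python
from itertools import combinations
from math import factorial


def shapley_values(n, v):
    """Exact Shapley values for players 0..n-1, with v mapping a frozenset to a reward."""
    players = range(n)
    values = []
    for i in players:
        others = [p for p in players if p != i]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that does not contain player i.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of player i to coalition s.
                total += weight * (v(s | {i}) - v(s))
        values.append(total)
    return values


# Hypothetical toy game: the reward is the squared size of the coalition.
phi = shapley_values(3, lambda s: len(s) ** 2)
```

In this symmetric toy game every player receives the same share, and the shares add up to v(D).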

Context Entities Recall measures the recall of entities in retrieved contexts based on the entities present in both the reference and retrieved contexts, relative to the entities in the reference alone. This metric evaluates what fraction of entities in the...
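The ratio described in the Context Entities Recall entry can be sketched with plain set operations. This assumes entity extraction has already been done by some other component; the function name and the example entities are hypothetical.

```python
def context_entities_recall(reference_entities, context_entities):
    """Fraction of reference entities that also appear in the retrieved contexts."""
    reference = set(reference_entities)
    if not reference:
        return 0.0  # assumption: define the score as 0 when the reference has no entities
    return len(reference & set(context_entities)) / len(reference)


# Hypothetical entity lists, as an upstream extractor might return them.
score = context_entities_recall(
    ["Taj Mahal", "Agra", "Shah Jahan", "Yamuna"],
    ["Taj Mahal", "Agra", "India"],
)
```

Entities that appear only in the retrieved contexts (here "India") do not affect the score, since the denominator is the reference alone.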


Log odds ratio: A statistical measure used to quantify the strength of association between two events. It is the logarithm of the odds ratio.

Trustworthy AI Relevance

This metric addresses Explainability and Fairness
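For the log odds ratio entry above, a minimal sketch on a 2x2 contingency table follows. The function name, argument layout and counts are hypothetical, and the sketch assumes every cell count is positive (a continuity correction would be needed otherwise).

```python
from math import log


def log_odds_ratio(a, b, c, d):
    """Log odds ratio for a 2x2 table.

    a, b: counts with / without the event in group 1
    c, d: counts with / without the event in group 2
    """
    return log((a * d) / (b * c))


# Hypothetical counts: 20 of 100 in group 1 and 10 of 100 in group 2 show the event.
lor = log_odds_ratio(20, 80, 10, 90)
```

A value of 0 means the odds are equal in both groups; positive values mean the event is more likely in group 1, negative values that it is more likely in group 2.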


Contextual Outlier INterpretation (COIN) is a method designed to explain the abnormality of existing outliers spotted by detectors. The interpretability for an outlier is achieved from three aspects: outlierness score, attributes that contribute to the abnormality, a...

Given an input data sample, LEMNA generates a small set of interpretable features to explain how the input sample is classified. The core idea is to approximate a local area of the complex deep learning decision boundary using a simple interpretable model. ...
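The core idea named in the LEMNA entry, approximating a complex model near one input with a simple interpretable one, can be illustrated with a generic local linear surrogate. The sketch below is not LEMNA's own approximating model: it is a one-feature, proximity-weighted least-squares fit, and the function name, kernel and parameters are all assumptions made for illustration.

```python
from math import exp


def local_linear_surrogate(f, x0, width=0.5, steps=21):
    """Fit y ≈ intercept + slope * x to f near x0 by weighted least squares."""
    # Evenly spaced perturbations of the input around x0.
    xs = [x0 + width * (2 * i / (steps - 1) - 1) for i in range(steps)]
    ys = [f(x) for x in xs]
    # Gaussian proximity kernel: samples nearer to x0 count more.
    ws = [exp(-((x - x0) / width) ** 2) for x in xs]
    w_sum = sum(ws)
    x_bar = sum(w * x for w, x in zip(ws, xs)) / w_sum
    y_bar = sum(w * y for w, y in zip(ws, ys)) / w_sum
    cov = sum(w * (x - x_bar) * (y - y_bar) for w, x, y in zip(ws, xs, ys))
    var = sum(w * (x - x_bar) ** 2 for w, x in zip(ws, xs))
    slope = cov / var
    return y_bar - slope * x_bar, slope


# Hypothetical "complex" model: f(x) = x^2, explained around x0 = 3.
intercept, slope = local_linear_surrogate(lambda x: x ** 2, x0=3.0)
```

The fitted slope approximates the local sensitivity of the model at x0 (here the derivative of x^2 at 3), which is the kind of quantity a local explanation reports for each feature.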


Shapley Additive Explanations (SHAP) is a method that quantifies the contribution of each feature to the output of a predictive model. Rooted in cooperative game theory, SHAP values provide a theoretically sound approach for interpreting complex models by d...


Local Interpretable Model-agnostic Explanations (LIME) is a method developed to enhance the explainability and transparency of machine learning models, particularly those that are complex and difficult to interpret. It is designed to provide clear, localize...


Following the VIC framework, our proposed ShapleyVIC extends the widely used Shapley-based variable importance measures beyond final models for a comprehensive assessment and has important practical implications.

Trustworthy AI Relevance

This metr...


Ideally we would like to obtain a more complete understanding of variable importance for the set of models that predict almost equally well. This set of almost-equally-accurate predictive models is called the Rashomon set; it is the set of models with training...

Beta Shapley is a unified data valuation framework that naturally arises from Data Shapley by relaxing the efficiency axiom. The Beta(α, β)-Shapley value considers the pair of hyperparameters (α, β) which decides the weight distribution on [n]. Beta(1,1)-Shapl...

SARI (system output against references and against the input sentence) is a metric used for evaluating automatic text simplification systems.

The metric compares the predicted simplified sentences against the reference and the source sentences. It exp...


The Partial Dependence Complexity metric uses the concept of the partial dependence curve to evaluate how simply this curve can be represented. The partial dependence curve is used to show how model predictions are affected on average by each feature. Curves repres...


The α-Feature Importance metric quantifies the minimum proportion of features required to represent α of the total importance. In other words, this metric focuses on obtaining the minimum number of features necessary to obtain no less than α × 100% of th...
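A minimal sketch of the α-Feature Importance idea follows: sort the importances in descending order and count how many are needed before the running total reaches α of the overall importance. The function name, the default α and the example values are hypothetical, and the importances are assumed to be non-negative.

```python
def alpha_feature_importance(importances, alpha=0.8):
    """Smallest share of features whose importance sums to at least alpha of the total."""
    total = sum(importances)
    ranked = sorted(importances, reverse=True)  # most important features first
    cumulative = 0.0
    for k, value in enumerate(ranked, start=1):
        cumulative += value
        if cumulative >= alpha * total:
            return k / len(ranked)
    return 1.0


# Hypothetical importance scores for five features.
ratio = alpha_feature_importance([50, 25, 15, 6, 4], alpha=0.8)
```

Here the top two features cover only 75% of the total, so three of the five features are needed to reach 80%.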


Local Feature Importance refers to the assignment of normalized feature importance to different regions of the input data space. For a given dataset D with N samples, it is possible to compute a vector of feature importance for each individual observation d...


The GFIS metric is based on the concept of entropy, more precisely on the entropy of the normalized features measure, which represents the concentration of information within a set of features. Lower entropy values indicate that the majority of the explanat...
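To make the entropy idea concrete, the sketch below computes the plain Shannon entropy of feature importances after normalizing them to sum to 1. The exact normalization and scaling used by GFIS are not visible in the truncated entry, so this is an assumption rather than the metric's definition; the function name and example values are hypothetical.

```python
from math import log


def importance_entropy(importances):
    """Shannon entropy (in nats) of importances normalized to sum to 1."""
    total = sum(importances)
    # Zero-importance features contribute nothing to the entropy.
    probs = [value / total for value in importances if value > 0]
    return -sum(p * log(p) for p in probs)


# Hypothetical importance vectors: one dominated by a single feature, one evenly spread.
concentrated = importance_entropy([0.97, 0.01, 0.01, 0.01])
uniform = importance_entropy([0.25, 0.25, 0.25, 0.25])
```

The concentrated vector yields a lower entropy than the uniform one, matching the entry's statement that lower values indicate importance concentrated in few features.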


Machine learning models, at the core of AI applications, typically achieve high accuracy at the expense of explainability. Moreover, according to the proposed regulations, AI applications based on machine learning must be "trus...


Tree Edit Distance (TED) is a metric for calculating the similarity between syntactic n-grams, which is then used to detect soft similarity between texts.

Trustworthy AI Relevance

This metric addresses Robustness and Expl...


The Normalized Scanpath Saliency was introduced to the saliency community as a simple correspondence measure between saliency maps and ground truth, computed as the average normalized saliency at fixated locations. Unlike in AUC, the absolute saliency value...
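The computation described in the Normalized Scanpath Saliency entry can be sketched as a z-score of the saliency map followed by an average over the fixated pixels. The function name, the (row, column) fixation format and the tiny example map are hypothetical; the sketch uses the population standard deviation and assumes the map is not constant.

```python
from statistics import mean, pstdev


def normalized_scanpath_saliency(saliency, fixations):
    """Mean z-scored saliency at fixated (row, col) locations."""
    flat = [value for row in saliency for value in row]
    mu, sigma = mean(flat), pstdev(flat)  # statistics over the whole map
    return mean((saliency[r][c] - mu) / sigma for r, c in fixations)


# Hypothetical 2x2 saliency map with a single salient pixel.
saliency_map = [[0.0, 0.0],
                [0.0, 1.0]]
nss_hit = normalized_scanpath_saliency(saliency_map, [(1, 1)])   # fixation on the salient pixel
nss_miss = normalized_scanpath_saliency(saliency_map, [(0, 0)])  # fixation elsewhere
```

Positive values indicate that fixations fall on above-average saliency, negative values that they fall on below-average saliency.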


In statistics, Spearman's rank correlation coefficient or Spearman's ρ is a non-parametric measure of rank correlation (statistical dependence between the rankings of two variables). It assesses how well the relationship between two variables can be describ...
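Spearman's ρ can be computed as the Pearson correlation of the ranks of the two variables. The sketch below does this in plain Python, assigning tied values their average rank; the function names and sample data are hypothetical, and both inputs are assumed to contain at least two distinct values.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Pearson correlation computed on the ranks of x and y."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: y is a non-linear but strictly increasing function of x.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 8, 27, 64, 125])
```

Because only the ranks matter, the cubic relationship in the example still yields a perfect positive correlation.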


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.