These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Equal performance 28 related use cases
If a model systematically makes errors disproportionately for patients in the protected group, it is likely to lead to unequal outcomes. Equal performance refers to the assurance that a model is equally accurate for patients in the protect...
Objectives:
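One possible way to check equal performance is to compare accuracy inside and outside the protected group. The sketch below is an illustration only: the function names, the boolean `protected` flag, and the choice of accuracy as the performance measure are assumptions, not something the catalogue entry specifies.

```python
def accuracy(pairs):
    """Fraction of (label, prediction) pairs that agree."""
    return sum(1 for t, p in pairs if t == p) / len(pairs)


def accuracy_gap(y_true, y_pred, protected):
    """Accuracy in the protected group minus accuracy in the other group."""
    in_group = [(t, p) for t, p, g in zip(y_true, y_pred, protected) if g]
    out_group = [(t, p) for t, p, g in zip(y_true, y_pred, protected) if not g]
    if not in_group or not out_group:
        raise ValueError("both groups need at least one example")
    return accuracy(in_group) - accuracy(out_group)


# Hypothetical toy data: the first four patients belong to the protected group.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
protected = [True] * 4 + [False] * 4

print(accuracy_gap(y_true, y_pred, protected))  # 0.5 - 0.75 = -0.25
```

A gap close to zero would suggest similar accuracy in both groups; a negative value means the model is less accurate for the protected group.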
Recall 9 related use cases
Recall is the fraction of the positive examples that were correctly labeled by the model as positive. It can be computed as Recall = TP / (TP + FN), where TP is the number of true positives and FN is the number of false negatives.
Objectives:
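The recall formula above can be sketched in a few lines of Python. The function name and the `positive` parameter are illustrative choices, and the toy labels are made up for the example.

```python
def recall(y_true, y_pred, positive=1):
    """Recall = TP / (TP + FN) for the class given by `positive`."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    if tp + fn == 0:
        raise ValueError("recall is undefined when there are no positive examples")
    return tp / (tp + fn)


y_true = [1, 1, 1, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1]

print(recall(y_true, y_pred))  # 3 true positives, 1 false negative -> 0.75
```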
Gender-based Illicit Proximity Estimate (GIPE) 5 related use cases
This paper proposes a new bias evaluation metric – Gender-based Illicit Proximity Estimate (GIPE), which measures the extent of undue proximity in word vectors resulting from the presence of gender-based predilections. Experiments based on a suite of ...
Objectives:
Equality of Opportunity Difference (EOD) 1 related use case
We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features. Assuming data about the predictor, target, and membership in the protected group ...
Objectives:
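Equality of opportunity is commonly operationalised as a difference in true positive rates between groups. The sketch below assumes binary labels, a boolean `protected` flag, and the sign convention "protected minus other"; these are assumptions for illustration, since the excerpt above does not fix them.

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives (label 1) that were predicted as 1."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("true positive rate is undefined without positive labels")
    return sum(1 for p in preds_for_positives if p == 1) / len(preds_for_positives)


def equal_opportunity_difference(y_true, y_pred, protected):
    """TPR of the protected group minus TPR of the remaining examples."""
    prot = [(t, p) for t, p, g in zip(y_true, y_pred, protected) if g]
    rest = [(t, p) for t, p, g in zip(y_true, y_pred, protected) if not g]
    tpr_prot = true_positive_rate([t for t, _ in prot], [p for _, p in prot])
    tpr_rest = true_positive_rate([t for t, _ in rest], [p for _, p in rest])
    return tpr_prot - tpr_rest


# Hypothetical toy data: the first five examples are in the protected group.
y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 1, 0, 1]
protected = [True] * 5 + [False] * 5

print(equal_opportunity_difference(y_true, y_pred, protected))  # 0.5 - 0.75 = -0.25
```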
Cross-lingual Natural Language Inference (XNLI) 1 related use case
The XNLI metric evaluates a model's score on the XNLI dataset, which is a subset of a few thousand examples from the MNLI dataset that have been translated into 14 different languages, some of which are relatively low-resource, such as Swahili and...
Objectives:
Mean Average Precision (MAP) 1 related use case
Mean Average Precision (MAP) is a metric used to evaluate object detection models such as Fast R-CNN, YOLO, Mask R-CNN, etc. The mean of the average precision (AP) values is calculated over recall values from 0 to 1.
Trustworthy AI Relevance
Th...
Objectives:
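As a rough illustration of MAP, the sketch below computes a non-interpolated average precision per class and then averages across classes. It assumes the detections for each class are already sorted by confidence and already matched to ground truth (for example by an IoU threshold), so each detection is just a true/false flag. The function names and the toy data are hypothetical, and benchmark suites often use interpolated variants instead.

```python
def average_precision(is_true_positive, num_ground_truth):
    """Non-interpolated AP: mean of precision values at each true-positive rank."""
    hits = 0
    precision_sum = 0.0
    for rank, hit in enumerate(is_true_positive, start=1):
        if hit:
            hits += 1
            precision_sum += hits / rank  # precision at this recall step
    return precision_sum / num_ground_truth


def mean_average_precision(per_class):
    """Average the per-class AP values."""
    aps = [average_precision(flags, n) for flags, n in per_class.values()]
    return sum(aps) / len(aps)


# class name -> (detections sorted by confidence, number of ground-truth objects)
per_class = {
    "car": ([True, False, False, True], 2),
    "person": ([True, True], 2),
}

print(mean_average_precision(per_class))  # (0.75 + 1.0) / 2 = 0.875
```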
Statistical Parity Difference (SPD) 1 related use case
We study fairness in classification, where individuals are classified, e.g., admitted to a university, and the goal is to prevent discrimination against individuals based on their membership in some group, while maintaining utility for the classifier (the u...
Objectives:
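Statistical parity difference is usually computed as the gap in positive-prediction rates between groups. The sketch below assumes binary predictions, a boolean `protected` flag, and the sign convention "protected minus other"; all names and data are illustrative.

```python
def positive_rate(preds):
    """Share of predictions equal to 1."""
    return sum(1 for p in preds if p == 1) / len(preds)


def statistical_parity_difference(y_pred, protected):
    """P(prediction = 1 | protected) - P(prediction = 1 | not protected)."""
    prot = [p for p, g in zip(y_pred, protected) if g]
    rest = [p for p, g in zip(y_pred, protected) if not g]
    if not prot or not rest:
        raise ValueError("both groups need at least one example")
    return positive_rate(prot) - positive_rate(rest)


# Hypothetical toy data: the first four individuals are in the protected group.
y_pred = [1, 0, 0, 0, 1, 1, 1, 0]
protected = [True] * 4 + [False] * 4

print(statistical_parity_difference(y_pred, protected))  # 0.25 - 0.75 = -0.5
```

A value of zero would mean both groups receive positive predictions at the same rate.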
Rank-Aware Divergence (RADio)
Objectives:
Predictions Groups Contrast (PGC)
The PGC metric compares the top-K ranking of feature importance drawn from the entire dataset with the top-K ranking induced from specific subgroups of predictions. It can be applied to both categorical and regression problems, being useful for quantifying...
Objectives:
SAFE (Sustainable, Accurate, Fair and Explainable)
Machine learning models, at the core of AI applications, typically achieve high accuracy at the expense of explainability. Moreover, according to the proposed regulations, AI applications based on machine learning must be "trus...
Objectives:
Mean of Predicted Reciprocal Ranks (MRR)
Mean reciprocal rank (MRR) measures the number of triples predicted correctly. If the first predicted triple is correct, 1 is added; if the second is correct, 1/2 is added; and so on. MRR is generally used to quantify the effect of search algorithms. ...
Objectives:
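The reciprocal-rank scheme described above can be sketched as follows, assuming the rank of the first correct prediction is already known for each query (1 for the top position). The function name and the toy ranks are illustrative.

```python
def mean_reciprocal_rank(ranks):
    """Average of 1/rank, where rank is the position of the first correct answer."""
    if not ranks:
        raise ValueError("at least one rank is required")
    if any(r < 1 for r in ranks):
        raise ValueError("ranks are 1-based")
    return sum(1 / r for r in ranks) / len(ranks)


# Correct answer found at positions 1, 2, 4 and 4 for four queries.
ranks = [1, 2, 4, 4]

print(mean_reciprocal_rank(ranks))  # (1 + 1/2 + 1/4 + 1/4) / 4 = 0.5
```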
Mean rank
Mean rank is the average ranking position of the items predicted by the model among all the possible items.
Trustworthy AI Relevance
This metric addresses Robustness and Fairness by quantifying relevant sys...
Objectives:
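A minimal sketch of mean rank, assuming the model assigns a score to every candidate item and that a higher score means a better position. The rank of the target is one plus the number of candidates scored strictly higher; the function names, the tie handling, and the data are assumptions made for the example.

```python
def rank_of_target(scores, target_index):
    """1-based rank of the target among all candidates (ties favour the target)."""
    target_score = scores[target_index]
    return 1 + sum(1 for s in scores if s > target_score)


def mean_rank(queries):
    """Average rank of the target item over all (scores, target_index) queries."""
    ranks = [rank_of_target(scores, target) for scores, target in queries]
    return sum(ranks) / len(ranks)


queries = [
    ([0.9, 0.1, 0.5], 0),  # target has the top score -> rank 1
    ([0.2, 0.7, 0.6], 2),  # one candidate scores higher -> rank 2
    ([0.3, 0.4, 0.8], 0),  # two candidates score higher -> rank 3
]

print(mean_rank(queries))  # (1 + 2 + 3) / 3 = 2.0
```

Lower values are better, and unlike MRR the result is sensitive to a few very poorly ranked items.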
Cohen's Kappa coefficient
Cohen's kappa coefficient is a statistic that is used to measure inter-rater reliability (and also intra-rater reliability) for qualitative (categorical) items. It is generally thought to be a more robust measure than simple percent agreement calculation, a...
Objectives:
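Cohen's kappa is conventionally defined as (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The sketch below follows that definition; the function name and the toy ratings are illustrative.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the raters agree.
    p_o = sum(1 for a, b in zip(rater_a, rater_b) if a == b) / n
    # Chance agreement: product of the raters' marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if p_e == 1:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_o - p_e) / (1 - p_e)


rater_a = ["yes", "yes", "no", "no"]
rater_b = ["yes", "no", "no", "no"]

print(cohens_kappa(rater_a, rater_b))  # p_o = 0.75, p_e = 0.5 -> 0.5
```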
Kendall rank correlation coefficient (KRCC)
In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's τ coefficient, is a statistic used to measure the ordinal association between two measured quantities. A τ test is a non-parametric hypothesis test for statistical de...
Objectives:
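For data without ties, Kendall's τ can be computed as the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. The sketch below implements this tau-a variant with a simple quadratic loop; it does not apply the tie corrections used by tau-b, and the function name and data are illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / (n * (n - 1) / 2)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    score = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        product = (x1 - x2) * (y1 - y2)
        if product > 0:
            score += 1  # pair ordered the same way in both variables
        elif product < 0:
            score -= 1  # pair ordered in opposite ways
    return score / (n * (n - 1) / 2)


x = [1, 2, 3, 4, 5]
y = [1, 2, 3, 5, 4]  # only the last two items are swapped

print(kendall_tau_a(x, y))  # 9 concordant, 1 discordant, 10 pairs -> 0.8
```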
False Rejection Rate (FRR)
False rejection rate (FRR) is a security metric used to measure the performance of biometric systems such as voice recognition, fingerprint recognition, face recognition, or iris recognition. It represents the likelihood of a biometric system mistakenly rej...
Objectives:
False Acceptance Rate (FAR)
False acceptance rate (FAR) is a security metric used to measure the performance of biometric systems such as voice recognition, fingerprint recognition, face recognition, or iris recognition. It represents the likelihood of a biometric system mistakenly ac...
Objectives:
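Both FRR and FAR are typically measured at a fixed decision threshold on a similarity score. The sketch below assumes that a score at or above the threshold means "accept", that genuine and impostor attempts are supplied as separate lists, and that the names and scores are purely illustrative.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for a given acceptance threshold."""
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    # Genuine users scoring below the threshold are wrongly rejected.
    false_rejections = sum(1 for s in genuine_scores if s < threshold)
    # Impostors scoring at or above the threshold are wrongly accepted.
    false_acceptances = sum(1 for s in impostor_scores if s >= threshold)
    far = false_acceptances / len(impostor_scores)
    frr = false_rejections / len(genuine_scores)
    return far, frr


genuine_scores = [0.9, 0.8, 0.4, 0.7]
impostor_scores = [0.1, 0.6, 0.3, 0.2]

print(far_frr(genuine_scores, impostor_scores, threshold=0.5))  # (0.25, 0.25)
```

Raising the threshold generally lowers FAR and raises FRR, which is why the two rates are usually reported together.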
Matthews Correlation Coefficient
The Matthews correlation coefficient is used in machine learning as a measure of the quality of binary and multiclass classifications. It takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be ...
Objectives:
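For the binary case, the Matthews correlation coefficient is (TP·TN − FP·FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)). The sketch below follows that formula; returning 0.0 when the denominator is zero is a common convention adopted here as an assumption, and the function name and data are illustrative.

```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Binary MCC computed from the four confusion-matrix counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        return 0.0  # assumed convention when a row or column of the matrix is empty
    return (tp * tn - fp * fn) / denominator


y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

print(matthews_corrcoef_binary(y_true, y_pred))  # (9 - 1) / 16 = 0.5
```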
Pearson correlation coefficient (PCC)
In statistics, the Pearson correlation coefficient (PCC) ― also known as Pearson's r, the Pearson product-moment correlation coefficient (PPMCC), the bivariate correlation, or colloquially simply as the correlation coefficient ― is a measure of linear corre...
Objectives:
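Pearson's r is the covariance of two variables divided by the product of their standard deviations. The sketch below computes it directly from centred sums; the function name and the toy data are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length, at least 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Centred cross-product and sums of squares.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


x = [1, 2, 3, 4]
y = [2, 4, 6, 8]  # y is an exact linear function of x

print(pearson_r(x, y))  # perfectly linear relationship -> 1.0
```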
Cross-lingual TRansfer Evaluation of Multilingual Encoders for Speech (XTREME-S)
XTREME-S can indirectly support the Fairness objective by providing a means to evaluate whether multilingual speech models perform equitably across different languages. By highlighting disparities in model performance for underrepr...
Objectives:
Equal outcomes
In the field of health, equal patient outcomes refers to the assurance that protected groups have equal benefit in terms of patient outcomes from the deployment of machine-learning models. A weak form of equal outcomes is ensuring that both the protect...
Objectives:



























