Catalogue of Tools & Metrics for Trustworthy AI
These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Normalized Detection Score (NDS) 1 related use case
Normalized Detection Score (NDS) evaluates the performance of 3D object detection systems.
The NDS metric is calculated as the average of the detection scores over different distance ranges and object sizes. Specifically, for each detected object, the...
Fréchet Inception Distance (FID) 1 related use case
The Fréchet inception distance (FID) is a metric used to assess the quality of images created by a generative model, like a generative adversarial network (GAN). Unlike the earlier inception score (IS), which evaluates only the distribution of generated ima...
Adjusted Rand Index (ARI) 2 related use cases
The Adjusted Rand Index (ARI) is a measure of the similarity between two data clusterings. It corrects the Rand Index, a basic measure of similarity between two clusterings that has the disadvantage of being sensitive to chance. The Ad...
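As an illustration of the chance correction, here is a minimal standard-library sketch that uses the usual pair-counting form of the ARI. The function name `adjusted_rand_index` and the list-of-labels input are assumptions made for this example, not part of the catalogue entry.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Pair-counting ARI for two labelings of the same items (illustrative sketch)."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_a), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the labelings trivially agree

    # Pairs placed together in both clusterings, in A only, and in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())

    expected = together_a * together_b / total_pairs  # agreement expected by chance
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings use a single cluster
    return (together_both - expected) / (maximum - expected)
```

Cluster names do not matter, only the grouping: `adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])` gives 1.0, while groupings that agree less than chance would predict give negative values.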
Anonymity Set Size 1 related use case
The anonymity set for an individual u, denoted $AS_u$, is the set of users that the adversary cannot distinguish from u. Its size can be seen as the size of the crowd into which the target u can blend.
$\mathrm{priv}_{\mathrm{ASS}} \equiv |AS_u|$
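A small sketch of how this size might be computed, assuming (purely for illustration) that the adversary sees one observable value per user and cannot tell apart users whose observable values are identical. The names `anonymity_set` and `observations` are hypothetical.

```python
def anonymity_set(target, observations):
    """Users the adversary cannot distinguish from `target`.

    `observations` maps each user to what the adversary can see about them.
    Assumption: identical observations make two users indistinguishable.
    """
    target_view = observations[target]
    return {user for user, view in observations.items() if view == target_view}


def anonymity_set_size(target, observations):
    """Size of the anonymity set, counting the target itself."""
    return len(anonymity_set(target, observations))


# Hypothetical adversary view: (age bracket, postcode prefix) per user.
observations = {
    "alice": ("30-39", "75"),
    "bob": ("30-39", "75"),
    "carol": ("40-49", "75"),
    "dave": ("30-39", "75"),
}
size_for_alice = anonymity_set_size("alice", observations)
```

Under this assumption a size of 1 means the target is uniquely identifiable, and larger values mean a larger crowd to blend into.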
Amount of Leaked Information 1 related use case
This metric counts the information items S disclosed by a system, e.g., the number of compromised users. However, this metric does not indicate the severity of a leak because it does not account for the sensitivity of the leaked information.
Time until Adversary’s Success 1 related use case
The most general time-based metric measures the time until the adversary’s success. It assumes that the adversary will succeed eventually, and is therefore an example of a pessimistic metric. This metric relies on a definition of success, and varies depend...
Statistical Parity Difference (SPD) 1 related use case
We study fairness in classification, where individuals are classified, e.g., admitted to a university, and the goal is to prevent discrimination against individuals based on their membership in some group, while maintaining utility for the classifier (the ...
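The description above is truncated, so the following is only a sketch of one common way to compute a statistical parity difference: the rate of positive classifications in one group minus the rate in another. The function names, the 0/1 encoding of predictions, and the sign convention (unprivileged minus privileged) are assumptions for this example.

```python
def positive_rate(predictions, groups, group):
    """Share of members of `group` that received the positive outcome (1)."""
    selected = [p for p, g in zip(predictions, groups) if g == group]
    if not selected:
        raise ValueError(f"no samples for group {group!r}")
    return sum(selected) / len(selected)


def statistical_parity_difference(predictions, groups, unprivileged, privileged):
    """P(pred = 1 | unprivileged) - P(pred = 1 | privileged); 0 means parity."""
    return (positive_rate(predictions, groups, unprivileged)
            - positive_rate(predictions, groups, privileged))


# Hypothetical admissions decisions (1 = admitted) for two groups.
predictions = [1, 0, 1, 1, 0, 0]
groups = ["a", "a", "a", "b", "b", "b"]
spd = statistical_parity_difference(predictions, groups, unprivileged="b", privileged="a")
```

With this sign convention, a negative value means the unprivileged group receives the positive outcome less often than the privileged group.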
Conditional Entropy 1 related use case
We discuss information-theoretic anonymity metrics that use entropy over the distribution of all possible recipients to quantify anonymity. We identify a common misconception: the entropy of the distribution describing the potential receivers does not alw...
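For orientation, here is a minimal sketch of the conditional entropy H(X | Y) in bits, computed from a joint distribution. Reading X as the true recipient and Y as what the adversary observes is an assumption for this example, as are the function name and the dictionary input format.

```python
from collections import defaultdict
from math import log2


def conditional_entropy(joint):
    """H(X | Y) in bits; `joint` maps (x, y) pairs to probabilities summing to 1."""
    if abs(sum(joint.values()) - 1.0) > 1e-9:
        raise ValueError("joint probabilities must sum to 1")

    # Marginal distribution of Y, needed for p(x | y) = p(x, y) / p(y).
    marginal_y = defaultdict(float)
    for (_, y), p in joint.items():
        marginal_y[y] += p

    entropy = 0.0
    for (_, y), p in joint.items():
        if p > 0:
            entropy -= p * log2(p / marginal_y[y])
    return entropy


# Hypothetical case: the observation reveals nothing about the recipient.
uninformative = {("r1", "o1"): 0.25, ("r1", "o2"): 0.25,
                 ("r2", "o1"): 0.25, ("r2", "o2"): 0.25}
# Hypothetical case: the observation identifies the recipient exactly.
revealing = {("r1", "o1"): 0.5, ("r2", "o2"): 0.5}
```

In the first case one full bit of uncertainty about the recipient remains after the observation; in the second case none does.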
Stability 1 related use case
Robustness Metrics provides lightweight modules for evaluating the robustness of classification models. Stability refers to, e.g., how stable the prediction and the predicted probabilities remain under natural perturbation of the input.
The l...
Out-of-distribution (OOD) generalization 1 related use case
Robustness Metrics provides lightweight modules for evaluating the robustness of classification models. OOD generalization covers, e.g., cases where a non-expert human would be able to classify similar objects, but with possibly changed viewpoint, scene setting...
Equality of Opportunity Difference (EOD) 1 related use case
We propose a criterion for discrimination against a specified sensitive attribute in supervised learning, where the goal is to predict some target based on available features. Assuming data about the predictor, target, and membership in the protected group...
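The description above is truncated, so this is only a sketch of one common reading of an equality of opportunity difference: the gap in true positive rates between two groups. The function names, the 0/1 label encoding, and the sign convention (unprivileged minus privileged) are assumptions for this example.

```python
def true_positive_rate(y_true, y_pred, groups, group):
    """Share of actual positives in `group` that the model predicted as positive."""
    hits = [p for t, p, g in zip(y_true, y_pred, groups) if g == group and t == 1]
    if not hits:
        raise ValueError(f"no positive samples for group {group!r}")
    return sum(hits) / len(hits)


def equal_opportunity_difference(y_true, y_pred, groups, unprivileged, privileged):
    """TPR(unprivileged) - TPR(privileged); 0 means equal opportunity."""
    return (true_positive_rate(y_true, y_pred, groups, unprivileged)
            - true_positive_rate(y_true, y_pred, groups, privileged))


# Hypothetical labels and predictions for two groups.
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1, 1]
groups = ["a", "a", "a", "b", "b", "b"]
eod = equal_opportunity_difference(y_true, y_pred, groups, unprivileged="a", privileged="b")
```

Only samples whose true label is positive enter the computation, which is what distinguishes this quantity from a difference in overall positive rates.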
Equal performance 1 related use case
If a model systematically makes errors disproportionately for patients in the protected group, it is likely to lead to unequal outcomes. Equal performance refers to the assurance that a model is equally accurate for patients in the protec...
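One simple way to check for equal accuracy, sketched under the assumption that "equally accurate" is measured by plain classification accuracy per group. The function names and the gap's sign convention (protected minus reference) are hypothetical.

```python
from collections import Counter


def accuracy_by_group(y_true, y_pred, groups):
    """Classification accuracy computed separately for each group."""
    correct, total = Counter(), Counter()
    for t, p, g in zip(y_true, y_pred, groups):
        total[g] += 1
        correct[g] += int(t == p)
    return {g: correct[g] / total[g] for g in total}


def accuracy_gap(y_true, y_pred, groups, protected, reference):
    """Accuracy(protected) - accuracy(reference); 0 means equal performance."""
    accuracy = accuracy_by_group(y_true, y_pred, groups)
    return accuracy[protected] - accuracy[reference]


# Hypothetical labels and predictions for a protected and a reference group.
y_true = [1, 1, 0, 1, 1, 0]
y_pred = [1, 0, 1, 1, 1, 0]
groups = ["protected", "protected", "protected", "reference", "reference", "reference"]
gap = accuracy_gap(y_true, y_pred, groups, "protected", "reference")
```

A negative gap indicates that the model is less accurate for the protected group than for the reference group.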
Conditional Demographic Disparity (CDD)
The demographic disparity metric (DD) determines whether a facet has a larger proportion of the rejected outcomes in the dataset than of the accepted outcomes. In the binary case where there are two facets, men and women for example, that constitute the dat...
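The comparison described above can be sketched as the facet's share of rejected outcomes minus its share of accepted outcomes. The function name and the 0/1 outcome encoding (0 = rejected, 1 = accepted) are assumptions for this example, and the conditional variant (CDD) is not covered here because its description is truncated.

```python
def demographic_disparity(outcomes, facets, facet):
    """Facet's share of rejected outcomes minus its share of accepted outcomes.

    A positive value means the facet is over-represented among rejections.
    Assumption: outcomes are encoded as 0 (rejected) and 1 (accepted).
    """
    rejected = [f for o, f in zip(outcomes, facets) if o == 0]
    accepted = [f for o, f in zip(outcomes, facets) if o == 1]
    if not rejected or not accepted:
        raise ValueError("need at least one rejected and one accepted outcome")
    share_of_rejected = rejected.count(facet) / len(rejected)
    share_of_accepted = accepted.count(facet) / len(accepted)
    return share_of_rejected - share_of_accepted


# Hypothetical dataset with two facets.
outcomes = [0, 0, 0, 0, 1, 1, 1, 1]
facets = ["women", "women", "women", "men", "women", "men", "men", "men"]
dd_women = demographic_disparity(outcomes, facets, "women")
```

In this toy dataset women account for three of the four rejections but only one of the four acceptances, so the disparity for that facet is positive.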
Rank-Aware Divergence (RADio)
Contextual Outlier Interpretation (COIN)
Local Explanation Method using Nonlinear Approximation (LEMNA)
Shapley Additive Explanation (SHAP)
Local Interpretable Model-agnostic Explanation (LIME)
Shapley Variable Importance Cloud (ShapleyVIC)
