These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Time until Adversary’s Success 6 related use cases
The most general time-based metric measures the time until the adversary’s success. It assumes that the adversary will succeed eventually, and is therefore an example of a pessimistic metric. This metric relies on a definition of success, and varies depend...
Objectives:
Stability 1 related use case
Robustness Metrics provides lightweight modules for evaluating the robustness of classification models. Stability refers to, for example, how stable the prediction and the predicted probabilities remain under natural perturbation of the input.
The l...
Objectives:
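To make the Stability entry above concrete, here is a minimal sketch of one way to quantify it. It is not the Robustness Metrics library's API; the function name `prediction_stability` and the two statistics it returns (label agreement rate and mean probability drift) are assumptions chosen for illustration.

```python
def prediction_stability(probs_clean, probs_perturbed):
    """Compare class-probability vectors before and after perturbation.

    Hypothetical helper (not part of any library). Returns:
      agreement: fraction of examples whose predicted class is unchanged
      drift:     mean over examples of the largest absolute change
                 in any class probability
    """
    if len(probs_clean) != len(probs_perturbed) or not probs_clean:
        raise ValueError("inputs must be non-empty and of equal length")

    def argmax(vec):
        return max(range(len(vec)), key=lambda i: vec[i])

    n = len(probs_clean)
    same_label = 0
    total_drift = 0.0
    for p, q in zip(probs_clean, probs_perturbed):
        if argmax(p) == argmax(q):
            same_label += 1
        total_drift += max(abs(a - b) for a, b in zip(p, q))
    return same_label / n, total_drift / n


# Illustrative, made-up probabilities for three inputs and two classes.
clean = [[0.9, 0.1], [0.4, 0.6], [0.55, 0.45]]
perturbed = [[0.8, 0.2], [0.3, 0.7], [0.45, 0.55]]
agreement, drift = prediction_stability(clean, perturbed)
```

In this toy data the third input flips class under perturbation, so the agreement rate drops below 1 even though every probability moves by about the same small amount.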
Out-of-distribution (OOD) generalization 1 related use case
Robustness Metrics provides lightweight modules for evaluating the robustness of classification models. OOD generalization is defined as, e.g. a non-expert human would be able to classify similar objects, but possibly changed viewpoint, scene setting...
Objectives:
False Acceptance Rate (FAR)
False acceptance rate (FAR) is a security metric used to measure the performance of biometric systems such as voice recognition, fingerprint recognition, face recognition, or iris recognition. It represents the likelihood of a biometric system mistakenly ac...
Objectives:
False Rejection Rate (FRR)
False rejection rate (FRR) is a security metric used to measure the performance of biometric systems such as voice recognition, fingerprint recognition, face recognition, or iris recognition. It represents the likelihood of a biometric system mistakenly rej...
Objectives:
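The FAR and FRR entries above can be illustrated together, since both are computed from the same decision threshold. The sketch below assumes a verifier that outputs a similarity score and accepts when the score is at or above the threshold; the function name `far_frr` and the sample scores are hypothetical.

```python
def far_frr(genuine_scores, impostor_scores, threshold):
    """Return (FAR, FRR) for a verifier that accepts scores >= threshold.

    FAR: share of impostor attempts that are wrongly accepted.
    FRR: share of genuine attempts that are wrongly rejected.
    """
    if not genuine_scores or not impostor_scores:
        raise ValueError("both score lists must be non-empty")
    false_accepts = sum(1 for s in impostor_scores if s >= threshold)
    false_rejects = sum(1 for s in genuine_scores if s < threshold)
    return (false_accepts / len(impostor_scores),
            false_rejects / len(genuine_scores))


# Made-up similarity scores for illustration only.
genuine = [0.91, 0.85, 0.40, 0.77]
impostor = [0.10, 0.55, 0.30, 0.62, 0.20]
far, frr = far_frr(genuine, impostor, threshold=0.5)
```

Raising the threshold generally lowers FAR and raises FRR, which is why the two rates are usually reported as a pair.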
Structural Similarity Index (SSIM)
The structural similarity index measure (SSIM) measures the perceived similarity of two images. When one image is a modified version of the other (e.g., if it is compressed) the SSIM serves as a measure of the fidelity of the compressed representation. The ...
Objectives:
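For the SSIM entry above, the following is a simplified sketch. It applies the SSIM formula once to whole-image statistics, whereas the usual measure averages the same formula over local windows; the function name `global_ssim`, the use of population variance, and the flattened pixel lists are assumptions made to keep the example short.

```python
def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over two equally sized, flattened grayscale images.

    Simplification: uses global means/variances instead of sliding windows.
    """
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n

    # Small constants that stabilise the ratio when denominators are near zero.
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2

    numerator = (2 * mean_x * mean_y + c1) * (2 * cov_xy + c2)
    denominator = (mean_x ** 2 + mean_y ** 2 + c1) * (var_x + var_y + c2)
    return numerator / denominator


# A tiny 2x2 checkerboard and its inverse, flattened row by row.
image = [0, 255, 0, 255]
inverted = [255, 0, 255, 0]
same_score = global_ssim(image, image)
inverted_score = global_ssim(image, inverted)
```

An image compared with itself scores 1, while the inverted checkerboard has the same mean and contrast but opposite structure, so its score is negative.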
Multi-Object Tracking Accuracy (MOTA)
Multi-object tracking accuracy (MOTA) shows how many errors the tracker system has made in terms of misses, false positives, mismatch errors, etc. Therefore, it can be derived from three error ratios: the ratio of misses, the ratio of false positives, and t...
Objectives:
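The MOTA entry above combines three error counts into one score. A minimal sketch, assuming the common formulation in which misses, false positives and mismatches are summed over all frames and divided by the total number of ground-truth objects; the function name `mota` and the per-frame tuple layout are hypothetical.

```python
def mota(frames):
    """Compute MOTA from per-frame counts.

    Each frame is a tuple:
        (misses, false_positives, mismatches, num_ground_truth_objects)
    MOTA = 1 - (total errors) / (total ground-truth objects)
    """
    frames = list(frames)  # allow generators; we iterate twice
    total_errors = sum(m + fp + mm for m, fp, mm, _ in frames)
    total_gt = sum(gt for _, _, _, gt in frames)
    if total_gt == 0:
        raise ValueError("no ground-truth objects; MOTA is undefined")
    return 1.0 - total_errors / total_gt


# Made-up counts for a three-frame sequence with five objects per frame.
sequence = [(1, 0, 0, 5), (0, 1, 1, 5), (0, 0, 0, 5)]
score = mota(sequence)
```

Because errors are not capped by the number of ground-truth objects, the score can fall below zero for a tracker that produces many false positives.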
Kendall rank correlation coefficient (KRCC)
In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's τ coefficient, is a statistic used to measure the ordinal association between two measured quantities. A τ test is a non-parametric hypothesis test for statistical de...
Objectives:
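As an illustration of the Kendall entry above, this sketch computes the tau-a variant by counting concordant and discordant pairs. It assumes there are no ties in either ranking; tie-corrected variants such as tau-b use a different denominator. The function name `kendall_tau_a` is hypothetical.

```python
from itertools import combinations


def kendall_tau_a(x, y):
    """Kendall's tau-a: (concordant - discordant) / number of pairs.

    Assumes no ties; tied pairs count as neither concordant nor discordant.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = 0
    discordant = 0
    for (x_i, y_i), (x_j, y_j) in combinations(zip(x, y), 2):
        direction = (x_i - x_j) * (y_i - y_j)
        if direction > 0:
            concordant += 1
        elif direction < 0:
            discordant += 1
    n_pairs = len(x) * (len(x) - 1) // 2
    return (concordant - discordant) / n_pairs


# Two rankings of four items that disagree on one adjacent pair.
rank_a = [1, 2, 3, 4]
rank_b = [1, 3, 2, 4]
tau = kendall_tau_a(rank_a, rank_b)
```

Of the six pairs here, five are ordered the same way in both rankings and one is reversed, giving a tau of (5 - 1) / 6.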
Cohen's Kappa coefficient
Cohen's kappa coefficient is a statistic that is used to measure inter-rater reliability (and also intra-rater reliability) for qualitative (categorical) items. It is generally thought to be a more robust measure than simple percent agreement calculation, a...
Objectives:
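The Cohen's kappa entry above contrasts the statistic with simple percent agreement. The sketch below shows the difference under the standard definition kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance from each rater's label frequencies. The function name `cohens_kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)

    # Observed agreement: share of items given the same label by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of each rater's marginal label frequencies.
    freq_a = Counter(rater_a)
    freq_b = Counter(rater_b)
    p_expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)

    if p_expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_observed - p_expected) / (1.0 - p_expected)


# Made-up labels from two raters on six items.
labels_a = ["yes", "yes", "no", "no", "yes", "no"]
labels_b = ["yes", "no", "no", "no", "yes", "yes"]
kappa = cohens_kappa(labels_a, labels_b)
```

The raters agree on four of six items (about 67%), but half of that agreement would be expected by chance, so kappa comes out at one third.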
Tree Edit Distance (TED)
Tree Edit Distance (TED) is a metric for calculating the similarity between syntactic n-grams, which can then be used to detect soft similarity between texts.
Objectives:
CLIPSBERTScore
Objectives:
Variable Importance Cloud (VIC)
Objectives:
Local Interpretable Model-agnostic Explanation (LIME)
Objectives:
SAFE Artificial Intelligence in finance
We propose a set of interrelated metrics, all based on the notion of AI output concentration, and the related Lorenz curve/Lorenz area under the curve, able to measure the Sustainability/robustness, Accuracy, Fairness/privacy, Explainability/accountability ...
Objectives:
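The SAFE entry above builds on the Lorenz curve and the area under it. The sketch below shows only that generic building block, not the authors' metrics: it constructs a Lorenz curve from non-negative values, integrates it with the trapezoid rule, and derives the Gini concentration index from the area. All function names are hypothetical.

```python
def lorenz_curve(values):
    """Return Lorenz curve points (population share, cumulative value share)."""
    ordered = sorted(values)
    total = sum(ordered)
    if not ordered or total <= 0 or ordered[0] < 0:
        raise ValueError("need non-negative values with a positive sum")
    points = [(0.0, 0.0)]
    running = 0.0
    for i, v in enumerate(ordered, start=1):
        running += v
        points.append((i / len(ordered), running / total))
    return points


def area_under(points):
    """Trapezoid-rule area under a piecewise-linear curve."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))


def gini(values):
    """Gini index: 1 minus twice the area under the Lorenz curve."""
    return 1.0 - 2.0 * area_under(lorenz_curve(values))


# Evenly spread outputs versus outputs concentrated in a single unit.
even = gini([1, 1, 1, 1])
concentrated = gini([0, 0, 0, 4])
```

Evenly spread values put the Lorenz curve on the diagonal and give an index of 0, while the concentrated case pushes the curve toward the lower-right corner and the index toward 1.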