These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Anonymity Set Size 25 related use cases
The anonymity set for an individual u, denoted AS_u, is the set of users that the adversary cannot distinguish from u. The metric is the size of this set, i.e., the size of the crowd into which the target u can blend:

priv_ASS ≡ |AS_u|
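As a minimal sketch of this metric (the user names and observed attributes below are hypothetical), AS_u can be computed by collecting the users whose adversary-observable attributes are identical to u's:

```python
# Hypothetical sketch: `observations` maps each user to the attribute
# vector the adversary can observe; users sharing u's observable
# attributes are indistinguishable from u and so belong to AS_u.

def anonymity_set(u, observations):
    """Return AS_u: all users the adversary cannot distinguish from u."""
    return {v for v, obs in observations.items() if obs == observations[u]}

observations = {
    "alice": ("paris", "firefox"),
    "bob": ("paris", "firefox"),
    "carol": ("lyon", "chrome"),
}

# The metric is the set's cardinality: priv_ASS = |AS_u|
priv_ass = len(anonymity_set("alice", observations))
```

A larger anonymity set means a larger crowd for the target to blend into, so higher values indicate stronger anonymity.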
Time until Adversary’s Success 12 related use cases
The most general time-based metric measures the time until the adversary’s success. It assumes that the adversary will succeed eventually, and is therefore an example of a pessimistic metric. This metric relies on a definition of success, and varies depend...
Amount of Leaked Information 12 related use cases
This metric counts the information items S disclosed by a system, e.g., the number of compromised users. However, this metric does not indicate the severity of a leak because it does not account for the sensitivity of the leaked information.
Conditional Entropy 1 related use case
We discuss information-theoretic anonymity metrics, which use entropy over the distribution of all possible recipients to quantify anonymity. We identify a common misconception: the entropy of the distribution describing the potential receivers does not alw...
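As an illustration of the underlying entropy calculation (the example distributions are ours), Shannon entropy over the candidate-recipient distribution reaches its maximum when the adversary considers all recipients equally likely:

```python
import math

def shannon_entropy(probs):
    """Shannon entropy (in bits) of a distribution over potential recipients."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Uniform over 4 candidate recipients: maximum uncertainty, H = 2 bits.
uniform = [0.25, 0.25, 0.25, 0.25]

# Skewed: one recipient is far more likely, so the adversary's
# uncertainty (and hence the anonymity estimate) is lower.
skewed = [0.85, 0.05, 0.05, 0.05]
```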
False Acceptance Rate (FAR)
False acceptance rate (FAR) is a security metric used to measure the performance of biometric systems such as voice recognition, fingerprint recognition, face recognition, or iris recognition. It represents the likelihood of a biometric system mistakenly ac...
False Rejection Rate (FRR)
False rejection rate (FRR) is a security metric used to measure the performance of biometric systems such as voice recognition, fingerprint recognition, face recognition, or iris recognition. It represents the likelihood of a biometric system mistakenly rej...
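FAR and FRR are both simple error rates over verification attempts; a minimal sketch of the two computations (the attempt counts below are illustrative):

```python
def far(false_accepts, impostor_attempts):
    """FAR: fraction of impostor attempts the system wrongly accepts."""
    return false_accepts / impostor_attempts

def frr(false_rejects, genuine_attempts):
    """FRR: fraction of genuine attempts the system wrongly rejects."""
    return false_rejects / genuine_attempts

# e.g., 3 impostors accepted out of 1000 impostor attempts,
# and 20 genuine users rejected out of 500 genuine attempts.
example_far = far(3, 1000)
example_frr = frr(20, 500)
```

The two rates trade off against each other as the decision threshold moves: a stricter threshold lowers FAR but raises FRR, and vice versa.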
Structural Similarity Index (SSIM)
The structural similarity index measure (SSIM) measures the perceived similarity of two images. When one image is a modified version of the other (e.g., if it is compressed), SSIM serves as a measure of the fidelity of the compressed representation. The ...
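A simplified, single-window sketch of the SSIM statistic (the standard formulation averages this quantity over local sliding windows rather than computing it once over the whole image, so this is an illustration, not the full metric):

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """Single-window (global) SSIM between two equally sized images.

    Uses the standard stabilizing constants C1 = (0.01 L)^2, C2 = (0.03 L)^2,
    where L is the dynamic range of the pixel values.
    """
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2)
    )
```

SSIM is 1 when the two images are identical and decreases as luminance, contrast, or structure diverge.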
Fréchet Inception Distance (FID)
The Fréchet inception distance (FID) typically measures the quality of image generative models. More specifically, FID is a semimetric commonly applied to generative models based on generative adversarial networks (GANs), which was among the first generativ...
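At its core, FID is the Fréchet distance between two Gaussians fitted to feature statistics (in practice, Inception-v3 features of real vs. generated images). A sketch of the distance itself, assuming the means and covariances are already computed:

```python
import numpy as np
from scipy.linalg import sqrtm

def frechet_distance(mu1, sigma1, mu2, sigma2):
    """Fréchet distance between N(mu1, sigma1) and N(mu2, sigma2):

        ||mu1 - mu2||^2 + Tr(sigma1 + sigma2 - 2 (sigma1 sigma2)^{1/2})

    FID applies this to Inception feature statistics; here we show only
    the distance computation.
    """
    diff = mu1 - mu2
    covmean = sqrtm(sigma1 @ sigma2)
    if np.iscomplexobj(covmean):
        covmean = covmean.real  # discard tiny imaginary parts from sqrtm
    return diff @ diff + np.trace(sigma1 + sigma2 - 2 * covmean)
```

Lower values indicate that the generated-image feature distribution is closer to the real one; identical statistics give a distance of zero.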
Learned Perceptual Image Patch Similarity (LPIPS)
The learned perceptual image patch similarity (LPIPS) is used to judge the perceptual similarity between two images. LPIPS is computed with a model that is trained on a labeled dataset of human-judged perceptual similarity. The perception-measuring model co...
Kendall rank correlation coefficient (KRCC)
In statistics, the Kendall rank correlation coefficient, commonly referred to as Kendall's τ coefficient, is a statistic used to measure the ordinal association between two measured quantities. A τ test is a non-parametric hypothesis test for statistical de...
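The coefficient can be computed directly from its definition as (concordant pairs − discordant pairs) over all pairs; a minimal sketch of the tie-free (tau-a) variant:

```python
from itertools import combinations

def kendall_tau(x, y):
    """Kendall's tau-a: (concordant - discordant) / total pairs.

    Assumes equal-length sequences; tied pairs count as neither
    concordant nor discordant.
    """
    pairs = list(combinations(range(len(x)), 2))
    concordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) > 0)
    discordant = sum(1 for i, j in pairs if (x[i] - x[j]) * (y[i] - y[j]) < 0)
    return (concordant - discordant) / len(pairs)
```

τ is +1 for perfectly agreeing rankings, −1 for perfectly reversed ones, and near 0 when the two orderings are unrelated.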
Contextual Outlier Interpretation (COIN)
Tool Call Accuracy
Tool Call Accuracy evaluates the effectiveness of a large language model (LLM) in accurately identifying and invoking the necessary tools to accomplish a specified task. This metric is essential for assessing the model’s capability to select and utilize appropria...
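The catalogue entry does not fix a scoring formula, so the sketch below assumes simple exact-match scoring of predicted (tool name, arguments) pairs against a reference; the tool names and arguments are hypothetical:

```python
# Hypothetical scoring sketch: a prediction counts as correct only if
# both the tool name and its arguments exactly match the reference.

def tool_call_accuracy(predicted, expected):
    """Fraction of cases where the model called the expected tool
    with the expected arguments."""
    hits = sum(1 for p, e in zip(predicted, expected) if p == e)
    return hits / len(expected)

predicted = [("search", {"q": "weather"}), ("calc", {"expr": "2+2"})]
expected = [("search", {"q": "weather"}), ("calc", {"expr": "2*2"})]
```

Real evaluations often relax exact matching (e.g., ignoring argument order or scoring partial argument matches); those variants are refinements of the same idea.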