These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Mean Per Joint Position Error (MPJPE)
Mean Per Joint Position Error (MPJPE) is a common metric used to evaluate the performance of human pose estimation algorithms. It measures the average Euclidean distance between the predicted joints of a human skeleton and the corresponding ground-truth joints in a given dataset.
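A minimal NumPy sketch of MPJPE, assuming predicted and ground-truth joints are given as arrays of shape (num_frames, num_joints, 3) in the same coordinate frame and units, with no alignment step:

```python
import numpy as np

def mpjpe(pred, gt):
    """Mean Per Joint Position Error.

    pred, gt: arrays of shape (n_frames, n_joints, 3) holding predicted and
    ground-truth joint coordinates in the same units (e.g. millimetres).
    Returns the mean Euclidean distance over all joints and frames.
    """
    assert pred.shape == gt.shape
    # Per-joint Euclidean distance, then average over joints and frames.
    return np.linalg.norm(pred - gt, axis=-1).mean()

# Example: two frames, three joints
pred = np.random.rand(2, 3, 3)
gt = np.random.rand(2, 3, 3)
print(f"MPJPE: {mpjpe(pred, gt):.4f}")
```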
False Acceptance Rate (FAR)
False acceptance rate (FAR) is a security metric used to measure the performance of biometric systems such as voice recognition, fingerprint recognition, face recognition, or iris recognition. It represents the likelihood of a biometric system mistakenly accepting an unauthorized user as a legitimate match.
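A small sketch of FAR computed from impostor comparison scores, assuming scores at or above a decision threshold count as acceptances (the scores and threshold below are illustrative):

```python
import numpy as np

def false_acceptance_rate(impostor_scores, threshold):
    """Fraction of impostor attempts incorrectly accepted by the system."""
    impostor_scores = np.asarray(impostor_scores)
    false_accepts = np.sum(impostor_scores >= threshold)
    return false_accepts / len(impostor_scores)

# Example: similarity scores for attempts by non-enrolled users
print(false_acceptance_rate([0.2, 0.7, 0.4, 0.9], threshold=0.6))  # 0.5
```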
False Rejection Rate (FRR)
False rejection rate (FRR) is a security metric used to measure the performance of biometric systems such as voice recognition, fingerprint recognition, face recognition, or iris recognition. It represents the likelihood of a biometric system mistakenly rejecting an authorized user as a non-match.
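The complementary sketch for FRR, under the same assumptions as the FAR example, counts genuine attempts that fall below the decision threshold:

```python
import numpy as np

def false_rejection_rate(genuine_scores, threshold):
    """Fraction of genuine attempts incorrectly rejected by the system."""
    genuine_scores = np.asarray(genuine_scores)
    false_rejects = np.sum(genuine_scores < threshold)
    return false_rejects / len(genuine_scores)

# Example: similarity scores for attempts by enrolled users
print(false_rejection_rate([0.9, 0.55, 0.8, 0.75], threshold=0.6))  # 0.25
```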
Faithfulness
Faithfulness is a metric that assesses the factual consistency of the model’s generated response with respect to the provided context. This metric ensures that every claim made in the answer can be supported or inferred from the context. The score ranges from 0 to 1, with higher values indicating stronger factual consistency.
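In practice this metric is usually computed with an LLM judge that extracts claims from the answer and checks each one against the context. The sketch below shows only the final aggregation step, assuming per-claim verdicts are already available; `claim_verdicts` is an illustrative input, not a library API:

```python
def faithfulness_score(claim_verdicts):
    """Fraction of claims in the generated answer that are supported by the context.

    claim_verdicts: list of booleans, one per claim extracted from the answer,
    True if the claim can be inferred from the provided context.
    """
    if not claim_verdicts:
        return 0.0
    return sum(claim_verdicts) / len(claim_verdicts)

# Example: 3 of 4 extracted claims are supported by the context
print(faithfulness_score([True, True, False, True]))  # 0.75
```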
Topic Adherence
Topic Adherence evaluates an AI system’s ability to confine its responses to predefined subject areas during interactions. This metric is crucial in applications where the AI is expected to assist only within specific domains, ensuring that responses remain within the permitted subject areas.
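One simple way to aggregate topic adherence, assuming the topics touched by each response have already been detected (for example by an LLM classifier); this is an illustrative sketch, not a specific library's definition:

```python
def topic_adherence(response_topics, allowed_topics):
    """Fraction of responses whose detected topics all fall within the allowed set.

    response_topics: list of sets, one set of detected topics per response.
    allowed_topics:  set of permitted subject areas.
    """
    if not response_topics:
        return 0.0
    on_topic = sum(1 for topics in response_topics if topics <= set(allowed_topics))
    return on_topic / len(response_topics)

# Example: one of three responses drifts outside the permitted domains
responses = [{"medical"}, {"medical", "billing"}, {"politics"}]
print(topic_adherence(responses, {"medical", "billing"}))  # ~0.67
```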
Aspect Critic
Aspect Critic is an evaluation metric used to assess responses based on predefined criteria, called “aspects,” written in natural language. This metric produces a binary output, either ‘Yes’ (1) or ‘No’ (0), indicating whether the response meets the specified aspect.
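A sketch of the aspect-critic pattern, where an external judge returns a yes/no verdict for each named aspect. The `judge` callable is a placeholder for an LLM prompted with the aspect's criterion, not a real API:

```python
def aspect_critic(response, aspects, judge):
    """Return a 0/1 verdict per aspect for a single response.

    aspects: dict mapping an aspect name to its natural-language criterion.
    judge:   callable (response, criterion) -> bool; in practice this would
             be an LLM judge (placeholder here).
    """
    return {name: int(judge(response, criterion)) for name, criterion in aspects.items()}

# Illustrative toy judge: flags responses that contain numeric evidence
toy_judge = lambda response, criterion: any(ch.isdigit() for ch in response)
aspects = {"contains_statistics": "Does the response cite any numeric evidence?"}
print(aspect_critic("Revenue grew 12% year over year.", aspects, toy_judge))
# {'contains_statistics': 1}
```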
HaRiM+
HaRiM+ is a reference-free evaluation metric that assesses the quality of generated summaries by estimating the hallucination risk within the summarization process. It uses a modified summarization model to measure how closely generated summaries align with their source documents.
Attack Success Rate (ASR)
The Attack Success Rate (ASR) measures the effectiveness of adversarial attacks against machine learning models. It is calculated as the percentage of attacks that successfully cause a model to misclassify inputs or generate incorrect outputs, making it a key indicator of a model's robustness under adversarial conditions.
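A minimal sketch of ASR as the share of adversarial examples that the model gets wrong; the toy model and perturbed inputs below are placeholders for a real model and attack:

```python
import numpy as np

def attack_success_rate(model, adv_inputs, true_labels):
    """Percentage of adversarial inputs that the model misclassifies."""
    preds = np.array([model(x) for x in adv_inputs])
    successes = np.sum(preds != np.asarray(true_labels))
    return 100.0 * successes / len(adv_inputs)

# Example with a toy threshold "model" and perturbed scalar inputs
toy_model = lambda x: int(x > 0.5)
adv_inputs = [0.49, 0.7, 0.52, 0.3]   # perturbed versions of inputs labelled 1
true_labels = [1, 1, 1, 1]
print(attack_success_rate(toy_model, adv_inputs, true_labels))  # 50.0
```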
Caption Hallucination Assessment with Image Relevance (CHAIR)
CHAIR is a metric designed to measure object hallucination in image captioning models, assessing the relevance of generated captions to the actual image content. It evaluates how often models “hallucinate” objects not present in the image and reports hallucination rates at both the per-instance and per-sentence level.
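A rough sketch of the per-instance CHAIR calculation, assuming the objects mentioned in each caption have already been extracted and mapped to the image's ground-truth object set (the extraction and synonym-mapping step is omitted):

```python
def chair_instance(mentioned_objects, gt_objects):
    """Per-instance CHAIR: hallucinated object mentions / all object mentions.

    mentioned_objects: list of lists, objects mentioned in each generated caption.
    gt_objects:        list of sets, objects actually present in each image.
    """
    mentions, hallucinated = 0, 0
    for mentioned, present in zip(mentioned_objects, gt_objects):
        for obj in mentioned:
            mentions += 1
            if obj not in present:
                hallucinated += 1
    return hallucinated / mentions if mentions else 0.0

# Example: "dog" is hallucinated in the second caption
captions = [["person", "bicycle"], ["person", "dog"]]
images = [{"person", "bicycle"}, {"person", "car"}]
print(chair_instance(captions, images))  # 0.25
```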
Hughes Hallucination Evaluation Model (HHEM) Score
The Hughes Hallucination Evaluation Model (HHEM) Score is a metric designed to detect hallucinations in text generated by AI systems. It outputs a probability score between 0 and 1, where 0 indicates hallucination and 1 indicates factual consistency. The score is produced by a classifier model trained to judge whether generated text is supported by its source material.
Reject Rate (RR)
The Reject Rate is a metric used to evaluate the frequency at which a large language model (LLM) refuses to provide a response to a query. It is particularly relevant in scenarios where refusal is expected to mitigate risks associated with unsafe, biased, or otherwise harmful outputs.
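A naive sketch of the reject rate that flags refusals with a simple phrase list; real evaluations typically use a classifier or LLM judge, so the marker phrases here are purely illustrative:

```python
REFUSAL_MARKERS = ("i can't", "i cannot", "i'm unable", "i am unable", "i won't")

def reject_rate(responses):
    """Fraction of model responses that are refusals."""
    if not responses:
        return 0.0
    refusals = sum(1 for r in responses if r.lower().startswith(REFUSAL_MARKERS))
    return refusals / len(responses)

# Example: one refusal out of three responses
replies = [
    "I can't help with that request.",
    "Here is a summary of the article you shared.",
    "The capital of France is Paris.",
]
print(reject_rate(replies))  # ~0.33
```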
