Faithfulness

Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.




Faithfulness is a metric that measures the factual consistency of a model’s generated response with the provided context. A response is considered fully faithful when every claim it makes can be supported by, or inferred from, that context. The score ranges from 0 to 1, with higher values indicating better factual alignment. Faithfulness is especially important in applications where accurate information is vital, because a low score flags responses that contain unsupported or “hallucinated” statements.

Formula:

Faithfulness = (Number of Claims in the Generated Answer Supported by the Given Context) / (Total Number of Claims in the Generated Answer)

This formula calculates the proportion of claims in the generated answer that are factually consistent with the given context.
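As a worked example, if an answer is broken down into four claims and three of them can be inferred from the context, the score is 3 / 4 = 0.75. The Python sketch below computes the same ratio from a list of per-claim verdicts. The function name `faithfulness_score` and the choice to return NaN for an answer with no claims are assumptions of this illustration, not part of Ragas.

```python
import math
from typing import Sequence


def faithfulness_score(claim_supported: Sequence[bool]) -> float:
    """Fraction of claims in the generated answer that the context supports.

    `claim_supported` holds one verdict per claim extracted from the answer:
    True if the claim can be inferred from the context, False otherwise.
    (Hypothetical helper for illustration only.)
    """
    total = len(claim_supported)
    if total == 0:
        # No claims to check: the ratio is undefined, so return NaN
        # (an assumption of this sketch, not a documented rule).
        return math.nan
    supported = sum(1 for verdict in claim_supported if verdict)
    return supported / total


# Worked example: 3 of 4 claims are supported by the context.
verdicts = [True, True, False, True]
print(faithfulness_score(verdicts))  # 0.75
```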

Types of Faithfulness Approaches:

1. Basic Faithfulness: Uses a simple cross-checking method in which each claim in the response is checked directly against the retrieved context.

2. Faithfulness with HHEM-2.1-Open: Uses Vectara’s HHEM-2.1-Open model, a T5-based classifier for detecting hallucinations in generated text, to help identify claims that the context does not support and so make faithfulness assessments more reliable.
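Both approaches share a two-step structure: break the answer into claims, then judge each claim against the context. The Python sketch below illustrates that structure only. Everything in it is hypothetical: `split_into_claims` splits on sentence boundaries, whereas Ragas prompts an LLM to extract statements, and `overlap_verifier` is a toy token-overlap check with an arbitrary 0.8 threshold standing in for the LLM judge of the basic approach or the HHEM-2.1-Open classifier. The `verify` parameter marks where a real verifier would plug in.

```python
import re
from typing import Callable, List, Set


def split_into_claims(answer: str) -> List[str]:
    """Hypothetical stand-in for LLM-based claim extraction: one claim per sentence."""
    sentences = re.split(r"(?<=[.!?])\s+", answer.strip())
    return [s for s in sentences if s]


def _tokens(text: str) -> Set[str]:
    # Lowercased alphanumeric word tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap_verifier(claim: str, context: str, threshold: float = 0.8) -> bool:
    """Hypothetical toy verifier: a claim counts as supported if at least
    `threshold` of its word tokens also appear in the context. A real system
    would use an LLM judge or a classifier such as HHEM-2.1-Open instead."""
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return False
    coverage = len(claim_tokens & _tokens(context)) / len(claim_tokens)
    return coverage >= threshold


def faithfulness(answer: str, context: str,
                 verify: Callable[[str, str], bool] = overlap_verifier) -> float:
    """Supported claims divided by total claims, as in the formula above."""
    claims = split_into_claims(answer)
    if not claims:
        return float("nan")  # undefined when there are no claims (assumption)
    supported = sum(1 for claim in claims if verify(claim, context))
    return supported / len(claims)


context = "Albert Einstein was born in Germany. He developed the theory of relativity."
answer = "Einstein was born in Germany. He won the Nobel Prize in 1950."
print(faithfulness(answer, context))  # 0.5: the second claim is unsupported
```

Swapping the verifier changes only how individual claims are judged; the final score is still the ratio of supported claims to total claims.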

References

Ragas Documentation: Faithfulness

About the metric


Metric type(s):

Technical

Objective(s):

Safety
Explainability

Purpose(s):

Interaction support/chatbots
Content generation

Target sector(s):

Other
Education
Economy

Lifecycle stage(s):

Operate & monitor
Verify & validate
Build & interpret model

Usage rights:

Open source/Permissive

Target users:

Data scientist
Developer
Researcher

Risk management stage(s):

Assess risks & impacts
Assess

Github stars:

7100



Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.