Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.


Factual Correctness is a metric that evaluates how closely a generated response aligns with an authoritative reference, quantifying the factual integrity of AI outputs. It captures the extent to which the model's response reflects verifiable claims in the reference material, which makes it especially valuable in domains where misinformation or unsupported statements can have significant consequences. By quantifying factual alignment, the metric supports detailed examination of a model's adherence to reference information and helps improve response quality in AI systems.

To compute Factual Correctness, the process involves three steps (a code sketch follows the list):

1. Claim Decomposition: Segmenting both the generated response and the reference text into discrete, assessable claims.

2. Natural Language Inference (NLI): Employing NLI to verify the factual overlap between corresponding claims in the response and reference.

3. Metric Calculation: Using the identified True Positives (TP), False Positives (FP), and False Negatives (FN) to calculate precision, recall, and F1 score.
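The following is a minimal Python sketch of this pipeline. The claim decomposition and NLI steps are stubbed with placeholder heuristics, since real implementations typically rely on an LLM or a dedicated NLI model; the function names (`decompose_claims`, `nli_entails`) and data structures are illustrative assumptions, not part of any particular library.

```python
from dataclasses import dataclass


def decompose_claims(text: str) -> list[str]:
    """Placeholder: split text into discrete, assessable claims.
    Real implementations typically prompt an LLM to extract atomic claims;
    here each sentence is naively treated as one claim."""
    return [s.strip() for s in text.split(".") if s.strip()]


def nli_entails(premise: str, hypothesis: str) -> bool:
    """Placeholder: return True if the premise entails the hypothesis.
    In practice this would call an NLI model (or an LLM judge); a crude
    token-overlap heuristic stands in for illustration only."""
    p, h = set(premise.lower().split()), set(hypothesis.lower().split())
    return len(p & h) / max(len(h), 1) > 0.6


@dataclass
class FactualCorrectnessScore:
    precision: float
    recall: float
    f1: float


def factual_correctness(response: str, reference: str) -> FactualCorrectnessScore:
    response_claims = decompose_claims(response)
    reference_claims = decompose_claims(reference)

    # TP: response claims supported by the reference
    tp = sum(1 for c in response_claims if nli_entails(reference, c))
    # FP: response claims not supported by the reference
    fp = len(response_claims) - tp
    # FN: reference claims missing from the response
    fn = sum(1 for c in reference_claims if not nli_entails(response, c))

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return FactualCorrectnessScore(precision, recall, f1)
```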

Precision, recall, and F1 scores provide a comprehensive assessment of factual overlap and are computed as follows:

Precision: TP / (TP + FP) — Indicates the proportion of claims in the response that are factually supported by the reference.

Recall: TP / (TP + FN) — Measures the completeness of the factual content relative to the reference.

F1 Score: 2 × (Precision × Recall) / (Precision + Recall) — The harmonic mean of precision and recall, reflecting overall factual alignment.
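A brief worked example with hypothetical claim counts shows how the three scores combine:

```python
# Hypothetical counts: 4 supported response claims (TP), 1 unsupported
# response claim (FP), and 3 reference claims missing from the response (FN).
tp, fp, fn = 4, 1, 3

precision = tp / (tp + fp)                           # 4 / 5 = 0.80
recall = tp / (tp + fn)                              # 4 / 7 ≈ 0.57
f1 = 2 * precision * recall / (precision + recall)   # ≈ 0.67

print(f"precision={precision:.2f}, recall={recall:.2f}, f1={f1:.2f}")
# precision=0.80, recall=0.57, f1=0.67
```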


About the metric


GitHub stars: 7100

GitHub forks: 720


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.