TrueTeacher is a model-based metric for evaluating the factual consistency of generated summaries against their source documents. It uses a T5-11B model fine-tuned on a synthetic dataset curated for consistency evaluation. To build that dataset, summaries are generated with a diverse set of summarization models, and a large language model (LLM) then annotates each (document, summary) pair as factually consistent or not. The resulting fine-tuned model can judge whether a new summary accurately reflects the information present in the source document.
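To make the data-generation step concrete, here is a minimal sketch of the labeling loop. The summarizer checkpoints, the label_with_llm helper, and the generation settings are illustrative assumptions, not the exact setup from the TrueTeacher paper:

```python
# Illustrative sketch of TrueTeacher-style synthetic data labeling.
# The summarizer checkpoints and the label_with_llm helper are
# hypothetical stand-ins, not the paper's exact pipeline.
from transformers import pipeline

summarizers = [
    pipeline("summarization", model=name)
    for name in ("facebook/bart-large-cnn", "google/pegasus-xsum")
]

def label_with_llm(document: str, summary: str) -> int:
    """Hypothetical LLM judge: 1 if the summary is factually
    consistent with the document, 0 otherwise."""
    raise NotImplementedError("plug in your LLM-as-judge call here")

def build_synthetic_examples(documents):
    examples = []
    for doc in documents:
        for summarize in summarizers:
            summary = summarize(doc, max_length=64)[0]["summary_text"]
            examples.append({
                "document": doc,
                "summary": summary,
                "label": label_with_llm(doc, summary),
            })
    return examples  # fine-tuning data for the T5-11B student model
```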
Formula:
TrueTeacher frames consistency evaluation as binary classification over a (document, summary) pair, where:
• 1 indicates the summary is factually consistent with the document.
• 0 indicates it is inconsistent.
The consistency score is the entailment probability P(y = 1 | document, summary), i.e., the softmax probability the model assigns to the positive label.
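As a hedged sketch of how this score can be computed in practice, the following assumes the publicly released Hugging Face checkpoint google/t5_11b_trueteacher_and_anli and its "premise: ... hypothesis: ..." input format; consult the model card before relying on the details:

```python
# Sketch: scoring a (document, summary) pair with a TrueTeacher checkpoint.
# Assumes the released checkpoint google/t5_11b_trueteacher_and_anli and
# its "premise: ... hypothesis: ..." input convention.
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

model_path = "google/t5_11b_trueteacher_and_anli"  # ~11B params, needs a large GPU
tokenizer = T5Tokenizer.from_pretrained(model_path)
model = T5ForConditionalGeneration.from_pretrained(model_path)

def consistency_prob(document: str, summary: str) -> float:
    """Return P(y = 1 | document, summary): the softmax mass the model
    places on the '1' (consistent) token at the first decoding step."""
    input_ids = tokenizer(
        f"premise: {document} hypothesis: {summary}",
        return_tensors="pt", truncation=True,
    ).input_ids
    # T5 decoding starts from the pad token.
    decoder_input_ids = torch.tensor([[tokenizer.pad_token_id]])
    with torch.no_grad():
        logits = model(input_ids=input_ids,
                       decoder_input_ids=decoder_input_ids).logits
    probs = torch.softmax(logits[0, 0], dim=-1)
    one_token_id = tokenizer("1").input_ids[0]
    return probs[one_token_id].item()
```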
Example Usage:
TrueTeacher can be used to evaluate the factual accuracy of machine-generated summaries, improving the reliability of content generation in news, educational, and research domains. It is particularly useful in high-stakes scenarios that require verified factual information.
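For instance, a content pipeline might release only summaries the metric rates as likely consistent. This sketch reuses the consistency_prob function from above; the 0.5 threshold is an arbitrary illustrative choice:

```python
# Hypothetical gating step: keep only summaries the metric rates as
# likely consistent. The threshold is an illustrative assumption.
THRESHOLD = 0.5

def filter_summaries(document: str, candidates: list[str]) -> list[str]:
    return [s for s in candidates
            if consistency_prob(document, s) >= THRESHOLD]
```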
Application and Impact:
TrueTeacher addresses the challenge of hallucination in LLM-generated content, making it valuable for producing trustworthy summaries, enhancing automated journalism, and improving research content dissemination. Its large-scale, diverse training data supports robustness across domains and adaptability in multilingual contexts, with applications in content moderation, fact-checking, and compliance auditing for AI-driven content creation.
Trustworthy AI Relevance
This metric addresses Transparency & Explainability and Robustness by quantifying factual consistency. Note that, despite its name, TrueTeacher does not measure pedagogical quality: the "teacher" refers to the LLM that labels synthetic training data in a teacher-student distillation setup. Its contribution to trustworthy AI lies in detecting hallucination: by flagging summaries that are not entailed by their source documents, it helps prevent systems from presenting fabricated content as fact and increases the transparency and reliability of generated text.
References
Gekhman, Z., Herzig, J., Aharoni, R., Elkind, C., & Szpektor, I. (2023). TrueTeacher: Learning Factual Consistency Evaluation with Large Language Models. arXiv preprint arXiv:2305.11171.