Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

CHAIR is a metric designed to measure object hallucination in image captioning models, assessing the relevance of generated captions to the actual image content. It evaluates how often models “hallucinate” objects not present in the image and introduces a new way to measure caption quality using veridical visual labels.


Applicable Models


Image captioning models including attention-based models (e.g., TopDown, NBT) and non-attention-based models (e.g., FC, LRCN). It applies to models trained and evaluated on datasets such as MSCOCO.


Background


Standard sentence metrics like CIDEr, METEOR, and SPICE fail to penalize hallucinated objects sufficiently, making them less reliable for assessing image relevance. CHAIR addresses this gap by using both ground truth object annotations and captions.


Formulae

CHAIR_i = (number of hallucinated objects) / (total number of objects mentioned in captions)

CHAIR_s = (number of captions with at least one hallucinated object) / (total number of captions)
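The two ratios can be sketched in a few lines, assuming each caption has already been parsed into the set of object categories it mentions (the original implementation maps MSCOCO caption words and their synonyms onto the 80 annotated object classes before comparison). The `chair_scores` helper below is an illustrative sketch, not the authors' released code.

```python
def chair_scores(caption_objects, ground_truth_objects):
    """Compute instance- and sentence-level CHAIR.

    caption_objects: list of sets, objects mentioned in each caption.
    ground_truth_objects: list of sets, objects annotated in each image.
    """
    total_mentioned = 0
    hallucinated = 0
    captions_with_hallucination = 0
    for mentioned, actual in zip(caption_objects, ground_truth_objects):
        total_mentioned += len(mentioned)
        bad = mentioned - actual  # mentioned objects absent from the image
        hallucinated += len(bad)
        if bad:
            captions_with_hallucination += 1
    chair_i = hallucinated / total_mentioned if total_mentioned else 0.0
    chair_s = captions_with_hallucination / len(caption_objects) if caption_objects else 0.0
    return chair_i, chair_s

# Example: the second caption hallucinates a "dog" that is not in the image.
captions = [{"person", "surfboard"}, {"cat", "dog"}]
truths = [{"person", "surfboard", "wave"}, {"cat", "sofa"}]
ci, cs = chair_scores(captions, truths)
# ci = 1/4 = 0.25 (one hallucinated object out of four mentioned)
# cs = 1/2 = 0.5  (one of two captions contains a hallucination)
```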


Applications


CHAIR is used to evaluate hallucination tendencies in image captioning models, aiding in model comparison and optimization for tasks requiring high image-caption fidelity. It helps identify how models rely on language priors versus visual input.


Impact


CHAIR promotes responsible AI development by encouraging models to produce captions more closely aligned with image content, improving performance and reducing misleading outputs. It helps mitigate the risks of hallucination, which matter most in sensitive applications such as assistive technology for visually impaired users or automated decision systems.

References

Rohrbach, A., Hendricks, L. A., Burns, K., Darrell, T., & Saenko, K. (2018). Object hallucination in image captioning. arXiv preprint arXiv:1809.02156.

About the metric


GitHub stars: 63

GitHub forks: 8


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.