Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

CHAIR is a metric designed to measure object hallucination in image captioning models, assessing the relevance of generated captions to the actual image content. It evaluates how often models “hallucinate” objects not present in the image and introduces a new way to measure caption quality using veridical visual labels.


Applicable Models


Image captioning models including attention-based models (e.g., TopDown, NBT) and non-attention-based models (e.g., FC, LRCN). It applies to models trained and evaluated on datasets such as MSCOCO.


Background


Standard sentence metrics like CIDEr, METEOR, and SPICE fail to penalize hallucinated objects sufficiently, making them less reliable for assessing image relevance. CHAIR addresses this gap by using both ground truth object annotations and captions.


Formulae

CHAIR_i = (number of hallucinated objects) / (total number of objects mentioned in captions)

CHAIR_s = (number of captions with at least one hallucinated object) / (total number of captions)
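The two ratios can be sketched in a few lines, assuming each caption has already been parsed into the set of object categories it mentions (the original implementation maps MSCOCO caption words and their synonyms onto the 80 annotated object classes before comparison). The `chair_scores` helper below is an illustrative sketch, not the authors' released code.

```python
def chair_scores(caption_objects, ground_truth_objects):
    """Compute instance- and sentence-level CHAIR.

    caption_objects: list of sets, objects mentioned in each caption.
    ground_truth_objects: list of sets, objects annotated in each image.
    """
    total_mentioned = 0
    hallucinated = 0
    captions_with_hallucination = 0
    for mentioned, actual in zip(caption_objects, ground_truth_objects):
        total_mentioned += len(mentioned)
        bad = mentioned - actual  # mentioned objects absent from the image
        hallucinated += len(bad)
        if bad:
            captions_with_hallucination += 1
    chair_i = hallucinated / total_mentioned if total_mentioned else 0.0
    chair_s = captions_with_hallucination / len(caption_objects) if caption_objects else 0.0
    return chair_i, chair_s

# Example: the second caption hallucinates a "dog" that is not in the image.
captions = [{"person", "surfboard"}, {"cat", "dog"}]
truths = [{"person", "surfboard", "wave"}, {"cat", "sofa"}]
ci, cs = chair_scores(captions, truths)
# ci = 1/4 = 0.25 (one hallucinated object out of four mentioned)
# cs = 1/2 = 0.5  (one of two captions contains a hallucination)
```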


Applications


CHAIR is used to evaluate hallucination tendencies in image captioning models, aiding in model comparison and optimization for tasks requiring high image-caption fidelity. It helps identify how models rely on language priors versus visual input.


Impact


CHAIR promotes responsible AI development by encouraging models to produce captions more closely aligned with image content, improving performance and reducing misleading outputs. It helps mitigate the risks of hallucination, which matter most in sensitive applications such as assistive technology for visually impaired users or automated decision systems.

References

Rohrbach, A., Hendricks, L. A., Burns, K., Darrell, T., & Saenko, K. (2018). Object hallucination in image captioning. arXiv preprint arXiv:1809.02156.

About the metric


GitHub stars: 63

GitHub forks: 8


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.