Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

CIDEr (Consensus-based Image Description Evaluation) is a metric for evaluating the quality of generated textual descriptions of images. It measures how closely a generated caption matches a set of human-written reference captions for the same image. It is based on the concept of consensus: a good caption should use the words and phrases that most human annotators use to describe the image, so n-grams shared with many reference captions count in its favour, while n-grams that are common across the whole dataset carry little information and are down-weighted.

The CIDEr metric is computed as follows:

1. First, a set of reference captions is provided for each image. These captions serve as the ground truth for the evaluation.

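2. The generated caption and each reference caption are broken into n-grams (typically one to four words long), and each caption is represented as a vector of TF-IDF (Term Frequency-Inverse Document Frequency) weights over its n-grams.

3. The IDF (Inverse Document Frequency) part of the weighting gives less weight to n-grams that occur in the reference captions of many images in the dataset, and more weight to n-grams that are rare across the dataset, since these say more about a particular image.

4. For each n-gram length, the cosine similarity between the generated caption's vector and each reference caption's vector is computed and averaged over all reference captions. Finally, these averages are combined across n-gram lengths with equal weights to produce the final CIDEr score.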
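As an illustration, here is a minimal Python sketch of the computation as defined in the original CIDEr formulation: TF-IDF-weighted n-gram vectors compared by cosine similarity, averaged over references and over n-gram lengths 1 to 4. The function names (`ngrams`, `document_frequency`, `tfidf`, `cosine`, `cider`) and the toy captions are hypothetical and chosen for this example only. The sketch assumes whitespace tokenization, skips stemming, and does not implement the CIDEr-D variant or the score rescaling that some reference implementations apply, so its values are not comparable with published numbers.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def document_frequency(refs_per_image, n):
    """For each n-gram, count the images whose reference captions contain it."""
    df = Counter()
    for refs in refs_per_image:
        seen = set()
        for ref in refs:
            seen.update(ngrams(ref.split(), n))
        df.update(seen)
    return df


def tfidf(tokens, n, df, num_images):
    """Map each n-gram of a caption to its TF-IDF weight."""
    counts = ngrams(tokens, n)
    total = sum(counts.values())
    vec = {}
    for gram, count in counts.items():
        # Assumption: n-grams absent from every reference get document
        # frequency 1, which avoids a division by zero.
        idf = math.log(num_images / max(1, df.get(gram, 0)))
        vec[gram] = (count / total) * idf
    return vec


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(gram, 0.0) for gram, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(
        sum(w * w for w in v.values())
    )
    return dot / norm if norm > 0 else 0.0


def cider(candidate, refs, refs_per_image, max_n=4):
    """Average cosine similarity over references, then over n = 1..max_n."""
    num_images = len(refs_per_image)
    score = 0.0
    for n in range(1, max_n + 1):
        df = document_frequency(refs_per_image, n)
        cand_vec = tfidf(candidate.split(), n, df, num_images)
        sims = [cosine(cand_vec, tfidf(r.split(), n, df, num_images)) for r in refs]
        score += sum(sims) / len(sims) / max_n  # equal weight 1/max_n per length
    return score


# Toy dataset: one list of reference captions per image (made-up data).
corpus = [
    ["a dog runs on the grass", "a dog is running across a lawn"],
    ["a cat sits on the mat", "a cat is resting on a rug"],
]
print(cider("a dog runs across the grass", corpus[0], corpus))
```

In this sketch the IDF statistics come from the reference captions of the whole dataset, so a dataset with a single image would give every reference n-gram a weight of zero. A caption that shares only dataset-wide common words with the references (such as "a", "on", "the" in the toy data) scores zero.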

The CIDEr metric has become a standard in the field of image captioning and has been used in several benchmark datasets and competitions. It is widely used because it rewards captions that agree with what human annotators wrote about an image, while discounting wording that is common across the dataset and therefore uninformative.

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.