CIDEr (Consensus-based Image Description Evaluation) is a metric for evaluating the quality of generated textual descriptions of images. It measures how similar a generated (candidate) caption is to a set of human-written reference captions for the same image. It is built on the idea of consensus: a good caption should resemble the way most people describe that image. CIDEr does not check grammar or meaning directly. Instead, it rewards n-grams (word sequences) that the candidate shares with the references, and it down-weights n-grams that are common across the whole dataset because they say little about any particular image.

The CIDEr metric is computed as follows:

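1. First, a set of human-written reference captions is provided for each image. These captions serve as the ground truth. The candidate and reference captions are tokenized (the original formulation also reduces words to their stems), and each caption is represented by the counts of its n-grams for n = 1 to 4.

2. Each n-gram count is weighted with TF-IDF (term frequency times inverse document frequency). Here a "document" is the set of reference captions for one image. The IDF of an n-gram is therefore the log of the number of images divided by the number of images whose references contain that n-gram. N-grams that appear in the references of many images, such as "a" or "on the", receive little weight. N-grams specific to a few images receive more.

3. For each n, the candidate's TF-IDF vector is compared with each reference's TF-IDF vector using cosine similarity. These similarities are averaged over all reference captions to give a per-length score, CIDEr_n.

4. Finally, the CIDEr_n scores for n = 1, ..., 4 are averaged with uniform weights to produce the final CIDEr score. CIDEr does not use BLEU at any point. It replaces BLEU's precision-based n-gram matching with TF-IDF weighted cosine similarity.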
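The computation can be sketched in Python as follows. This is a minimal, unoptimized sketch of plain CIDEr as defined in the original paper: TF-IDF weighted n-grams, cosine similarity averaged over references, and uniform averaging over n = 1..4. It does not implement the CIDEr-D variant and is not any official implementation. It makes three assumptions. First, captions are lowercased and split on whitespace, with no stemming. Second, IDF is computed over the reference sets passed in as `all_refs`. Third, the function names (`ngrams`, `tokenize`, `document_frequency`, `tfidf`, `cosine`, `cider`) are hypothetical helpers introduced for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all n-grams of length n in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def tokenize(sentence):
    # Hypothetical tokenizer: lowercase + whitespace split.
    # Assumption: the original paper also stems words; this sketch does not.
    return sentence.lower().split()


def document_frequency(all_refs, n):
    """For each n-gram, count the images whose reference set contains it."""
    df = Counter()
    for refs in all_refs:  # refs: list of reference captions for one image
        seen = set()
        for ref in refs:
            seen.update(ngrams(tokenize(ref), n))
        df.update(seen)  # each image counts at most once per n-gram
    return df


def tfidf(counts, df, num_images):
    # TF x IDF with raw counts as TF. Normalizing TF by the caption's total
    # n-gram count would scale the whole vector and leave the cosine unchanged.
    # df is floored at 1 so n-grams unseen in the references avoid division by zero.
    return {g: c * math.log(num_images / max(1, df[g])) for g, c in counts.items()}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def cider(candidate, refs, all_refs, max_n=4):
    """Plain CIDEr of one candidate caption against its reference captions.

    all_refs holds the reference sets of every image in the corpus (for IDF).
    """
    num_images = len(all_refs)
    score = 0.0
    for n in range(1, max_n + 1):
        df = document_frequency(all_refs, n)  # recomputed per call for clarity
        cand_vec = tfidf(ngrams(tokenize(candidate), n), df, num_images)
        sims = [
            cosine(cand_vec, tfidf(ngrams(tokenize(r), n), df, num_images))
            for r in refs
        ]
        score += sum(sims) / len(refs)  # CIDEr_n: mean over references
    return score / max_n  # uniform weights w_n = 1 / max_n


if __name__ == "__main__":
    corpus = [
        ["a dog runs on the grass", "a brown dog running in a field"],
        ["a man rides a bike", "a person riding a bicycle down the street"],
        ["two cats sleep on a sofa", "cats sleeping on a couch"],
    ]
    print(cider("a dog running on the grass", corpus[0], corpus))
```

Document frequencies are recomputed on every call to keep the sketch short. A practical implementation would compute them once per corpus. Because IDF depends on the whole evaluation corpus, the same caption can receive a different score when it is evaluated against a different set of images.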

CIDEr has become a standard metric in image captioning. It is reported on benchmarks such as MS COCO Captions, where the evaluation server uses the CIDEr-D variant. CIDEr-D clips n-gram counts and adds a length penalty to make the score harder to game. CIDEr is widely used mainly because the original paper reports that it agrees with human judgments of consensus more closely than earlier metrics such as BLEU and ROUGE.

