CLIPBERTScore

Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

CLIPBERTScore is a simple weighted combination of CLIPScore (Hessel et al., 2021) and BERTScore (Zhang et al., 2020). CLIPScore is applied to the image-summary pair and BERTScore to the document-summary pair, so that the combined score leverages the robustness and strong factuality detection performance of the two metrics on their respective inputs.
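The weighted combination described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `clipbert_score`, the convex form `alpha * CLIPScore + (1 - alpha) * BERTScore`, and the default weight are assumptions made for the example, and both input scores are assumed to have been computed beforehand with whichever CLIPScore and BERTScore implementations are in use.

```python
def clipbert_score(clip_score: float, bert_score: float, alpha: float = 0.5) -> float:
    """Combine an image-summary score and a document-summary score.

    clip_score: CLIPScore for the (image, summary) pair, computed elsewhere.
    bert_score: BERTScore for the (document, summary) pair, computed elsewhere.
    alpha:      hypothetical weight on the CLIPScore term; the default of 0.5
                is a placeholder, not a value taken from the source.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    # Convex combination: alpha weights the image side, (1 - alpha) the text side.
    return alpha * clip_score + (1.0 - alpha) * bert_score


# Example with made-up component scores.
combined = clipbert_score(clip_score=0.8, bert_score=0.6, alpha=0.25)
```

With `alpha = 0` the result reduces to the document-summary score alone, and with `alpha = 1` to the image-summary score alone, which makes the weight easy to sanity-check when tuning it on validation data.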

CLIPBERTScore can support Explainability by providing a quantitative measure of how well AI-generated outputs (such as captions or retrievals) semantically align with reference data. This helps developers and users judge whether the system's outputs are meaningful and relevant, which is one component of making AI decisions more interpretable. The metric itself does not generate explanations; it supports the evaluation of output quality, which can be used as part of an explainability framework.

Trustworthy AI Relevance

This metric addresses Robustness and Transparency by quantifying relevant system properties. Robustness: CLIPBERTScore quantifies how well a multimodal model's outputs align with the expected text and image references across inputs and domains. Consistently high scores indicate reliable performance, and measuring the score under distribution shift, noise, or adversarial conditions helps evaluate resilience and stability.
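One way to act on the robustness point above is to compare the mean score on clean inputs with the mean score under each perturbed condition. The sketch below is illustrative only: the function name `score_drop_by_condition`, the condition labels, and the numbers are hypothetical, and the per-example scores are assumed to have been computed already.

```python
from statistics import mean


def score_drop_by_condition(scores: dict[str, list[float]],
                            baseline: str = "clean") -> dict[str, float]:
    """Return the drop in mean score for each condition relative to the baseline.

    scores:   maps a condition name to the per-example scores under it.
    baseline: key of the unperturbed condition (hypothetical label).
    """
    base_mean = mean(scores[baseline])
    # A larger positive value means the score degrades more under that condition.
    return {
        name: base_mean - mean(values)
        for name, values in scores.items()
        if name != baseline
    }


# Made-up scores, purely for illustration.
example_scores = {
    "clean": [0.8, 0.6],
    "noisy": [0.5, 0.7],
    "shifted_domain": [0.4, 0.4],
}
drops = score_drop_by_condition(example_scores)
```

Small drops across conditions would be consistent with stable behaviour, while a large drop under one condition points to where the model is least resilient.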

About the metric

Objective(s):

Robustness
Transparency

Purpose(s):

Forecasting/prediction
Reasoning with knowledge structures/planning
Recognition/object detection

Lifecycle stage(s):

Build & interpret model

Target users:

Developer
System integrators

Risk management stage(s):

Assess
Govern

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.