MAUVE

Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

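MAUVE is a library built on PyTorch and HuggingFace Transformers that measures the gap between neural (machine-generated) text and human text using the eponymous MAUVE measure. The measure summarizes two kinds of error: Type I, where the model produces text that is unlikely under the human distribution, and Type II, where the model fails to produce text that is plausible under the human distribution. Both are measured softly using Kullback–Leibler (KL) divergences.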
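
To make the "soft" Type I and Type II errors concrete, the following is a minimal sketch of how a MAUVE-style score could be computed from two discrete histograms. It is an illustration under stated assumptions, not the library's implementation: the function names `kl_divergence` and `mauve_from_histograms` are hypothetical, the scaling constant of 5.0 and the 25 mixture weights are assumed values, and the histograms are taken as given, whereas the real library first embeds the text with a language model and quantizes the embeddings.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for discrete distributions; terms with p_i == 0 contribute 0."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mauve_from_histograms(p, q, scaling=5.0, num_points=25):
    """Area under a divergence curve between histograms p (human) and q (model).

    Hypothetical helper for illustration; `scaling` and `num_points` are
    assumed values, not taken from the source document.
    """
    p_total, q_total = sum(p), sum(q)
    p = [v / p_total for v in p]
    q = [v / q_total for v in q]

    # The curve starts at the corner (1, 0) and ends at the corner (0, 1).
    curve = [(1.0, 0.0)]
    for i in range(num_points):
        # Mixture weight strictly inside (0, 1) so r is positive wherever p or q is.
        lam = 1e-6 + (1 - 2e-6) * i / (num_points - 1)
        r = [lam * pi + (1 - lam) * qi for pi, qi in zip(p, q)]
        # KL(q || r) is the soft Type I term, KL(p || r) the soft Type II term.
        x = math.exp(-scaling * kl_divergence(q, r))
        y = math.exp(-scaling * kl_divergence(p, r))
        curve.append((x, y))
    curve.append((0.0, 1.0))
    curve.reverse()  # order the points by ascending x

    # Trapezoidal rule for the area under the curve.
    area = 0.0
    for (x0, y0), (x1, y1) in zip(curve, curve[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


human = [0.5, 0.3, 0.2]
print(mauve_from_histograms(human, human))            # identical distributions
print(mauve_from_histograms(human, [0.9, 0.05, 0.05]))  # model collapsed onto one bin
```

Under these assumptions the score approaches 1 when the two histograms coincide and falls toward 0 as they diverge, which is the behaviour the description above relies on.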

MAUVE can indirectly support the Robustness objective by quantifying how closely a generative model's outputs match the distribution of human data. Significant divergence detected by MAUVE may indicate that the model is producing outputs that are unreliable or out-of-distribution, which could signal a lack of robustness. However, this connection is indirect, as MAUVE does not explicitly measure system resilience under adverse conditions or operational reliability.

Trustworthy AI Relevance

This metric addresses Robustness and Transparency by quantifying relevant system properties.

Robustness: MAUVE evaluates how closely a generative model's output distribution matches the distribution of human text. This makes it useful for detecting distributional drift, mode collapse, reduced diversity, and other failure modes under changing conditions, all of which bear on the consistency and resilience of model outputs.

Transparency: By summarizing the quality/diversity tradeoff and exposing distributional gaps, MAUVE helps practitioners and stakeholders understand model behavior and how it differs from human language. This supports clearer reporting and diagnostic insight into the characteristics of generated text.

About the metric

Objective(s):

Robustness
Transparency

Purpose(s):

Forecasting/prediction

Lifecycle stage(s):

Build & interpret model

Target users:

Developer

Risk management stage(s):

Define

Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.