Google BLEU (GLEU)

Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

The BLEU score has some undesirable properties when used for single sentences, as it was designed to be a corpus measure. The Google BLEU score, also known as GLEU score, is designed to limit these undesirable properties when used for single sentences.

To calculate this score, all sub-sequences of 1, 2, 3 or 4 tokens (n-grams) in the output and target sequences are recorded. Precision and recall, defined below, are then computed over these n-grams.

precision: the ratio of the number of matching n-grams to the number of total n-grams in the generated output sequence
recall: the ratio of the number of matching n-grams to the number of total n-grams in the target (ground truth) sequence

The score is the minimum of precision and recall, so it ranges from 0 (no matching n-grams) to 1 (every n-gram matches).
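The calculation described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `ngram_counts` and `sentence_gleu` are hypothetical, inputs are assumed to be pre-tokenized lists of strings, and "matching n-grams" is assumed to mean clipped counts (an n-gram repeated in one sequence matches at most as many times as it occurs in the other), which the text above does not spell out.

```python
from collections import Counter


def ngram_counts(tokens, min_n=1, max_n=4):
    """Count every contiguous n-gram of length min_n..max_n in a token list."""
    counts = Counter()
    for n in range(min_n, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def sentence_gleu(reference, hypothesis, min_n=1, max_n=4):
    """Sketch of a sentence-level GLEU score: min(precision, recall)."""
    ref_counts = ngram_counts(reference, min_n, max_n)
    hyp_counts = ngram_counts(hypothesis, min_n, max_n)

    # Assumption: matches are clipped, i.e. the per-n-gram minimum of the
    # two counts. Counter's & operator computes exactly that.
    matches = sum((ref_counts & hyp_counts).values())
    total_hyp = sum(hyp_counts.values())
    total_ref = sum(ref_counts.values())

    # Assumption: an empty sequence scores 0 rather than raising an error.
    if total_hyp == 0 or total_ref == 0:
        return 0.0

    precision = matches / total_hyp  # share of output n-grams found in target
    recall = matches / total_ref     # share of target n-grams found in output
    return min(precision, recall)


reference = "the cat sat on the mat".split()
hypothesis = "the cat sat on a mat".split()
score = sentence_gleu(reference, hypothesis)
```

In this example both sequences contain 18 n-grams (6 unigrams, 5 bigrams, 4 trigrams, 3 four-grams) and 11 of them match, so precision and recall are both 11/18 and the score is about 0.61. Because the score takes the minimum of the two ratios, swapping the output and target sequences leaves it unchanged.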

GLEU can indirectly bolster the Robustness objective by providing a quantifiable signal of how well a translation model’s outputs align with human references. Persistent or domain-specific drops in GLEU reveal quality drift or out-of-distribution behaviour that may undermine system reliability, prompting retraining or fallback mechanisms before errors propagate to users. Nevertheless, this link is indirect: GLEU captures surface-level linguistic fidelity and does not test the model’s resilience under adversarial inputs, extreme noise, or operational faults.
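One way such drops might be surfaced in practice is to compare recent per-domain GLEU averages against a baseline. The sketch below is purely illustrative: the function name `flag_gleu_drops`, the per-domain grouping, and the `tolerance` value of 0.05 are assumptions made for this example, not part of the metric or of any recommended procedure.

```python
from statistics import mean


def flag_gleu_drops(baseline_scores, recent_scores, tolerance=0.05):
    """Return domains whose mean GLEU fell by more than `tolerance`.

    Both arguments map a domain name to a list of sentence-level GLEU
    scores. The tolerance is a hypothetical threshold for illustration.
    """
    flagged = {}
    for domain, recent in recent_scores.items():
        baseline = baseline_scores.get(domain)
        # Skip domains with no baseline or no recent scores to compare.
        if not baseline or not recent:
            continue
        baseline_mean, recent_mean = mean(baseline), mean(recent)
        if baseline_mean - recent_mean > tolerance:
            flagged[domain] = (baseline_mean, recent_mean)
    return flagged


# Made-up scores, used only to show the call shape.
baseline = {"news": [0.60, 0.70], "legal": [0.50, 0.50]}
recent = {"news": [0.62, 0.66], "legal": [0.30, 0.40]}
flagged = flag_gleu_drops(baseline, recent)
```

With these made-up numbers only the "legal" domain is flagged. A flagged domain signals a drop in surface-level overlap with the references and would still need human review before deciding on retraining or a fallback.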

About the metric

Objective(s):

Robustness

Purpose(s):

Forecasting/prediction
Recognition/object detection

Lifecycle stage(s):

Verify & validate

Target users:

Developer
Project manager
System integrators

Risk management stage(s):

Govern

Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.