Bilingual Evaluation Understudy (BLEU)

Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.


Bilingual Evaluation Understudy (BLEU) is an algorithm for evaluating the quality of text which has been machine-translated from one natural language to another. Quality is considered to be the correspondence between a machine’s output and that of a human: “the closer a machine translation is to a professional human translation, the better it is” – this is the central idea behind BLEU. BLEU was one of the first metrics to claim a high correlation with human judgements of quality, and remains one of the most popular automated and inexpensive metrics.

Scores are calculated for individual translated segments, generally sentences, by comparing them with a set of good-quality reference translations. Those scores are then averaged over the whole corpus to estimate the translation’s overall quality. Neither intelligibility nor grammatical correctness is taken into account.
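The segment-level scoring and corpus averaging described here can be illustrated with a minimal Python sketch. It assumes whitespace-tokenised input, n-grams up to length 4 with uniform weights, clipped n-gram precision, a brevity penalty, and no smoothing. The function names (`ngrams`, `modified_precision`, `sentence_bleu`, `average_bleu`) are hypothetical and chosen only for illustration. Widely used implementations typically pool n-gram counts across the whole corpus rather than averaging per-segment scores, so this sketch follows the simplified description above and is not a reference implementation.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, references, n):
    """Clipped n-gram precision: each candidate n-gram is credited at most
    as many times as it occurs in any single reference."""
    cand_counts = ngrams(candidate, n)
    if not cand_counts:
        return 0.0
    max_ref = Counter()
    for ref in references:
        for gram, count in ngrams(ref, n).items():
            max_ref[gram] = max(max_ref[gram], count)
    clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
    return clipped / sum(cand_counts.values())


def sentence_bleu(candidate, references, max_n=4):
    """Geometric mean of the clipped precisions, scaled by a brevity penalty."""
    precisions = [modified_precision(candidate, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0  # assumption: no smoothing, so one empty n-gram order gives 0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    c = len(candidate)
    # Use the reference length closest to the candidate length (shorter on ties).
    r = min((len(ref) for ref in references),
            key=lambda length: (abs(length - c), length))
    brevity_penalty = 1.0 if c > r else math.exp(1 - r / c)
    return brevity_penalty * math.exp(log_mean)


def average_bleu(candidates, references_per_segment, max_n=4):
    """Average the per-segment scores over the corpus, as described above."""
    scores = [sentence_bleu(cand, refs, max_n)
              for cand, refs in zip(candidates, references_per_segment)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    refs = ["the cat is on the mat".split(), "there is a cat on the mat".split()]
    print(sentence_bleu("the cat is on the mat".split(), refs))
    print(modified_precision("the the the the the the the".split(), refs, 1))
```

Clipping is what stops a degenerate output such as a single repeated word from scoring well: the repeated word is credited only as often as it appears in one reference. Because the score depends only on n-gram overlap and length, a fluent paraphrase with little overlap can score low, which is consistent with the note that intelligibility and grammaticality are not measured.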

Related use cases:

Better than Average: Paired Evaluation of NLP Systems

Uploaded on Oct 21, 2022

Evaluation in NLP is usually done by comparing the scores of competing systems independently averaged over a common set of test instances. In this work, we question the use of ...

COMET: A Neural Framework for MT Evaluation

Uploaded on Oct 21, 2022

We present COMET, a neural framework for training multilingual machine translation evaluation models which obtains new state-of-the-art levels of correlation with human judgeme...

BLEU, METEOR, BERTScore: Evaluation of Metrics Performance in Assessing Critical Translation Errors in Sentiment-oriented Text

Uploaded on Oct 21, 2022

Social media companies as well as authorities make extensive use of artificial intelligence (A...

Semantic-Based Self-Critical Training For Question Generation

Uploaded on Nov 1, 2022

Question generation is a conditioned language generation task that consists in generating a context-aware question given a context and the targeted answer. Train language model...

BioT5: Enriching Cross-modal Integration in Biology with Chemical Knowledge and Natural Language Associations

Uploaded on Nov 1, 2023

In this paper, we propose a novel benchmark called the StarCraft Multi-Agent Challenges+, where agents learn to perform multi-stage tasks and to use environmental factors without p...

Controlling Hallucinations at Word Level in Data-to-Text Generation

Uploaded on Nov 1, 2023

Although the estimation of 3D human pose and shape (HPS) is rapidly progressing, current methods still cannot reliably estimate moving humans in global coordinates, which is critic...

ERNIE-GEN: An Enhanced Multi-Flow Pre-training and Fine-tuning Framework for Natural Language Generation

Uploaded on Nov 1, 2023

Current pre-training works in natural language generation pay little attention to the problem of exposure bias on downstream tasks. To address this issue, we propose an enhanced mu...

Improving Sign Language Translation with Monolingual Data by Sign Back-Translation

Uploaded on Nov 1, 2023

This work investigates a simple yet powerful dense prediction task adapter for Vision Transformer (ViT). Unlike recently advanced variants that incorporate vision-specific inductiv...

Machine Translation Pre-training for Data-to-Text Generation -- A Case Study in Czech

Uploaded on Nov 1, 2023

Programming is a powerful and ubiquitous problem-solving tool. Developing systems that can assist programmers or even generate programs independently could make programming more pr...

May the Force Be with Your Copy Mechanism: Enhanced Supervised-Copy Method for Natural Language Generation

Uploaded on Nov 1, 2023

Driving 3D characters to dance following a piece of music is highly challenging due to the spatial constraints applied to poses by choreography norms. In addition, the generated da...

Multimodal Pretraining for Dense Video Captioning

Uploaded on Nov 1, 2023

Traffic forecasting as a canonical task of multivariate time series forecasting has been a significant research topic in AI community. To address the spatio-temporal heterogeneity ...

NITS-VC System for VATEX Video Captioning Challenge 2020

Uploaded on Nov 1, 2023

Video captioning is the process of summarising the content, event and action of the video into a short textual form which can be helpful in many research areas such as video guided mac...

NeurST: Neural Speech Translation Toolkit

Uploaded on Nov 1, 2023

NeurST is an open-source toolkit for neural speech translation. The toolkit mainly focuses on end-to-end speech translation, which is easy to use, modify, and extend to advanced sp...

RefineCap: Concept-Aware Refinement for Image Captioning

Uploaded on Nov 1, 2023

We present Video-LLaMA, a multi-modal framework that empowers Large Language Models (LLMs) with the capability of understanding both visual and auditory content in the video. Video-...

Scaling Up Vision-Language Pre-training for Image Captioning

Uploaded on Nov 1, 2023

The recent advances in neural language models have also been successfully applied to the field of chemistry, offering generative solutions for classical problems in molecular desig...

About the metric

Objective(s):

Robustness
Explainability

Purpose(s):

Forecasting/prediction
Reasoning with knowledge structures/planning

Target sector(s):

Agriculture
Science & technology
Innovation
Environment
Education
Corporate governance

Lifecycle stage(s):

Operate & monitor
Verify & validate

Target users:

Data scientist
Developer
Project manager
System operators

Risk management stage(s):

Define
Assess
Treat

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.