These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
Metric for Evaluation of Translation with Explicit ORdering (METEOR) is a machine translation evaluation metric computed as the harmonic mean of unigram precision and recall, with recall weighted more heavily than precision.
METEOR is based on a generalized concept of unigram matching between the machine-produced translation and human-produced reference translations. Unigrams can be matched based on their surface forms, stemmed forms, and meanings. Once all generalized unigram matches between the two strings have been found, METEOR computes a score for this matching using a combination of unigram-precision, unigram-recall, and a measure of fragmentation that is designed to directly capture how well-ordered the matched words in the machine translation are in relation to the reference.
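To make the scoring concrete, the sketch below implements a simplified METEOR-style computation using the original Banerjee and Lavie (2005) parameters (harmonic mean weighting recall 9:1 over precision, fragmentation penalty 0.5 · (chunks / matches)^3). It is an illustration only: it restricts matching to exact surface forms (no stemming or synonym matching), and its greedy left-to-right alignment only approximates the chunk-minimizing alignment used by the full metric.

```python
# Minimal sketch of METEOR-style scoring.
# Assumptions: exact surface-form unigram matching only, original 2005
# parameters (recall weighted 9:1, penalty = 0.5 * (chunks / matches)^3).
# Illustrative, not the reference implementation.

def meteor_sketch(hypothesis: str, reference: str) -> float:
    hyp = hypothesis.lower().split()
    ref = reference.lower().split()

    # Greedy one-to-one alignment of identical unigrams, left to right.
    ref_used = [False] * len(ref)
    alignment = []  # (hypothesis_index, reference_index) pairs
    for i, word in enumerate(hyp):
        for j, ref_word in enumerate(ref):
            if not ref_used[j] and word == ref_word:
                ref_used[j] = True
                alignment.append((i, j))
                break

    matches = len(alignment)
    if matches == 0:
        return 0.0

    precision = matches / len(hyp)
    recall = matches / len(ref)

    # Harmonic mean with recall weighted 9 times more than precision.
    f_mean = (10 * precision * recall) / (recall + 9 * precision)

    # Fragmentation: count chunks of matches that are contiguous and
    # identically ordered in both strings (fewer chunks = better ordering).
    chunks = 1
    for (i_prev, j_prev), (i_cur, j_cur) in zip(alignment, alignment[1:]):
        if not (i_cur == i_prev + 1 and j_cur == j_prev + 1):
            chunks += 1

    penalty = 0.5 * (chunks / matches) ** 3
    return f_mean * (1 - penalty)


print(meteor_sketch("the cat sat on the mat", "the cat is on the mat"))
```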
Trustworthy AI Relevance
This metric addresses Robustness and Explainability by quantifying relevant system properties. Robustness: METEOR provides a quantitative, repeatable measure of translation/output quality and consistency. Monitoring METEOR across datasets, noise conditions, or distribution shifts helps detect model degradation, compare resilience across variants, and support out-of-distribution and performance-regression analysis, all of which are central to robustness.
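As a hedged illustration of that kind of monitoring, the snippet below compares corpus-level METEOR on a clean evaluation set and a perturbed one using NLTK's meteor_score. Recent NLTK versions expect pre-tokenized inputs and require the WordNet data package; the example sentences and the "noisy" outputs are hypothetical placeholders, not data from this catalogue entry.

```python
# Sketch: track METEOR across evaluation conditions (clean vs. perturbed)
# to watch for degradation. Requires NLTK with the WordNet data package.

import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet", quiet=True)  # needed for synonym matching


def corpus_meteor(hypotheses, references):
    """Average sentence-level METEOR over a corpus (simple macro-average)."""
    scores = [
        meteor_score([ref.split()], hyp.split())
        for hyp, ref in zip(hypotheses, references)
    ]
    return sum(scores) / len(scores)


# Hypothetical evaluation sets (placeholders).
references = ["the cat is on the mat", "there is a dog in the garden"]
clean_outputs = ["the cat sat on the mat", "a dog is in the garden"]
noisy_outputs = ["cat the on mat sat", "dog garden a in the is"]  # perturbed

clean = corpus_meteor(clean_outputs, references)
noisy = corpus_meteor(noisy_outputs, references)
print(f"METEOR clean: {clean:.3f}  noisy: {noisy:.3f}  drop: {clean - noisy:.3f}")
```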
Related use cases:
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Uploaded on Mar 15, 2024