Cross-lingual TRansfer Evaluation of Multilingual Encoders for Speech (XTREME-S)

Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

XTREME-S can indirectly support the Fairness objective by providing a means to evaluate whether multilingual speech models perform equitably across different languages. By highlighting disparities in model performance for underrepresented or minority languages, it can help developers identify and address potential biases, thus contributing to more equitable AI systems. However, this connection is indirect, as the metric itself does not measure fairness but rather provides data that can be used to assess it.

The XTREME-S metric aims to evaluate model performance on the Cross-lingual TRansfer Evaluation of Multilingual Encoders for Speech (XTREME-S) benchmark.

This benchmark was designed to evaluate speech representations across languages, tasks, domains and data regimes. It covers 102 languages from 10+ language families, 3 different domains and 4 task families: speech recognition, translation, classification and retrieval.
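
As a rough illustration of how a score for the speech recognition task family might be computed, the sketch below calculates word error rate (WER) per language in plain Python. WER is a common choice for speech recognition, but its use here is an assumption, as are the function names, the language labels and the toy transcripts; the official XTREME-S scoring code may differ.

```python
def edit_distance(ref_words, hyp_words):
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp_words) + 1))
    for i, ref_word in enumerate(ref_words, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp_words, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def word_error_rate(references, predictions):
    """Corpus-level WER: total word edits divided by total reference words."""
    total_edits = 0
    total_words = 0
    for ref, hyp in zip(references, predictions):
        ref_words, hyp_words = ref.split(), hyp.split()
        total_edits += edit_distance(ref_words, hyp_words)
        total_words += len(ref_words)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_edits / total_words


# Toy (reference, prediction) transcripts; not taken from the benchmark.
samples = {
    "lang_a": (["the cat sat on the mat"], ["the cat sat on the mat"]),
    "lang_b": (["one two three four"], ["one two four"]),
}

wer_by_language = {
    lang: word_error_rate(refs, hyps) for lang, (refs, hyps) in samples.items()
}
print(wer_by_language)  # lang_a has no errors; lang_b drops one of four words
```

Reporting the score separately for each language, rather than only as a pooled average, keeps weaker languages visible in the results.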

Trustworthy AI Relevance

This metric addresses Robustness and Fairness by quantifying relevant system properties.

Robustness: XTREME-S directly measures how well multilingual speech encoders maintain performance when transferring across languages and domains, that is, under distribution shift or out-of-distribution (OOD) conditions. These cross-lingual transfer results quantify how resilient a model is to language variation and low-resource conditions, which is central to robustness.
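
One way to turn per-language results into the robustness and fairness signals described above is to report the average score alongside the worst-performing language and the gap between best and worst. The following is a minimal sketch under that assumption; the function name, the language labels and the accuracy values are hypothetical and are not XTREME-S results.

```python
from statistics import mean


def summarise_by_language(scores, higher_is_better=True):
    """Summarise per-language scores: mean, best and worst language, and gap."""
    if not scores:
        raise ValueError("scores must not be empty")
    # Order languages from best to worst according to the metric direction.
    ordered = sorted(scores.items(), key=lambda item: item[1],
                     reverse=higher_is_better)
    best_lang, best_score = ordered[0]
    worst_lang, worst_score = ordered[-1]
    return {
        "mean": mean(scores.values()),
        "best": best_lang,
        "worst": worst_lang,
        "gap": abs(best_score - worst_score),
    }


# Hypothetical classification accuracies per language (illustrative only).
toy_accuracy = {"lang_a": 0.90, "lang_b": 0.80, "lang_c": 0.60}

summary = summarise_by_language(toy_accuracy)
print(f"mean={summary['mean']:.3f} worst={summary['worst']} "
      f"gap={summary['gap']:.2f}")
```

For error-style metrics such as word error rate, passing `higher_is_better=False` makes the language with the lowest error the best one. A large gap suggests that an aggregate score is hiding weak performance on some languages.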

About the metric

Objective(s):

Fairness
Robustness

Purpose(s):

Event/anomaly detection
Forecasting/prediction
Goal-driven optimisation
Interaction support/chatbots
Personalisation/recommenders
Reasoning with knowledge structures/planning
Recognition/object detection
Content generation

Lifecycle stage(s):

Operate & monitor
Verify & validate

Target users:

Data scientist
Developer
Project manager
System operators

Risk management stage(s):

Define
Assess

Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.