XTREME-S can indirectly support the Fairness objective by providing a means to evaluate whether multilingual speech models perform equitably across languages. By highlighting performance disparities for underrepresented or minority languages, it helps developers identify and address potential biases, contributing to more equitable AI systems. The connection is indirect, however: the metric does not measure fairness itself but provides data that can be used to assess it.

The XTREME-S metric evaluates model performance on the Cross-lingual TRansfer Evaluation of Multilingual Encoders for Speech (XTREME-S) benchmark.
This benchmark was designed to evaluate speech representations across languages, tasks, domains and data regimes. It covers 102 languages from 10+ language families, 3 different domains and 4 task families: speech recognition, translation, classification and retrieval.
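The speech-recognition task family in XTREME-S is scored with word error rate (WER), and comparing that score language by language is what surfaces the disparities discussed above. The following is a minimal, self-contained sketch of that idea, not the official XTREME-S scorer; the language codes and transcripts are invented for illustration:

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution
        prev = cur
    return prev[-1]

def wer(pairs):
    """Corpus-level WER: total word edits over total reference words."""
    errors = sum(edit_distance(ref.split(), hyp.split()) for ref, hyp in pairs)
    words = sum(len(ref.split()) for ref, _ in pairs)
    return errors / words

# Hypothetical per-language (reference, hypothesis) transcript pairs.
by_language = {
    "en": [("the cat sat on the mat", "the cat sat on the mat")],
    "sw": [("habari za asubuhi", "habari ya asubuhi")],
    "ta": [("vanakkam ulagam", "vanakam ulagam nanri")],
}

per_lang = {lang: wer(pairs) for lang, pairs in by_language.items()}
gap = max(per_lang.values()) - min(per_lang.values())  # cross-lingual disparity
```

A large `gap` between the best- and worst-scoring languages is the kind of signal a developer would investigate as potential bias toward high-resource languages.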
Trustworthy AI Relevance
This metric addresses Robustness and Fairness by quantifying relevant system properties.

Robustness: XTREME-S directly measures how well multilingual speech encoders maintain performance when transferring across languages and domains (i.e., under distribution shift or out-of-distribution conditions). These cross-lingual transfer results quantify resilience to variation across languages and to low-resource conditions, which is central to robustness.