Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

The XNLI metric evaluates a model’s performance on the XNLI dataset, a subset of a few thousand examples from the MNLI dataset that have been translated into 14 different languages, some of which are relatively low-resource, such as Swahili and Urdu.

As with MNLI, the task is to predict textual entailment: given two sentences, classify whether sentence A entails, contradicts, or is neutral with respect to sentence B. It is a classification task with three labels.
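For illustration, here is a minimal sketch of computing the metric, assuming the Hugging Face `evaluate` library’s `xnli` implementation, which reports plain accuracy over predicted labels; the predictions and references below are toy values.

```python
# pip install evaluate
import evaluate

# Load the XNLI metric from the Hugging Face evaluate library.
xnli_metric = evaluate.load("xnli")

# Labels follow the MNLI convention: 0 = entailment, 1 = neutral, 2 = contradiction.
predictions = [0, 2, 1, 0]  # model outputs for four premise/hypothesis pairs
references = [0, 2, 2, 0]   # gold labels

# The metric reports plain accuracy over the label predictions.
results = xnli_metric.compute(predictions=predictions, references=references)
print(results)  # {'accuracy': 0.75}
```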


XNLI supports Fairness by enabling the assessment of whether an AI system performs equitably across different languages, helping to identify and mitigate language-based biases that could lead to discriminatory outcomes. It also supports Robustness by testing the system's ability to maintain inference performance under the ‘adverse condition’ of language variation, thus evaluating resilience and reliability in multilingual scenarios.
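As a sketch of that fairness assessment, one might break accuracy down per language and compare high- and low-resource languages; the per-example records below (language code, gold label, predicted label) are purely hypothetical.

```python
from collections import defaultdict

# Hypothetical per-example records: (language code, gold label, predicted label).
examples = [
    ("en", 0, 0), ("en", 2, 2), ("sw", 0, 1), ("sw", 2, 2),
    ("ur", 1, 1), ("ur", 0, 2),
]

correct = defaultdict(int)
total = defaultdict(int)
for lang, gold, pred in examples:
    total[lang] += 1
    correct[lang] += int(gold == pred)

# A large accuracy gap between languages signals a language-based bias.
for lang in sorted(total):
    print(f"{lang}: accuracy = {correct[lang] / total[lang]:.2f}")
```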

Related use cases:


We present a simple yet effective Targeted Adversarial Training (TAT) algorithm to improve adversarial training for natural language understanding. The key idea is to introspec...





Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.