Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

SARI (system output against references and against the input sentence) is a metric used for evaluating automatic text simplification systems.

The metric compares the predicted simplified sentences against the reference and the source sentences. It explicitly measures the goodness of words that are added, deleted and kept by the system.

SARI can be computed as:

SARI = (F1_add + F1_keep + P_del) / 3

where

F1_add is the n-gram F1 score for add operations

F1_keep is the n-gram F1 score for keep operations

P_del is the n-gram precision score for delete operations

The maximum n-gram length, n, is equal to 4, as in the original paper.
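To make the formula concrete, here is a simplified, single-reference sketch of the computation in Python. It is not the actual tensor2tensor-derived implementation: real SARI uses n-gram multiset counts aggregated over multiple references, while this sketch works on n-gram sets and one reference. The function names (`sari_sketch`, `safe_div`) are illustrative, not part of any library.

```python
from collections import Counter

def ngrams(tokens, n):
    """All n-grams of a token list, as a Counter."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def safe_div(num, den):
    # Convention from this implementation: 0/0 = 1, so a prediction
    # with no applicable n-grams for an operation is not penalised.
    if den == 0:
        return 1.0 if num == 0 else 0.0
    return num / den

def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def sari_sketch(source, prediction, reference, max_n=4):
    """Simplified single-reference SARI: averages the add, keep and
    delete scores over n-gram lengths 1..max_n, then over operations."""
    src, pred, ref = source.split(), prediction.split(), reference.split()
    add_scores, keep_scores, del_scores = [], [], []
    for n in range(1, max_n + 1):
        s, p, r = set(ngrams(src, n)), set(ngrams(pred, n)), set(ngrams(ref, n))
        # ADD: n-grams in the prediction but not in the source.
        add_pred, add_ref = p - s, r - s
        p_add = safe_div(len(add_pred & add_ref), len(add_pred))
        r_add = safe_div(len(add_pred & add_ref), len(add_ref))
        # KEEP: source n-grams retained in the prediction.
        keep_pred, keep_ref = p & s, r & s
        p_keep = safe_div(len(keep_pred & keep_ref), len(keep_pred))
        r_keep = safe_div(len(keep_pred & keep_ref), len(keep_ref))
        # DEL: source n-grams dropped from the prediction (precision only).
        del_pred, del_ref = s - p, s - r
        p_del = safe_div(len(del_pred & del_ref), len(del_pred))
        add_scores.append(f1(p_add, r_add))
        keep_scores.append(f1(p_keep, r_keep))
        del_scores.append(p_del)
    F1_add = sum(add_scores) / max_n
    F1_keep = sum(keep_scores) / max_n
    P_del = sum(del_scores) / max_n
    return (F1_add + F1_keep + P_del) / 3
```

With the 0/0 = 1 convention, a prediction identical to the reference scores exactly 1.0, whatever the source sentence is.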

This implementation is adapted from TensorFlow's tensor2tensor implementation. It has two differences from the original GitHub implementation:

- It defines 0/0 = 1 instead of 0, giving higher scores to predictions that exactly match a target.
- It fixes an alleged bug in the keep-score computation.
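The first difference can be illustrated with two hypothetical precision helpers (the function names are illustrative, not taken from either codebase). When an operation has no candidate n-grams at all, for example the add set of a prediction that exactly matches a target, the adapted convention scores it as perfect rather than as zero:

```python
def precision_original(matches, total):
    # Original GitHub implementation: 0/0 is scored as 0.
    return matches / total if total else 0.0

def precision_adapted(matches, total):
    # This adaptation: 0/0 is scored as 1, so empty operation sets
    # do not drag down the score of an exact match.
    return matches / total if total else 1.0
```

For example, `precision_original(0, 0)` returns 0.0 while `precision_adapted(0, 0)` returns 1.0; on non-empty inputs the two agree.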


SARI supports Explainability by encouraging AI systems to produce outputs that are easier to understand. Simplified text can make AI decisions and outputs more accessible and comprehensible to a broader audience, thereby mitigating confusion and improving clarity. However, SARI does not directly measure or enforce explainability mechanisms; its contribution is indirect, through the promotion of simpler, clearer outputs.

Trustworthy AI Relevance

This metric addresses Transparency and Explainability by quantifying relevant system properties. SARI supports Transparency by enabling clear evaluation of how AI systems transform input text into simplified outputs, thus revealing the system's operational behavior in a quantifiable manner. It supports Explainability by measuring how well the system's output aligns with human references and the original input, which helps users and developers understand the rationale and quality of simplification decisions made by the AI.


Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.