Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

SAGED is a pioneering pipeline for bias detection in large language models (LLMs). It introduces an integrated framework for building benchmarks, running diagnostic tests, and calibrating fairness baselines, addressing key challenges such as contamination, limited scope, and tool bias. SAGED applies advanced techniques, including counterfactual branching and statistical disparity metrics, to assess and diagnose biases in LLM outputs.
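
To make the counterfactual-branching idea concrete, the toy Python sketch below generates matched prompt variants that differ only in the group term, so any shift in the model's output can be attributed to that term. The template and group names are hypothetical, not drawn from SAGED.

```python
# Hypothetical prompt template and groups for counterfactual branching.
template = "Describe the typical work ethic of people from {group}."
groups = ["Country A", "Country B", "Country C"]

# Each branch is an otherwise-identical prompt with the group term swapped.
branches = {group: template.format(group=group) for group in groups}
for group, prompt in branches.items():
    print(f"{group}: {prompt}")
```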


Applicable Models:

SAGED is designed for text-based LLMs of all scales, particularly those deployed in applications such as sentiment analysis and role-based simulation. It supports models such as GPT-4, Llama 3.1, and Mistral.


Background:

Bias in LLMs can lead to discriminatory outputs, perpetuating systemic injustices. Existing benchmarks often lack contextual fairness baselines and suffer from contamination, which limits their reliability. SAGED addresses these gaps with a holistic, customizable pipeline built from five core modules: scraping, assembling, generating, extracting, and diagnosing (the stages that give the pipeline its name). Its distinguishing features include customizable baselines, counterfactual branching, and robust statistical tools.
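
A minimal, self-contained Python sketch of how the five stages compose is shown below. Every function name and the toy data are illustrative assumptions for this entry; they do not reflect SAGED's actual API.

```python
from statistics import mean

def scrape(sources):
    # Stand-in for the scraping module: raw text snippets per group.
    return {"Country A": ["Country A is widely admired."],
            "Country B": ["Country B is often criticised."]}

def assemble(corpus):
    # Build benchmark prompts from the scraped material.
    return [(group, f"Continue the sentence: {text}")
            for group, texts in corpus.items() for text in texts]

def generate(model, prompts):
    # Query the model under test (here, any callable taking a prompt).
    return [(group, model(prompt)) for group, prompt in prompts]

def extract(outputs):
    # Stand-in feature extraction: a naive keyword-based sentiment score.
    def sentiment(text):
        if "admired" in text:
            return 1.0
        return -1.0 if "criticised" in text else 0.0
    return [(group, sentiment(text)) for group, text in outputs]

def diagnose(features):
    # Per-group mean sentiment plus a simple disparity metric (range).
    by_group = {}
    for group, score in features:
        by_group.setdefault(group, []).append(score)
    stats = {group: mean(scores) for group, scores in by_group.items()}
    return stats, max(stats.values()) - min(stats.values())

echo_model = lambda prompt: prompt  # trivial "model" for demonstration
stats, gap = diagnose(extract(generate(echo_model, assemble(scrape(None)))))
print(stats, gap)  # {'Country A': 1.0, 'Country B': -1.0} 2.0
```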


Formulae:

Impact Ratio (IR): the minimum selection rate divided by the maximum selection rate across groups. An IR below 0.8 indicates disparate impact under the four-fifths rule.
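
As a short worked example with made-up selection rates (not data from the paper):

```python
# Hypothetical selection rates per group (fraction of favourable outcomes).
selection_rates = {"group_a": 0.42, "group_b": 0.35, "group_c": 0.50}

impact_ratio = min(selection_rates.values()) / max(selection_rates.values())
print(f"IR = {impact_ratio:.2f}")  # 0.35 / 0.50 = 0.70

# Four-fifths rule: an IR below 0.8 flags disparate impact.
if impact_ratio < 0.8:
    print("Fails the four-fifths rule")
```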

Max Z-Score: the largest standardized deviation of any group's summary statistic from the cross-group mean, used to assess how strongly bias concentrates in particular groups.
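
One common reading of this definition, again with made-up per-group statistics (the paper may normalise differently):

```python
import statistics

# Hypothetical per-group summary statistics (e.g., mean sentiment per group).
group_stats = [0.62, 0.55, 0.70, 0.48]

mu = statistics.mean(group_stats)
sigma = statistics.pstdev(group_stats)  # spread of the group statistics

# Largest standardised deviation of any single group from the cross-group mean.
max_z = max(abs(x - mu) / sigma for x in group_stats)
print(f"Max Z-Score = {max_z:.2f}")
```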

Disparity Metrics: Functions such as Range (Max - Min) or Min-Max Ratio (Min/Max) applied to summary statistics across groups.
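
The two disparity functions named above, applied to the same kind of made-up group statistics:

```python
# Hypothetical per-group summary statistics.
group_stats = {"group_a": 0.62, "group_b": 0.55, "group_c": 0.70}

values = group_stats.values()
disparity_range = max(values) - min(values)  # Range: Max - Min
min_max_ratio = min(values) / max(values)    # Min-Max Ratio: Min / Max

print(f"Range = {disparity_range:.2f}, Min-Max Ratio = {min_max_ratio:.2f}")
```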


Applications:

National Bias Analysis: Evaluates sentiment disparities in LLM outputs about countries.

Role-based Bias Analysis: Assesses performance shifts when LLMs simulate roles such as political figures.

Customizable Benchmarks: Generates tailored benchmarks for specific contexts using scraping and branching modules.


Impact:

SAGED empowers researchers and developers to identify, analyze, and mitigate biases in LLMs, fostering transparency and fairness. Its modular design provides flexibility, enabling insight into nuanced bias patterns such as sentiment disparities and role-based performance shifts.

References

Guan, Xin, Nathaniel Demchak, Saloni Gupta, Ze Wang, Ediz Ertekin Jr, Adriano Koshiyama, Emre Kazim, and Zekun Wu. "SAGED: A Holistic Bias-Benchmarking Pipeline for Language Models with Customisable Fairness Calibration." arXiv preprint arXiv:2409.11149 (2024).

