Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.


JobFair is a framework for benchmarking and evaluating hierarchical gender hiring bias in Large Language Models (LLMs) used for resume scoring. It identifies and quantifies two primary types of bias: Level Bias (differences in average outcomes between gender groups) and Spread Bias (differences in the variability of outcomes). Level Bias is further decomposed into Taste-Based Bias (a consistent gap regardless of resume length) and Statistical Bias (a gap that varies with resume length).
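To make the two bias types concrete, here is a minimal sketch in Python on synthetic data (the scores, sample size, and variable names are hypothetical, not part of JobFair): Level Bias shows up as a gap between the means of counterfactual male and female scores, Spread Bias as a gap between their standard deviations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical LLM scores for the same 200 resumes, rendered once with a
# male name and once with a female name (counterfactual pairs).
male_scores = rng.normal(loc=7.0, scale=1.0, size=200)
female_scores = rng.normal(loc=6.6, scale=1.4, size=200)

# Level Bias: difference in average outcomes between the gender groups.
level_bias = male_scores.mean() - female_scores.mean()

# Spread Bias: difference in the variability of outcomes.
spread_bias = male_scores.std(ddof=1) - female_scores.std(ddof=1)

print(f"Level Bias (mean gap): {level_bias:+.3f}")
print(f"Spread Bias (std gap): {spread_bias:+.3f}")
```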


Applicable Models:

JobFair is applicable to any LLM used in hiring or resume scoring, including GPT, Claude, Llama, and others. It can also be adapted to explore other social traits and tasks beyond hiring.


Background:

Grounded in labor economics and legal principles, JobFair expands on traditional bias metrics by integrating statistical and computational approaches. It builds on causal inference techniques from the Rubin Causal Model and introduces Ranking After Scoring (RAS) for more robust bias evaluation. The framework aligns with the bias-audit requirements of NYC Local Law 144.
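As a rough illustration of the RAS idea (the precise procedure is defined in the JobFair paper; this sketch, with hypothetical scores, simply pools one round's counterfactual scores and converts them to ranks before comparing groups):

```python
import numpy as np
from scipy.stats import rankdata

# Hypothetical raw scores from one scoring round: the same 5 resumes,
# scored once with male names and once with female names.
male_scores = np.array([8.5, 7.0, 9.0, 6.5, 7.5])
female_scores = np.array([8.0, 7.0, 8.5, 6.0, 7.5])

# Ranking After Scoring: rank the pooled scores, so that downstream
# comparisons use ranks (robust to the LLM's score calibration quirks)
# rather than raw scores. rankdata assigns average ranks to ties.
pooled = np.concatenate([male_scores, female_scores])
ranks = rankdata(pooled)  # 1 = lowest score
male_ranks, female_ranks = ranks[:5], ranks[5:]

print("mean male rank:  ", male_ranks.mean())
print("mean female rank:", female_ranks.mean())
```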


Formulae:

1. Impact Ratio:

Impact Ratio for males = (Selection Rate of Male Group) / (Selection Rate of Most Selected Gender Group).

Selection rate is the proportion of individuals in a demographic group who move forward in the hiring process. (A combined code sketch of all three formulae follows this list.)

2. Permutation Test for Bias:

This statistical test assesses whether rank or variance differences between the male and female groups are significant by comparing the observed difference against its distribution under 100,000 random permutations of the gender labels.

3. Fixed-Effects Regression for Statistical Bias:

D_it = alpha_i + beta * log(I_it) + u_it,

where D_it is the score or rank gap for resume i in chunking round t, I_it is the number of words in resume i at round t, the resume-specific intercept alpha_i captures Taste-Based Bias (constant in resume length), the slope beta captures Statistical Bias (varying with resume length), and u_it is the error term.
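A minimal, self-contained sketch of all three computations on synthetic data (Python; the counts, scores, and variable names are illustrative assumptions, and a real audit would run JobFair's own pipeline with the full 100,000 permutations):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)

# 1. Impact Ratio: a group's selection rate divided by the selection rate
#    of the most selected gender group (hypothetical counts).
rate_male = 30 / 50    # 30 of 50 male-named resumes advanced
rate_female = 24 / 50  # 24 of 50 female-named resumes advanced
impact_ratio_female = rate_female / max(rate_male, rate_female)
print(f"Impact ratio (female): {impact_ratio_female:.2f}")

# 2. Permutation test: shuffling the pooled scores is equivalent to
#    permuting the gender labels; count how often a mean gap at least as
#    large as the observed one arises by chance.
male = rng.normal(7.0, 1.0, size=100)
female = rng.normal(6.7, 1.0, size=100)
observed = male.mean() - female.mean()
pooled = np.concatenate([male, female])
n_perm = 10_000  # reduced for speed; JobFair uses 100,000
hits = 0
for _ in range(n_perm):
    rng.shuffle(pooled)
    diff = pooled[:100].mean() - pooled[100:].mean()
    hits += abs(diff) >= abs(observed)
print(f"Permutation p-value: {hits / n_perm:.4f}")

# 3. Fixed-effects regression D_it = alpha_i + beta * log(I_it) + u_it:
#    per-resume intercepts alpha_i absorb Taste-Based Bias, while the
#    common slope beta on log word count estimates Statistical Bias.
n_resumes, n_rounds = 50, 5
words = rng.integers(200, 1200, size=n_resumes * n_rounds)
resume = np.repeat(np.arange(n_resumes), n_rounds)
gap = 0.3 + 0.1 * np.log(words) + rng.normal(0, 0.2, size=words.size)
df = pd.DataFrame({"gap": gap, "log_words": np.log(words), "resume": resume})
fit = smf.ols("gap ~ log_words + C(resume)", data=df).fit()
print(f"Estimated Statistical Bias (beta): {fit.params['log_words']:.3f}")
```

In the synthetic regression the true slope is 0.1, so the fitted beta should land near that value; on real data, a significantly nonzero beta would indicate bias that grows or shrinks with resume length.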


Applications:

1. Regulatory Compliance: Evaluating hiring tools for compliance with NYC Local Law 144 or similar legislation.

2. Bias Audits: Identifying and mitigating gender biases in hiring systems used by HR departments and third-party vendors.

3. Research & Development: Testing and improving LLM architectures to reduce systemic biases.


Impact:

JobFair highlights biases inherent in LLM-driven hiring processes, giving organizations an empirical basis for assessing fairness and equity. By distinguishing Taste-Based from Statistical Bias, it enables targeted interventions and strengthens trust in AI-driven hiring systems. Its insights can inform policymakers and developers, fostering AI systems that align with ethical hiring practices.


