JobFair is a robust framework designed to benchmark and evaluate hierarchical gender hiring biases in Large Language Models (LLMs) used for resume scoring. It identifies and quantifies two primary types of bias: Level Bias (differences in average outcomes) and Spread Bias (differences in outcome variability). Level Bias is further categorized into Taste-Based Bias (consistent bias regardless of resume length) and Statistical Bias (bias varying with resume length).
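As a rough illustration of these two quantities, here is a minimal sketch using NumPy and hypothetical score arrays (this is not the JobFair implementation):

```python
import numpy as np

# Hypothetical LLM resume scores for the same set of resumes with male vs. female names.
male_scores = np.array([7.5, 8.0, 6.5, 9.0, 7.0])
female_scores = np.array([7.0, 7.5, 6.0, 8.5, 7.0])

# Level bias: gap in average scores between the two groups.
level_gap = male_scores.mean() - female_scores.mean()

# Spread bias: gap in score variability (sample standard deviation) between the groups.
spread_gap = male_scores.std(ddof=1) - female_scores.std(ddof=1)

print(f"Level gap: {level_gap:.2f}, Spread gap: {spread_gap:.2f}")
```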
Applicable Models:
JobFair is applicable to any LLM used for hiring or resume scoring, including GPT, Claude, Llama, and others. It can also be adapted to explore other social traits and tasks beyond hiring.
Background:
Grounded in labor economics and legal principles, JobFair expands on traditional bias metrics by integrating both statistical and computational approaches. It builds on causal inference techniques from the Rubin Causal Model and introduces Ranking After Scoring (RAS) for enhanced bias evaluation. Its metrics are designed to align with the bias-audit requirements of NYC Local Law 144.
Formulae:
1. Impact Ratio:
Impact Ratio for males = (Selection Rate of Male Group) / (Selection Rate of Most Selected Gender Group).
Selection rate is the proportion of individuals in a demographic group moving forward in the hiring process.
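For illustration, a minimal sketch of the selection-rate and impact-ratio calculation, using hypothetical candidate counts (not code from the JobFair release):

```python
# Hypothetical numbers of candidates advanced out of those screened, per group.
selected = {"male": 30, "female": 45}
screened = {"male": 100, "female": 100}

# Selection rate: share of each group moving forward in the hiring process.
rates = {g: selected[g] / screened[g] for g in selected}

# Impact ratio: each group's selection rate divided by the highest selection rate.
best_rate = max(rates.values())
impact_ratios = {g: rates[g] / best_rate for g in rates}

print(impact_ratios)  # e.g. {'male': 0.667, 'female': 1.0}
```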
2. Permutation Test for Bias:
This statistical test assesses whether differences in rank or variance between the male and female groups are statistically significant by comparing the observed difference against a null distribution generated from 100,000 random permutations of the group labels.
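A minimal sketch of such a two-sided permutation test, using NumPy and hypothetical rank data; the group labels are reshuffled 100,000 times as described above, and the test statistic can be swapped between the mean (for rank differences) and the variance:

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(male, female, stat=np.mean, n_perm=100_000):
    """Two-sided permutation test for a group difference in a statistic (mean or variance)."""
    observed = stat(male) - stat(female)
    pooled = np.concatenate([male, female])
    n_male = len(male)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # randomly reassign group labels
        diff = stat(pooled[:n_male]) - stat(pooled[n_male:])
        if abs(diff) >= abs(observed):          # at least as extreme as observed
            count += 1
    return count / n_perm                       # permutation p-value

# Hypothetical ranks of male- and female-named resumes after scoring.
male_ranks = np.array([1, 3, 4, 6, 8], dtype=float)
female_ranks = np.array([2, 5, 7, 9, 10], dtype=float)

print(permutation_test(male_ranks, female_ranks))           # rank (mean) difference
print(permutation_test(male_ranks, female_ranks, np.var))   # variance difference
```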
3. Fixed-Effects Regression for Statistical Bias:
D_it = alpha_i + beta * log(I_it) + u_it,
where D_it is the score or rank gap for resume i in chunking round t, I_it is the number of words retained in the resume at round t, alpha_i captures Taste-Based Bias (the length-invariant component), beta captures Statistical Bias (the length-dependent component), and u_it is the error term.
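A minimal sketch of estimating this regression with statsmodels on hypothetical panel data, where the resume fixed effects alpha_i are absorbed by dummy variables (this illustrates the formula above, not the authors' code):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical panel: each resume i is scored over several chunking rounds t,
# with a different word count I_it retained in each round.
rows = []
for resume_id in range(20):
    alpha_i = rng.normal(0.2, 0.1)              # resume-specific (taste-based) component
    for words in (200, 400, 600, 800):
        gap = alpha_i + 0.05 * np.log(words) + rng.normal(0, 0.02)
        rows.append({"resume": resume_id, "words": words, "gap": gap})
df = pd.DataFrame(rows)

# Fixed-effects regression: D_it = alpha_i + beta * log(I_it) + u_it,
# with C(resume) absorbing the resume fixed effects alpha_i.
model = smf.ols("gap ~ C(resume) + np.log(words)", data=df).fit()
print(model.params["np.log(words)"])  # estimate of beta (statistical bias)
```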
Applications:
1. Regulatory Compliance: Evaluating hiring tools for compliance with NYC Local Law 144 or similar legislation.
2. Bias Audits: Identifying and mitigating gender biases in hiring systems used by HR departments and third-party vendors.
3. Research & Development: Testing and improving LLM architectures to reduce systemic biases.
Impact:
JobFair highlights inherent biases in hiring processes driven by LLMs, offering organizations a scientific basis to ensure fairness and equity. By distinguishing between Taste-Based and Statistical Biases, it enables targeted interventions, improving trust in AI-driven hiring systems. Its insights inform policymakers and developers, fostering AI systems that align with ethical hiring practices.
References
- Ze Wang, Zekun Wu, Xin Guan, Michael Thaler, Adriano Koshiyama, Skylar Lu, Sachin Beepath, Ediz Ertekin, and Maria Perez-Ortiz. 2024. JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 3227–3246, Miami, Florida, USA. Association for Computational Linguistics.
