JobFair is a robust framework designed to benchmark and evaluate hierarchical gender hiring biases in Large Language Models (LLMs) used for resume scoring. It identifies and quantifies two primary types of bias: Level Bias (differences in average outcomes) and Spread Bias (differences in outcome variability). Level Bias is further categorized into Taste-Based Bias (consistent bias regardless of resume length) and Statistical Bias (bias varying with resume length).
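As a rough illustration of these two quantities, here is a minimal sketch using NumPy and hypothetical score arrays (this is not the JobFair implementation):

```python
import numpy as np

# Hypothetical LLM resume scores for the same set of resumes with male vs. female names.
male_scores = np.array([7.5, 8.0, 6.5, 9.0, 7.0])
female_scores = np.array([7.0, 7.5, 6.0, 8.5, 7.0])

# Level bias: gap in average scores between the two groups.
level_gap = male_scores.mean() - female_scores.mean()

# Spread bias: gap in score variability (sample standard deviation) between the groups.
spread_gap = male_scores.std(ddof=1) - female_scores.std(ddof=1)

print(f"Level gap: {level_gap:.2f}, Spread gap: {spread_gap:.2f}")
```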
Applicable Models:
JobFair is applicable to any LLM used for hiring or resume scoring, including GPT, Claude, Llama, and others. It can also be adapted to explore other social traits and tasks beyond hiring.
Background:
Grounded in labor economics and legal principles, JobFair expands on traditional bias metrics by integrating both statistical and computational approaches. It builds on causal inference techniques from the Rubin Causal Model and introduces Ranking After Scoring (RAS) for enhanced bias evaluation. Its metrics are designed to align with the bias-audit requirements of NYC Local Law 144.
Formulae:
1. Impact Ratio:
Impact Ratio for males = (Selection Rate of Male Group) / (Selection Rate of Most Selected Gender Group).
Selection rate is the proportion of individuals in a demographic group moving forward in the hiring process.
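For illustration, a minimal sketch of the selection-rate and impact-ratio calculation, using hypothetical candidate counts (not code from the JobFair release):

```python
# Hypothetical numbers of candidates advanced out of those screened, per group.
selected = {"male": 30, "female": 45}
screened = {"male": 100, "female": 100}

# Selection rate: share of each group moving forward in the hiring process.
rates = {g: selected[g] / screened[g] for g in selected}

# Impact ratio: each group's selection rate divided by the highest selection rate.
best_rate = max(rates.values())
impact_ratios = {g: rates[g] / best_rate for g in rates}

print(impact_ratios)  # e.g. {'male': 0.667, 'female': 1.0}
```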
2. Permutation Test for Bias:
This statistical test assesses whether differences in rank or variance between the male and female groups are statistically significant by comparing the observed difference against a null distribution generated from 100,000 random permutations of the group labels.
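A minimal sketch of such a two-sided permutation test, using NumPy and hypothetical rank data; the group labels are reshuffled 100,000 times as described above, and the test statistic can be swapped between the mean (for rank differences) and the variance:

```python
import numpy as np

rng = np.random.default_rng(0)

def permutation_test(male, female, stat=np.mean, n_perm=100_000):
    """Two-sided permutation test for a group difference in a statistic (mean or variance)."""
    observed = stat(male) - stat(female)
    pooled = np.concatenate([male, female])
    n_male = len(male)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)                     # randomly reassign group labels
        diff = stat(pooled[:n_male]) - stat(pooled[n_male:])
        if abs(diff) >= abs(observed):          # at least as extreme as observed
            count += 1
    return count / n_perm                       # permutation p-value

# Hypothetical ranks of male- and female-named resumes after scoring.
male_ranks = np.array([1, 3, 4, 6, 8], dtype=float)
female_ranks = np.array([2, 5, 7, 9, 10], dtype=float)

print(permutation_test(male_ranks, female_ranks))           # rank (mean) difference
print(permutation_test(male_ranks, female_ranks, np.var))   # variance difference
```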
3. Fixed-Effects Regression for Statistical Bias:
D_it = alpha_i + beta * log(I_it) + u_it,
where D_it is the score or rank gap for resume i in chunking round t, I_it is the number of words retained in the resume at round t, alpha_i captures Taste-Based Bias (the length-invariant component), beta captures Statistical Bias (the length-dependent component), and u_it is the error term.
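A minimal sketch of estimating this regression with statsmodels on hypothetical panel data, where the resume fixed effects alpha_i are absorbed by dummy variables (this illustrates the formula above, not the authors' code):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)

# Hypothetical panel: each resume i is scored over several chunking rounds t,
# with a different word count I_it retained in each round.
rows = []
for resume_id in range(20):
    alpha_i = rng.normal(0.2, 0.1)              # resume-specific (taste-based) component
    for words in (200, 400, 600, 800):
        gap = alpha_i + 0.05 * np.log(words) + rng.normal(0, 0.02)
        rows.append({"resume": resume_id, "words": words, "gap": gap})
df = pd.DataFrame(rows)

# Fixed-effects regression: D_it = alpha_i + beta * log(I_it) + u_it,
# with C(resume) absorbing the resume fixed effects alpha_i.
model = smf.ols("gap ~ C(resume) + np.log(words)", data=df).fit()
print(model.params["np.log(words)"])  # estimate of beta (statistical bias)
```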
Applications:
1. Regulatory Compliance: Evaluating hiring tools for compliance with NYC Local Law 144 or similar legislation.
2. Bias Audits: Identifying and mitigating gender biases in hiring systems used by HR departments and third-party vendors.
3. Research & Development: Testing and improving LLM architectures to reduce systemic biases.
Impact:
JobFair highlights inherent biases in hiring processes driven by LLMs, offering organizations a scientific basis to ensure fairness and equity. By distinguishing between Taste-Based and Statistical Biases, it enables targeted interventions, improving trust in AI-driven hiring systems. Its insights inform policymakers and developers, fostering AI systems that align with ethical hiring practices.
References
- Ze Wang, Zekun Wu, Xin Guan, Michael Thaler, Adriano Koshiyama, Skylar Lu, Sachin Beepath, Ediz Ertekin, and Maria Perez-Ortiz. 2024. JobFair: A Framework for Benchmarking Gender Hiring Bias in Large Language Models. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 3227–3246, Miami, Florida, USA. Association for Computational Linguistics.
