Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.

If a model systematically makes errors disproportionately for patients in the protected group, it is likely to lead to unequal outcomes. Equal performance refers to the assurance that a model is equally accurate for patients in the protected and non-protected groups. Equal performance has three commonly discussed types: equal sensitivity (also known as equal opportunity [36]), equal sensitivity and specificity (also known as equalized odds), and equal positive predictive value (commonly referred to as predictive parity [37]). Not only can these metrics be calculated, but techniques also exist to force models to have one of these properties [36, 38–41].
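To make these definitions concrete, below is a minimal sketch (not part of the original metric entry) of how the three group-wise quantities could be computed with NumPy. The function name, the synthetic data, and the 0.5 decision threshold are illustrative assumptions.

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group sensitivity, specificity, and PPV for a binary classifier.

    y_true, y_pred : arrays of 0/1 labels
    group          : array of group identifiers (e.g., protected vs. non-protected)
    """
    rates = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        tp = np.sum((yt == 1) & (yp == 1))
        fn = np.sum((yt == 1) & (yp == 0))
        tn = np.sum((yt == 0) & (yp == 0))
        fp = np.sum((yt == 0) & (yp == 1))
        rates[g] = {
            "sensitivity": tp / (tp + fn),  # equal across groups -> equal opportunity
            "specificity": tn / (tn + fp),  # equal sens. AND spec. -> equalized odds
            "ppv": tp / (tp + fp),          # equal across groups -> predictive parity
        }
    return rates

# Illustrative usage with synthetic data
rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=1000)
scores = np.clip(y_true * 0.3 + rng.normal(0.4, 0.25, size=1000), 0, 1)
group = rng.choice(["protected", "non_protected"], size=1000)
y_pred = (scores >= 0.5).astype(int)
print(group_rates(y_true, y_pred, group))
```

Comparing the resulting dictionaries across groups shows which, if any, of the three equal-performance properties approximately holds.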

When should each type of equal performance be considered? A higher false-negative rate in the protected group in case 1 would mean that African American patients miss the opportunity to be identified; in this case, equal sensitivity is desirable. A higher false-positive rate might be especially deleterious by leading to potentially harmful interventions (such as unnecessary biopsies), motivating equal specificity. When the positive predictive value for alerts in the protected group is lower than in the non-protected group, clinicians may learn that the alerts are less informative for that group and act on them less (a situation known as class-specific alert fatigue). Ensuring equal positive predictive value is desirable in this case.

Equal performance, however, may not necessarily translate to equal outcomes. First, the recommended treatment informed by the prediction may be less effective for patients in the protected group (for example, because of different responses to medications and a lack of research on heterogeneous treatment effects [42]). Second, even if a model is inaccurate for a group, clinicians might compensate with additional vigilance, overcoming the model’s deficiencies.

Third, forcing a model’s predictions to have one of the equal performance characteristics may have unexpected consequences. In case 1, ensuring that a model will detect African American and non–African American patients at equal rates (equal sensitivity) could be straightforwardly accomplished by lowering the threshold for the protected class to receive the intervention. This simultaneously increases the false-positive rate for this group, manifesting as more false alarms and subsequent class-specific alert fatigue. Likewise, equalized odds can be achieved by lowering accuracy for the non-protected group, which undermines the principle of beneficence.
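The trade-off described above can be illustrated with a short sketch: lower the protected group's decision threshold until its sensitivity matches the non-protected group's, then observe the rise in its false-positive rate. The simulation, thresholds, and function names below are illustrative assumptions, not taken from the cited works.

```python
import numpy as np

def rates_at_threshold(scores, y_true, thr):
    """Sensitivity (TPR) and false-positive rate at a given decision threshold."""
    y_pred = scores >= thr
    return np.mean(y_pred[y_true == 1]), np.mean(y_pred[y_true == 0])

def threshold_for_sensitivity(scores, y_true, target_tpr):
    """Highest threshold whose sensitivity reaches target_tpr (group-specific post-processing)."""
    for thr in np.sort(np.unique(scores))[::-1]:
        tpr, _ = rates_at_threshold(scores, y_true, thr)
        if tpr >= target_tpr:
            return thr
    return scores.min()

# Synthetic scores: the model is assumed less discriminative for the protected group
rng = np.random.default_rng(1)
def simulate(n, sep):
    y = rng.integers(0, 2, size=n)
    s = np.clip(rng.normal(0.35 + sep * y, 0.15, size=n), 0, 1)
    return s, y

s_np, y_np = simulate(2000, sep=0.35)   # non-protected group
s_p,  y_p  = simulate(2000, sep=0.20)   # protected group

thr = 0.5
tpr_np, fpr_np = rates_at_threshold(s_np, y_np, thr)
tpr_p,  fpr_p  = rates_at_threshold(s_p,  y_p,  thr)
print(f"shared threshold {thr}: TPR {tpr_np:.2f}/{tpr_p:.2f}, FPR {fpr_np:.2f}/{fpr_p:.2f}")

# Lower the protected group's threshold until its sensitivity matches the non-protected group's
thr_p = threshold_for_sensitivity(s_p, y_p, target_tpr=tpr_np)
tpr_p2, fpr_p2 = rates_at_threshold(s_p, y_p, thr_p)
print(f"protected-group threshold {thr_p:.2f}: TPR {tpr_p2:.2f}, FPR {fpr_p2:.2f} (was {fpr_p:.2f})")
```

In this synthetic setting, matching sensitivities roughly triples the protected group's false-positive rate, which is exactly the false-alarm and alert-fatigue cost the paragraph above warns about.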

Related use cases:

Uploaded on Nov 3, 2022: This analysis characterizes the studies used to support US Food & Drug Administration 2015 premarket approval of devices, particularly findings of device safety and effecti...

Uploaded on Apr 2, 2024: Object detectors often perform poorly on data that differs from their training set. Domain adaptive object detection (DAOD) methods have recently demonstrated strong results on add...

Uploaded on Apr 2, 2024: Language models (LMs) have proven to be powerful tools for psycholinguistic research, but most prior work has focused on purely behavioural measures (e.g., surprisal comparisons). ...

Uploaded on Apr 2, 2024: In this work, we introduce Mini-Gemini, a simple and effective framework enhancing multi-modality Vision Language Models (VLMs). Despite the advancements in VLMs facilitating basic...


Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.