These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.
If a model systematically makes errors disproportionately for patients in the protected group, it is likely to lead to unequal outcomes. Equal performance refers to the assurance that a model is equally accurate for patients in the protected and non-protected groups. Equal performance has three commonly discussed types: equal sensitivity (also known as equal opportunity [36]), equal sensitivity and specificity (also known as equalized odds), and equal positive predictive value (commonly referred to as predictive parity [37]). These metrics can be calculated directly, and techniques also exist to force models to satisfy one of these properties [36, 38–41].
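The three criteria can be sketched formally as follows. The notation is an assumption introduced here for illustration: Y is the true outcome, Ŷ the model's binary prediction, and A = 1 marks membership in the protected group (A = 0 otherwise).

```latex
\begin{aligned}
\text{Equal sensitivity (equal opportunity):}\quad & P(\hat{Y}=1 \mid Y=1, A=1) = P(\hat{Y}=1 \mid Y=1, A=0) \\
\text{Equalized odds:}\quad & P(\hat{Y}=1 \mid Y=y, A=1) = P(\hat{Y}=1 \mid Y=y, A=0), \quad y \in \{0, 1\} \\
\text{Predictive parity:}\quad & P(Y=1 \mid \hat{Y}=1, A=1) = P(Y=1 \mid \hat{Y}=1, A=0)
\end{aligned}
```

The y = 0 case of equalized odds requires equal false-positive rates, which is equivalent to equal specificity.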
When should each type of equal performance be considered? In case 1, a higher false-negative rate in the protected group would mean that African American patients miss the opportunity to be identified; here, equal sensitivity is desirable. A higher false-positive rate might be especially deleterious if it leads to potentially harmful interventions (such as unnecessary biopsies), which motivates equal specificity. When the positive predictive value of alerts is lower in the protected group than in the non-protected group, clinicians may learn that the alerts are less informative for these patients and act on them less often (a situation known as class-specific alert fatigue); in this case, equal positive predictive value is desirable.
Equal performance, however, does not necessarily translate to equal outcomes. First, the treatment recommended on the basis of the prediction may be less effective for patients in the protected group (for example, because of different responses to medications and a lack of research on heterogeneous treatment effects [42]). Second, even if a model is inaccurate for a group, clinicians might compensate with additional vigilance and overcome the model’s deficiencies.
Third, forcing a model’s predictions to satisfy one of the equal performance criteria may have unexpected consequences. In case 1, ensuring that the model detects African American and non–African American patients at equal rates (equal sensitivity) could be accomplished straightforwardly by lowering the protected group's threshold for receiving the intervention. This simultaneously increases the false-positive rate for that group, producing more false alarms and, in turn, class-specific alert fatigue. Likewise, equalized odds can be achieved by lowering accuracy for the non-protected group, which undermines the principle of beneficence.
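The threshold trade-off described above can be reproduced on toy data. The sketch below assumes a model that outputs a risk score, a shared default threshold of 0.5, and made-up scores in which the model ranks protected-group cases lower; `rates_at_threshold` and `threshold_for_sensitivity` are hypothetical helpers written for this illustration, not part of any library.

```python
def rates_at_threshold(scores, labels, threshold):
    """Return (sensitivity, false-positive rate) when flagging every score >= threshold.

    Assumes labels are 0/1 and that both classes are present.
    """
    tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < threshold)
    return tp / (tp + fn), fp / (fp + tn)


def threshold_for_sensitivity(scores, labels, target):
    """Highest observed score that, used as a threshold, reaches the target sensitivity."""
    # Sensitivity can only rise as the threshold falls, so scan from the top down.
    for t in sorted(set(scores), reverse=True):
        sensitivity, _ = rates_at_threshold(scores, labels, t)
        if sensitivity >= target:
            return t
    return None  # target unreachable (e.g. target > 1)


# Hypothetical risk scores; the model ranks protected-group cases lower.
ref_scores = [0.9, 0.8, 0.7, 0.3, 0.6, 0.2, 0.1, 0.4]
prot_scores = [0.7, 0.45, 0.4, 0.2, 0.42, 0.41, 0.35, 0.05]
labels = [1, 1, 1, 1, 0, 0, 0, 0]  # same label layout for both groups

ref_sens, ref_fpr = rates_at_threshold(ref_scores, labels, 0.5)
print(ref_sens, ref_fpr)                                # 0.75 0.25
print(rates_at_threshold(prot_scores, labels, 0.5))     # (0.25, 0.0)

t_prot = threshold_for_sensitivity(prot_scores, labels, ref_sens)
print(t_prot)                                           # 0.4
print(rates_at_threshold(prot_scores, labels, t_prot))  # (0.75, 0.5)
```

In this toy example, lowering the protected group's threshold from 0.5 to 0.4 brings its sensitivity up to the reference group's 0.75, but its false-positive rate rises from 0.0 to 0.5, double the reference group's 0.25: the extra false alarms that drive class-specific alert fatigue.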