Receiver Operating Characteristic Curve (ROC) and Area Under the Curve (AUC)

Catalogue of Tools & Metrics for Trustworthy AI

These tools and metrics are designed to help AI actors develop and use trustworthy AI systems and applications that respect human rights and are fair, transparent, explainable, robust, secure and safe.


This metric computes the area under the curve (AUC) for the Receiver Operating Characteristic (ROC) curve. The returned score, which ranges from 0 to 1, summarises how well the model's predicted scores separate the classes in the input data. A score of 0.5 means that the model is predicting exactly at chance: it ranks a randomly chosen positive example above a randomly chosen negative one only half the time, as if the scores were decided by the flip of a fair coin. A score above 0.5 indicates that the model is doing better than chance, with 1.0 meaning every positive example is ranked above every negative one. A score below 0.5 indicates that the model is doing worse than chance, i.e. its rankings tend to be inverted.
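One way to read the score: for binary labels, AUC equals the fraction of (positive, negative) example pairs in which the positive example receives the higher score, with ties counted as half. The sketch below illustrates that reading in plain Python. The function name `pairwise_auc` and the toy data are assumptions made for illustration and are not part of the catalogued metric; the quadratic loop is meant for small examples only.

```python
def pairwise_auc(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly.

    Hypothetical helper for illustration; ties count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # tie: half credit
    return wins / (len(pos) * len(neg))


# Toy data (assumed): two negatives, two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

auc = pairwise_auc(labels, scores)                        # better than chance
auc_constant = pairwise_auc(labels, [0.5] * 4)            # uninformative scores
auc_inverted = pairwise_auc(labels, [-s for s in scores]) # rankings flipped
print(auc, auc_constant, auc_inverted)
```

Negating the scores flips every pairwise comparison, which is why a model scoring below 0.5 can be read as one whose rankings are inverted.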

This metric has three separate use cases:

binary: The case in which there are only two different label classes, and each example gets only one label. This is the default implementation.
multiclass: The case in which there can be more than two different label classes, but each example still gets only one label.
multilabel: The case in which there can be more than two different label classes, and each example can have more than one label.
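The three cases differ mainly in the shape of the labels and scores. The following is a minimal sketch in plain Python, assuming a one-vs-rest, macro-averaged treatment of the multiclass and multilabel cases; real implementations typically offer other averaging options as well, and this is not presented as the catalogued metric's own code. The helper names `binary_auc` and `macro_auc` and all data values are hypothetical.

```python
def binary_auc(labels, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    pairs = [(p, n) for p in pos for n in neg]
    return sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in pairs) / len(pairs)


def macro_auc(indicator_rows, score_rows):
    """Unweighted mean of the per-column (per-class or per-label) binary AUC."""
    n_cols = len(score_rows[0])
    per_col = [
        binary_auc([row[c] for row in indicator_rows], [row[c] for row in score_rows])
        for c in range(n_cols)
    ]
    return sum(per_col) / n_cols


# binary: one score per example for the positive class
binary = binary_auc([0, 1, 1, 0], [0.2, 0.9, 0.3, 0.4])

# multiclass: exactly one class id per example, one score per class
class_ids = [0, 1, 2, 1]
class_scores = [[0.7, 0.2, 0.1], [0.1, 0.6, 0.3], [0.2, 0.3, 0.5], [0.5, 0.2, 0.3]]
one_hot = [[1 if c == y else 0 for c in range(3)] for y in class_ids]
multiclass = macro_auc(one_hot, class_scores)

# multilabel: each example may carry several labels at once
label_sets = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]
label_scores = [[0.8, 0.3, 0.6], [0.2, 0.7, 0.1], [0.6, 0.25, 0.2], [0.3, 0.2, 0.7]]
multilabel = macro_auc(label_sets, label_scores)

print(binary, multiclass, multilabel)
```

In this sketch the only difference between the multiclass and multilabel inputs is that each multiclass row is one-hot, while a multilabel row may contain several ones.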

Trustworthy AI Relevance

This metric addresses Robustness and Transparency by quantifying relevant system properties, with Robustness as the primary objective. AUC/ROC measures how reliably a model separates positive from negative cases across all decision thresholds, so it indicates discriminative performance without depending on any single threshold choice. This also makes it useful for checking consistency when classification thresholds vary or when comparing models under different conditions (e.g., development vs. validation).
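To make "across decision thresholds" concrete, the sketch below sweeps a threshold over every distinct score, records the false positive rate and true positive rate at each step, and integrates the resulting ROC curve with the trapezoidal rule. It is an illustrative plain-Python sketch for binary labels; the function names `roc_points` and `trapezoid_area` and the toy data are assumptions, not part of the catalogue entry.

```python
def roc_points(labels, scores):
    """(FPR, TPR) points from sweeping the threshold over each distinct score."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        # Predict positive whenever the score reaches the current threshold.
        tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


# Toy data (assumed): two negatives, two positives.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]

curve = roc_points(labels, scores)
area = trapezoid_area(curve)
print(curve)
print(area)
```

Each point on the curve corresponds to one possible operating threshold, and the area summarises all of them in a single number.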

Related use cases:

The meaning and use of the area under a receiver operating characteristic (ROC) curve

Uploaded on Oct 21, 2022

A representation and interpretation of the area under a receiver operating characteristic (ROC) curve obtained by the “rating” method, or by mathematical prediction...

A Label Attention Model for ICD Coding from Clinical Text

Uploaded on Nov 1, 2023

Multi-modal fusion approaches aim to integrate information from different data sources. Unlike natural datasets, such as in audio-visual applications, where samples consist of "pai...

Attention-based residual autoencoder for video anomaly detection

Uploaded on Nov 1, 2023

Self-driving cars need to understand 3D scenes efficiently and accurately in order to drive safely. Given the limited hardware resources, existing 3D perception models are not able...

DROCC: Deep Robust One-Class Classification

Uploaded on Nov 1, 2023

Classical approaches for one-class problems such as one-class SVM and isolation forest require careful feature engineering when applied to structured domains like images. State-of-...

Dense Steerable Filter CNNs for Exploiting Rotational Symmetry in Histology Images

Uploaded on Nov 1, 2023

Histology images are inherently symmetric under rotation, where each orientation is equally as likely to appear. However, this rotational symmetry is not widely utilised as prior k...

FastAno: Fast Anomaly Detection via Spatio-temporal Patch Transformation

Uploaded on Nov 1, 2023

There are growing implications surrounding generative AI in the speech domain that enable voice cloning and real-time voice conversion from one individual to another. This technolo...

GIPA: A General Information Propagation Algorithm for Graph Learning

Uploaded on Nov 1, 2023

3D object detection in point clouds is a challenging vision task that benefits various applications for understanding the 3D visual world. Lots of recent research focuses on how to...

HSTFormer: Hierarchical Spatial-Temporal Transformers for 3D Human Pose Estimation

Uploaded on Nov 1, 2023

Zero-shot learning (ZSL) aims to predict unseen classes whose samples have never appeared during training. One of the most effective and widely used semantic information for zero-s...

LDDMM-Face: Large Deformation Diffeomorphic Metric Learning for Flexible and Consistent Face Alignment

Uploaded on Nov 1, 2023

The increasing volume of commercially available conversational agents (CAs) on the market has resulted in users being burdened with learning and adopting multiple agents to accompl...

Large-scale Robust Deep AUC Maximization: A New Surrogate Loss and Empirical Studies on Medical Image Classification

Uploaded on Nov 1, 2023

Deep AUC Maximization (DAM) is a new paradigm for learning a deep neural network by maximizing the AUC score of the model on a dataset. Most previous works of AUC maximization focu...

Neural Bellman-Ford Networks: A General Graph Neural Network Framework for Link Prediction

Uploaded on Nov 1, 2023

Driven by large-data pre-training, Segment Anything Model (SAM) has been demonstrated as a powerful and promptable framework, revolutionizing the segmentation models. Despite the g...

New Benchmarks for Learning on Non-Homophilous Graphs

Uploaded on Nov 1, 2023

In recent years, there is strong emphasis on mining medical data using machine learning techniques. A common problem is to obtain a noiseless set of textual documents, with a relev...

P-STMO: Pre-Trained Spatial Temporal Many-to-One Model for 3D Human Pose Estimation

Uploaded on Nov 1, 2023

The referring video object segmentation task (RVOS) involves segmentation of a text-referred object instance in the frames of a given video. Due to the complex nature of this multi...

Roto-Translation Equivariant Convolutional Networks: Application to Histopathology Image Analysis

Uploaded on Nov 1, 2023

Rotation-invariance is a desired property of machine-learning models for medical image analysis and in particular for computational pathology applications. We propose a framework t...

Set Features for Fine-grained Anomaly Detection

Uploaded on Nov 1, 2023

Audio-based automatic speech recognition (ASR) degrades significantly in noisy environments and is particularly vulnerable to interfering speech, as the model cannot determine whic...

MambaTab: A Simple Yet Effective Approach for Handling Tabular Data

Uploaded on Mar 15, 2024

Multimodal Large Language Models (MLLMs) have experienced significant advancements recently. Nevertheless, challenges persist in the accurate recognition and comprehension of intri...

About the metric


Objective(s):

Robustness
Transparency

Purpose(s):

Event/anomaly detection
Forecasting/prediction
Recognition/object detection

Target sector(s):

Agriculture
Science & technology
Public governance
Innovation
Health
Environment
Education
Digital Economy
Corporate governance
Transport

Lifecycle stage(s):

Operate & monitor
Verify & validate
Build & interpret model

Target users:

Data scientist
Developer
Project manager
System operators

Risk management stage(s):

Define
Assess
Govern
Treat


Partnership on AI

Disclaimer: The tools and metrics featured herein are solely those of the originating authors and are not vetted or endorsed by the OECD or its member countries. The Organisation cannot be held responsible for possible issues resulting from the posting of links to third parties' tools and metrics on this catalogue. More on the methodology can be found at https://oecd.ai/catalogue/faq.