Overview and methodology of the OECD AI Incidents Monitor

Methodology and disclosures

Overview

The AI Incidents Monitor (AIM) was initiated and is being developed by the OECD.AI expert group on AI incidents with the support of the Patrick J. McGovern Foundation. In parallel, the expert group is working on an AI incident reporting framework. The goal of the AIM is to track actual AI incidents and hazards in real time and provide the evidence base to inform the AI incident reporting framework and related AI policy discussions.

The AIM is being informed by the work of the expert group on defining AI incidents and associated terminology, such as AI hazards and disasters. In parallel, the AIM seeks to provide a ‘reality check’ to make sure the definition of an AI incident and reporting framework function with real-world AI incidents and hazards.

As a starting point, AI incidents and hazards reported in reputable international media globally are identified and classified using machine learning models. Similar models are used to classify incidents and hazards into different categories from the OECD Framework for the Classification of AI systems, including their severity, industry, related AI Principle, types of harms and affected stakeholders.

The analysis is based on the title, abstract and first few paragraphs of each news article. News articles come from Event Registry, a news intelligence platform that monitors world news and can detect specific event types reported in news articles, with over 150 000 English articles processed every day.

While these publicly reported incidents and hazards likely represent only a subset of all AI incidents and hazards worldwide, they nonetheless provide a useful starting point for building the evidence base.

Incidents and hazards can be composed of one or more news articles covering the same event. To mitigate concerns related to editorial bias and disinformation, each report’s annotations and metadata are extracted from the most reputable news outlet reporting on the incident or hazard, as determined by its Alexa traffic rank. Additionally, incidents and hazards are sorted by the number of articles reporting on them and by their relevance to the specific query, as determined by their semantic similarity. Lastly, links to all the articles reporting on a specific incident or hazard are provided for completeness.
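The selection and ordering described above can be sketched as follows. This is a minimal illustration and not the AIM implementation: the `Article` and `Incident` records, their field names and the precomputed `similarity` score are all hypothetical, and a lower `traffic_rank` is assumed to indicate a more visited outlet. The source does not say how the article count and the similarity are combined, so the sketch assumes sorting by article count first and by similarity second.

```python
from dataclasses import dataclass, field


@dataclass
class Article:
    url: str
    outlet: str
    traffic_rank: int  # hypothetical field; lower rank is assumed to mean more traffic


@dataclass
class Incident:
    articles: list = field(default_factory=list)
    similarity: float = 0.0  # hypothetical: semantic similarity to the user's query

    def representative(self) -> Article:
        """Article whose annotations and metadata stand for the whole incident."""
        return min(self.articles, key=lambda a: a.traffic_rank)


def rank_incidents(incidents):
    """Order incidents by number of articles, then by similarity (assumed tie-break)."""
    return sorted(
        incidents,
        key=lambda inc: (len(inc.articles), inc.similarity),
        reverse=True,
    )


# Made-up example data.
a = Incident(
    articles=[
        Article("https://example.org/1", "Small Blog", traffic_rank=90_000),
        Article("https://example.org/2", "Major Daily", traffic_rank=120),
    ],
    similarity=0.4,
)
b = Incident(
    articles=[Article("https://example.org/3", "Regional Paper", traffic_rank=5_000)],
    similarity=0.9,
)

ranked = rank_incidents([b, a])
print(a.representative().outlet)
```

In this toy data, incident `a` is listed first because two articles report on it, even though `b` is closer to the query.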

The data collection and analysis for the AIM is done to ensure, to the best extent possible, the reliability, objectivity and quality of the information for AI incidents and hazards. A detailed methodological note is available here.

In the future, an open submission process may be enabled to complement the AI incidents and hazards information from news articles. To ensure consistency in reporting, the existing classification algorithm could be leveraged to process text submissions and provide a pre-selection of tags for a given incident or hazard report. Additionally, it is expected that incident and hazard information from news articles be complemented by court judgements and decisions of public supervisory authorities wherever they exist.

Definitions

Thanks to the work of the OECD.AI expert group on AI incidents, the term “AI incident” and related terminology have been defined. Published in May 2024, the paper Defining AI incidents and related terms defines an event where the development or use of an AI system results in actual harm as an “AI incident”, while an event where the development or use of an AI system is potentially harmful is termed an “AI hazard”.

  • An AI incident is an event, circumstance or series of events where the development, use or malfunction of one or more AI systems directly or indirectly leads to any of the following harms:
    (a) injury or harm to the health of a person or groups of people;
    (b) disruption of the management and operation of critical infrastructure;
    (c) violations of human rights or a breach of obligations under the applicable law intended to protect fundamental, labour and intellectual property rights;
    (d) harm to property, communities or the environment.
  • An AI hazard is an event, circumstance or series of events where the development, use or malfunction of one or more AI systems could plausibly lead to an AI incident, i.e., any of the following harms:
    (a) injury or harm to the health of a person or groups of people;
    (b) disruption of the management and operation of critical infrastructure;
    (c) violations of human rights or a breach of obligations under the applicable law intended to protect fundamental, labour and intellectual property rights;
    (d) harm to property, communities or the environment.

Information Transparency Disclosures

  • Background: Your use of the OECD AI Incidents Monitor (“AIM”) is subject to the terms and conditions found at www.oecd.org/termsandconditions. The following disclosures do not modify or supersede those terms. Instead, these disclosures aim to provide greater transparency surrounding information included in the AIM.
  • Third-Party Information: The AIM serves as an accessible starting point for comprehending the landscape of AI-related challenges. As a result, please be aware that the AIM is populated with news articles from various third-party outlets and news aggregators with which the OECD has no affiliation.
  • Views Expressed: Please know that any views or opinions expressed on the AIM are solely those of the third-party outlets that created them and do not represent the views or opinions of the OECD. Further, the inclusion of any news article or incident does not constitute an endorsement or recommendation by the OECD.
  • Errors and Omissions: The OECD cannot guarantee and does not independently verify the accuracy, completeness, or validity of third-party information provided in the AIM. You should be aware that information included in the AIM may contain various errors and omissions.
  • Intellectual Property: Any of the copyrights, trademarks, service marks, collective marks, design rights, or other intellectual property or proprietary rights that are mentioned, cited, or otherwise included in the AIM are the property of their respective owners. Their use or inclusion in the AIM does not imply that you may use them for any other purpose. The OECD is not endorsed by, does not endorse, and is not affiliated with any of the holders of such rights, and as such, the OECD cannot and does not grant any rights to use or otherwise exploit these protected materials included herein.

Methodology for monitoring AI incidents

Introduction

This section describes how Event Registry detects and categorises events considered to be AI incidents.

An incident or hazard refers to any event that might or might not lead to harm or damage. When such an event results in harm or damage, it is called an accident. An Artificial Intelligence (AI) incident or hazard refers to an unexpected or unintended event involving AI systems that results in harm, potential harm, or deviations from expected performance, potentially compromising the safety, fairness, or trustworthiness of the AI system in question.

For the purpose of monitoring AI incidents, Event Registry defines AI as the capability of machines to perform functions typically thought of, or at least thought of in the past, as requiring human intelligence. A system is a structured combination of parts, tools, or techniques that work together to achieve a specific goal or function. An AI system is any system that involves AI, even if it is composed of many other parts. Under these definitions, many systems that are not necessarily thought of as AI in a purely scientific sense are included. For example, a very simple credit scoring system using purely statistical methods can be considered an AI system, as it accomplishes the task of determining the creditworthiness of an individual, which has traditionally been done by humans. Any time that part of a decision-making process is transferred to an algorithm, the system is considered to be an AI system.

Other decision tasks where AI systems are involved include product recommendation, content moderation, and fraud detection. Like decision tasks, perception tasks were historically undertaken by humans. These include reading road signs as part of a driver assistance feature in a car, recognising the faces of known people from a surveillance camera, or recognising handwritten addresses on letters or packages for a postal service.

Accidents and even incidents involving systems of all kinds are often reported (or forecast) in the news media. Event Registry monitors world news and can detect specific event types reported in news articles, with over 150 000 English articles processed every day.

AI incident detection

In the context of a knowledge base, such as Wikidata [1], events or occurrences can be modelled structurally as statements composed of subjects (items), predicates (properties) and objects (values). To model an event in this structure, consider the following example:

Subject | Predicate | Object | Wikipedia link
MH370 | Instance of | Flight disappearance | Malaysia Airlines Flight 370
Table 1: Example of structured event information from Wikidata.

Or, as a triplet: (MH370, instance of, flight disappearance) = (Q15908324, P31, Q104776655)
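A statement of this kind maps naturally onto a small record type. The sketch below is only an illustration: the `Statement` class and its field names are hypothetical, while the labels and identifiers are the ones given above.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """One knowledge-base statement: subject, predicate, object."""

    subject: str
    predicate: str
    obj: str  # named `obj` to avoid shadowing the built-in `object`


# The same statement written with human-readable labels and with identifiers.
by_label = Statement("MH370", "instance of", "flight disappearance")
by_id = Statement("Q15908324", "P31", "Q104776655")

print(by_label)
print(by_id.predicate)
```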

But news articles do not include data structured in this way. For the previously mentioned incident of the disappearance of flight MH370, one article stated:

Missing Malaysia Airlines plane: Flight MH370 carrying almost 240 passengers 'disappears' en route from Kuala Lumpur to Beijing.

The core information of the event is present: it is a flight, identified by the code MH370, that disappeared. The subject of this story is the “Disappearance of flight MH370”. Although the core information is present in the text, the “subject” has not been explicitly defined in the way knowledge bases expect.

An alternative way of representing this event can be formulated in terms of the entities involved: (Malaysia Airlines, Incident, Flight MH370)

In plain English, this means that Flight MH370, belonging to Malaysia Airlines, was involved in an incident. To bring the discussion back to AI incidents, consider the following sentence:

A Tesla Model S with Autopilot on failed to stop at a T intersection and crashed into a Chevrolet Tahoe parked on a shoulder, killing Naibel Leon, 22.

This can be modelled as any of the following:

  • (Tesla, AI Incident, Autopilot): An AI incident involving the company Tesla and its AI System, Autopilot.
  • (Tesla, AI Incident, Model S): An AI incident involving the company Tesla and its product, the Model S.
  • (Autopilot, AI Incident, Naibel Leon): An AI incident involving the AI System Autopilot which harmed Naibel Leon.

This formulation enables Event Registry to frame the problem of detecting an AI Incident as a supervised machine learning task, specifically, a text classification problem: given a sentence, does it express an AI incident involving a given pair of entities?

In order to classify pairs of entities in a sentence, the entities present in the text first need to be identified. To do so, both Named Entity Recognition (NER) using spaCy [2] and a separate Entity Detection and Linking system, Wikifier [3], are used. A supervised dataset is then created, which is used by a learning algorithm to train a machine learning model to perform AI Incident event detection.
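To make the framing concrete, the sketch below turns one sentence and its entity mentions into candidate classification examples, one per pair of entities. It is an assumption-laden illustration: the `candidate_pairs` helper and the dictionary keys are hypothetical, the entity mentions are listed by hand rather than produced by NER or entity linking, and pairs are treated as unordered (in order of first listing), which the source does not specify.

```python
from itertools import combinations


def candidate_pairs(sentence, entities):
    """Return one classification example per pair of entities found in the sentence."""
    present = [e for e in entities if e in sentence]
    return [
        {"sentence": sentence, "head": head, "tail": tail}
        for head, tail in combinations(present, 2)
    ]


sentence = (
    "A Tesla Model S with Autopilot on failed to stop at a T intersection "
    "and crashed into a Chevrolet Tahoe parked on a shoulder, killing Naibel Leon, 22."
)
# Hand-written mentions; in a real pipeline these would come from NER and entity linking.
entities = ["Tesla", "Model S", "Autopilot", "Chevrolet Tahoe", "Naibel Leon"]

examples = candidate_pairs(sentence, entities)
for example in examples:
    print(example["head"], "|", example["tail"])
```

Each example would then be given a label (AI incident or not) to form the supervised dataset, and the trained classifier would answer the same question for unseen pairs.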

Model

The model used to classify pairs of entities in the context of a sentence is based on a Transformer [4] neural network. It uses a BERT-like [5] pretrained language model, RoBERTa [6], to encode the text of a sentence. Before encoding, the sentence text is modified to surround the pair of entities being classified with special tokens, and the entity type (e.g. organisation, location, product, …) is added before the beginning of the entity mention. This follows the procedure described in [7] for relation classification. The model architecture itself follows [8]: the transformer encodes the input text, and the transformer output embeddings corresponding to the special tokens before each entity are concatenated into a single vector. This vector is then passed to a classification head consisting of a linear layer followed by the Softmax activation function, which produces a normalised probability distribution over the possible event classes. In particular, the output gives the probability that the pair of entities in the sentence corresponds to an AI incident.
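The input modification can be illustrated with plain string handling. The sketch below is a simplified stand-in for the procedure in [7], not the actual preprocessing code: the `mark_entities` helper is hypothetical, the marker strings "[E1]", "[/E1]", "[E2]" and "[/E2]" are placeholders chosen for this example, and mentions are located by simple substring search, which assumes they do not overlap.

```python
def mark_entities(sentence, head, tail):
    """Wrap two entity mentions in marker tokens, each prefixed by its entity type.

    `head` and `tail` are (mention, entity_type) tuples.
    """
    markers = (("[E1]", "[/E1]"), ("[E2]", "[/E2]"))  # placeholder marker strings
    spans = []
    for (mention, etype), (open_tok, close_tok) in zip((head, tail), markers):
        start = sentence.find(mention)
        if start < 0:
            raise ValueError(f"mention not found: {mention!r}")
        spans.append((start, start + len(mention), etype, open_tok, close_tok))

    # Insert from right to left so that earlier character offsets stay valid.
    marked = sentence
    for start, end, etype, open_tok, close_tok in sorted(spans, reverse=True):
        marked = (
            marked[:start]
            + f"{open_tok} {etype} {marked[start:end]} {close_tok}"
            + marked[end:]
        )
    return marked


marked = mark_entities(
    "A Tesla Model S with Autopilot on failed to stop.",
    head=("Tesla", "organisation"),
    tail=("Autopilot", "product"),
)
print(marked)
```

The marked sentence is what would be tokenised and passed to the encoder, so that the model can tell which two entities the question is about.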

Figure 1: Architecture of the AI Incident Detection Model.
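The classification head can be sketched in a few lines of plain Python. The numbers below are toy values invented for illustration (2-dimensional embeddings, two classes, arbitrary weights), and the function names are hypothetical; a real model would use the encoder's output embeddings and learned parameters.

```python
import math


def softmax(logits):
    """Normalise logits into a probability distribution (numerically stable)."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classification_head(head_vec, tail_vec, weights, bias):
    """Concatenate the two marker embeddings, apply a linear layer, then softmax."""
    features = head_vec + tail_vec  # list concatenation, not element-wise addition
    logits = [
        sum(w * x for w, x in zip(row, features)) + b
        for row, b in zip(weights, bias)
    ]
    return softmax(logits)


# Toy inputs: one embedding per entity marker, and one weight row per class.
head_vec = [0.5, -1.0]
tail_vec = [1.5, 0.25]
weights = [
    [0.1, 0.2, -0.3, 0.4],  # class 0: not an AI incident
    [0.3, -0.1, 0.5, 0.2],  # class 1: AI incident
]
bias = [0.0, 0.1]

probs = classification_head(head_vec, tail_vec, weights, bias)
print(probs)
```

The second entry of `probs` plays the role of the probability that the entity pair corresponds to an AI incident.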

Dataset

The problem of detecting an AI Incident is formulated as a supervised machine learning task. The learning process is commonly known as training the model. It requires learning examples, which are used to learn the parameters of the model. These examples consist of the model inputs and the associated model targets. As a set, they are commonly called a training set. In order to estimate the performance of the model on unseen data, it is also necessary to have a set of examples that are not seen by the model during training. Here, that set is called the validation set. The examples for training this model include both positive and negative examples. Positive examples are examples of AI Incidents, while negative examples are not AI Incidents. The negative examples were chosen to be semantically similar to positive examples and can thus be considered “hard examples”.

Data splitPositive examples(Hard) Negative examples
Training766996
Validation24594
Table 2: Number of examples in the AI Incident detection dataset.

Performance metrics

Performance metrics for classification follow definitions closely linked to statistical hypothesis testing. They are defined in terms of the model’s prediction and the target value of the example. A correctly classified example is called a True Positive (TP) if its target value is positive, and a True Negative (TN) if its target is negative. An incorrectly classified example is called a False Positive (FP) if its target is negative, and a False Negative (FN) if its target is positive. Given a labelled set of examples, it is common to describe the model’s performance in terms of two relative rates, precision and recall:

$$\text{Precision} = \frac{TP}{TP + FP}, \qquad \text{Recall} = \frac{TP}{TP + FN}$$

Since these two metrics capture two different types of error, they are complementary. When it is necessary or desirable to use a single metric that encompasses both types of error, it is common to use an F-score, typically the F1-score, defined as the harmonic mean of precision and recall:

$$F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}$$
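As a worked illustration of these metrics, the sketch below computes precision, recall and F1 from raw counts. The counts are invented for the example and do not come from the AIM evaluation; the function name is hypothetical.

```python
def precision_recall_f1(tp, fp, fn):
    """Compute precision, recall and F1 from true positives, false positives and false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented counts: 90 correct detections, 10 false alarms, 30 missed incidents.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

Because the harmonic mean is pulled towards the smaller of its two inputs, the F1 value here sits closer to the recall than a simple average would.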

When there are more than two classes, which is called a multiclass problem, it is common to average class-specific F1 scores. Two types of average are commonly used: the micro-average, which is weighted by class frequency, and the macro-average, which treats all classes as equally important regardless of their frequencies. Averages reported in this document are micro-averages. Metrics are written as percentages and rounded to the nearest whole number.

Experimental results:

Task | Precision | Recall | F1
AI incident | 89 | 90 | 90
Table 3: Evaluation metrics for AI Incident detection.
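The difference between micro- and macro-averaging can be seen on a small invented example with one frequent and one rare class. The per-class counts and helper names below are hypothetical and unrelated to the results in the table above; the sketch uses the identity F1 = 2TP / (2TP + FP + FN), which is equivalent to the harmonic mean of precision and recall.

```python
def f1_from_counts(tp, fp, fn):
    """F1 written directly in terms of counts: 2TP / (2TP + FP + FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_f1(per_class):
    """Average the per-class F1 scores, giving every class the same weight."""
    scores = [f1_from_counts(*counts) for counts in per_class.values()]
    return sum(scores) / len(scores)


def micro_f1(per_class):
    """Pool the counts over all classes first, so frequent classes dominate."""
    tp = sum(c[0] for c in per_class.values())
    fp = sum(c[1] for c in per_class.values())
    fn = sum(c[2] for c in per_class.values())
    return f1_from_counts(tp, fp, fn)


# Invented (tp, fp, fn) counts per class.
counts = {"frequent": (90, 10, 10), "rare": (1, 4, 4)}

macro = macro_f1(counts)
micro = micro_f1(counts)
print(f"macro={macro:.3f} micro={micro:.3f}")
```

In this toy case the micro-average stays high because the frequent class is handled well, while the macro-average is dragged down by the poorly handled rare class.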

Categorisation of AI incidents

AI Incidents are categorised according to several characteristics of the event. Currently, these characteristics all relate to the harm caused: the level of severity of the incident, the type of harm caused, and the general group to which the harmed parties belong. The following taxonomy is used for each of these characteristics:

Harm Level:

  • the incident resulted in death;
  • serious but non-fatal physical injury occurred;
  • psychological, financial, reputational, very minor injury, other;
  • potential, hypothetical, future harm, threat or unrealised danger;
  • no harm, not specified, unknown.

Harm Type:

  • physical harm to a person or persons, injury or death;
  • harm to the environment;
  • psychological harm to a person or group including manipulation, deception, or coercion;
  • financial harm to a person or organisation;
  • reputational harm to a person or organisation;
  • harm to the public interest;
  • violation of human or fundamental rights;
  • no harm, not specified, or unknown.

Harmed Groups:

  • consumers, people who own or use the product or service;
  • workers, employees or job applicants;
  • a business or businesses or stockholders in the business;
  • researchers or scientists;
  • a government, government agencies or officials;
  • ethnic, religious or gender minorities, lower or working class people;
  • women as a gender group;
  • children or minors;
  • people with disabilities;
  • migrants, foreigners, refugees, or asylum seekers as a group;
  • public, people not otherwise included in this list (e.g. citizens, pedestrians, passers-by, etc.);
  • unknown, not specified.

Harm Level and Harm Type are defined as multiclass classification problems, meaning that exactly one value applies for each of them. Multiple groups can be harmed in the same incident, so Harmed Groups is defined as a multilabel classification problem. Note that the label “unknown” in this case is only used if none of the other groups were identified as being harmed.
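The distinction between the two problem types shows up in how model scores are turned into labels. The sketch below is illustrative only: the shortened label names, the 0.5 threshold and the function names are assumptions, not details of the AIM models.

```python
# Shortened label names, for illustration only.
HARM_LEVELS = ["death", "serious injury", "other harm", "potential harm", "no harm or unknown"]
HARMED_GROUPS = ["consumers", "workers", "children", "unknown"]


def decode_multiclass(scores, labels):
    """Multiclass: exactly one label, the one with the highest score."""
    best = max(range(len(scores)), key=lambda i: scores[i])
    return labels[best]


def decode_multilabel(scores, labels, threshold=0.5, fallback="unknown"):
    """Multilabel: every label scoring at or above the threshold (assumed 0.5).

    The fallback label is returned only if no other label was selected.
    """
    chosen = [
        label
        for label, score in zip(labels, scores)
        if label != fallback and score >= threshold
    ]
    return chosen or [fallback]


level = decode_multiclass([0.1, 0.6, 0.2, 0.05, 0.05], HARM_LEVELS)
groups = decode_multilabel([0.8, 0.1, 0.7, 0.9], HARMED_GROUPS)
nobody = decode_multilabel([0.1, 0.2, 0.3, 0.2], HARMED_GROUPS)
print(level, groups, nobody)
```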

Model

The model for categorising AI incidents is a multitask model with a shared encoder. The encoder is distilroberta-base [9] and each task has a dedicated classification head which uses the output of the transformer corresponding to the special input token “[CLS]” as in [5]. The input consists of the sentence where the AI Incident was detected, plus the next two sentences.

Figure 2: Architecture of the AI Incident categorisation model. Left: multitask diagram; Right: BERT architecture, image taken from [5].
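The two ideas in this section, the three-sentence input window and the shared encoder feeding several task heads, can be sketched without any trained model. Everything below is a toy stand-in: `build_input`, `categorise`, the fake encoder and the rule-based heads are hypothetical and only show how the pieces fit together.

```python
def build_input(sentences, incident_index, following=2):
    """Join the sentence where the incident was detected with the next `following` sentences."""
    return " ".join(sentences[incident_index : incident_index + 1 + following])


def categorise(text, encoder, heads):
    """Run the shared encoder once and feed its output to every task-specific head."""
    shared = encoder(text)
    return {task: head(shared) for task, head in heads.items()}


# Toy stand-ins so that the sketch runs without a trained model.
calls = []


def toy_encoder(text):
    calls.append(text)  # record each call to show the encoder runs only once
    return [float(len(text)), float(text.count(" "))]


toy_heads = {
    "harm_level": lambda vec: "death" if vec[0] > 40 else "unknown",
    "harm_type": lambda vec: "physical",
    "harmed_groups": lambda vec: ["public"],
}

sentences = [
    "Intro sentence.",
    "The car crashed, killing a pedestrian.",
    "Police are investigating.",
    "The maker declined to comment.",
    "Unrelated closing sentence.",
]

text = build_input(sentences, incident_index=1)
labels = categorise(text, toy_encoder, toy_heads)
print(text)
print(labels)
```

Sharing the encoder means the text is encoded once per incident, and only the small task heads differ between harm level, harm type and harmed groups.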

Dataset

The dataset for categorisation of AI Incidents corresponds to the positive examples in the dataset for detecting AI Incidents. Experimental results:

Task | Precision | Recall | F1
Harm level | 76 | 76 | 76
Harm type | 57 | 57 | 57
Harmed groups | 69 | 66 | 67
Table 4: Evaluation metrics for AI Incident categorisation.

References

[1] Vrandečić, Denny, and Markus Krötzsch. “Wikidata: a free collaborative knowledgebase.” Communications of the ACM 57.10 (2014): 78-85. https://www.wikidata.org

[2] https://spacy.io

[3] Brank, Janez, Gregor Leban, and Marko Grobelnik. “Annotating documents with relevant Wikipedia concepts.” Proceedings of SiKDD 472 (2017). https://wikifier.org

[4] Vaswani, Ashish, et al. “Attention is all you need.” Advances in neural information processing systems 30 (2017).

[5] Devlin, Jacob, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. “BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding.” Proceedings of NAACL-HLT. 2019.

[6] Liu, Yinhan, et al. “RoBERTa: A Robustly Optimized BERT Pretraining Approach.” arXiv preprint arXiv:1907.11692 (2019).

[7] Zhou, Wenxuan, and Muhao Chen. “An Improved Baseline for Sentence-level Relation Extraction.” Proceedings of the 2nd Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 12th International Joint Conference on Natural Language Processing. 2022.

[8] Soares, Livio Baldini, et al. “Matching the Blanks: Distributional Similarity for Relation Learning.” Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. 2019.

[9] Sanh, Victor, et al. “DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter.” arXiv preprint arXiv:1910.01108 (2019). https://huggingface.co/distilroberta-base