AI Companies Face Lawsuits Over Use of Pirated and Copyrighted Content in Training Data

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Major AI firms, including OpenAI, are accused of training language models on datasets containing pirated and unauthorized copyrighted works, such as books by Haruki Murakami and Stephen King, and newspaper articles. This has led to lawsuits from authors and publishers, alleging intellectual property violations and demanding compensation or data removal. [AI generated]

Why's our monitor labelling this an incident or hazard?

The article explicitly mentions that AI models have been trained on pirated copies of authors' works without permission, which constitutes a violation of intellectual property rights. This is a breach of obligations under applicable law protecting intellectual property rights, and the use of such AI systems has led to lawsuits and community pushback. Therefore, this situation qualifies as an AI Incident due to the violation of rights caused by the AI system's development and use. [AI generated]

AI principles

Accountability; Privacy & data governance; Transparency & explainability; Fairness

Industries

Media, social platforms, and marketing; Arts, entertainment, and recreation

Affected stakeholders

Business

Harm types

Economic/Property; Reputational

Severity

AI incident

Business function:

Research and development

AI system task:

Content generation

Articles about this incident or hazard

Stephen King shares his thoughts on AI writing fiction

2023-08-24

Mashable

Why's our monitor labelling this an incident or hazard?

The article explicitly mentions that AI models have been trained on pirated copies of authors' works without permission, which constitutes a violation of intellectual property rights. This is a breach of obligations under applicable law protecting intellectual property rights, and the use of such AI systems has led to lawsuits and community pushback. Therefore, this situation qualifies as an AI Incident due to the violation of rights caused by the AI system's development and use.

Stephen King Shares His Thoughts On AI Writing Fiction

2023-08-25

Mashable India

Why's our monitor labelling this an incident or hazard?

The article centers on Stephen King's opinion and the broader issue of AI training on copyrighted material, which relates to intellectual property rights. While the use of pirated works for AI training could constitute a violation of intellectual property rights (an AI Incident), the article does not report a concrete incident of harm or legal ruling but rather discusses ongoing lawsuits and community responses. Therefore, it is best classified as Complementary Information, as it provides context and updates on societal and legal responses to AI-related issues without describing a new AI Incident or AI Hazard.

Stephen King won't forbid AI from training on his writing, and he's not afraid of AI ... yet

2023-08-26

Business Insider Nederland

Why's our monitor labelling this an incident or hazard?

The article does not describe any direct or indirect harm caused by AI systems, nor does it report any incident or plausible future harm resulting from AI use. It focuses on the discussion around AI training on copyrighted works and the authors' responses, which is complementary information about societal and governance responses to AI developments. Therefore, it fits the category of Complementary Information rather than an AI Incident or AI Hazard.

Stephen King Isn't Afraid of AI -- His Books Have Trained It

2023-08-25

Decrypt

Why's our monitor labelling this an incident or hazard?

The article does not report any realized harm or incident caused by AI systems, nor does it describe a specific credible risk or hazard event. Instead, it provides complementary information about societal and industry responses to AI, including strikes and lawsuits, as well as Stephen King's views. Therefore, it fits the definition of Complementary Information, as it enhances understanding of the AI ecosystem and responses without reporting a new AI Incident or AI Hazard.

OpenAI's GPTBot crawler blocked by a well-known publication

2023-08-22

中关村在线

Why's our monitor labelling this an incident or hazard?

The event involves the use of an AI system (OpenAI's GPTBot crawler) in the context of data collection for AI training. However, the article focuses on the blocking of the crawler and potential legal disputes over intellectual property rights, rather than describing any realized harm caused by the AI system's development or use. There is no indication that the AI system's use has directly or indirectly led to injury, disruption, rights violations, or other harms. Instead, the article highlights a dispute and preventive measures taken by the publication. Therefore, this is best classified as Complementary Information, as it provides context on governance and legal responses related to AI data usage and intellectual property concerns, without reporting an AI Incident or AI Hazard.

OpenAI barred from using a publication's content to train models

2023-08-22

中关村在线

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (OpenAI's models) and their development process (training data sourcing). The publication's blocking of the crawler and legal considerations relate to intellectual property rights violations, which are a form of harm under the framework. However, the article describes a dispute and potential legal action rather than a realized harm incident. There is no direct or indirect harm reported as having occurred yet, only the potential for legal conflict and rights violation claims. Therefore, this is best classified as Complementary Information, as it provides context and updates on AI development practices and legal responses, but does not describe an AI Incident or AI Hazard itself.

The New York Times has blocked OpenAI's web crawler GPTBot

2023-08-24

chinaz.com

Why's our monitor labelling this an incident or hazard?

The event involves the use and development of AI systems (OpenAI's models trained on web content). However, the article does not describe any realized harm caused by the AI system's development or use, nor does it report any injury, rights violation, or other direct or indirect harm resulting from OpenAI's AI systems. Instead, it focuses on preventive measures and potential legal disputes regarding intellectual property rights and data usage. Therefore, this is not an AI Incident or AI Hazard but rather complementary information about governance, legal, and societal responses to AI development and data use practices.

Newspaper sues OpenAI: training AI models violates terms of service

2023-08-21

凤凰网（凤凰新媒体）

Why's our monitor labelling this an incident or hazard?

The event explicitly involves an AI system (OpenAI's ChatGPT) whose development (training) used copyrighted newspaper content without permission, violating the newspaper's service terms and intellectual property rights. This constitutes a breach of obligations under applicable law protecting intellectual property rights, which is a defined harm under AI Incidents. The dispute is about realized harm (unauthorized use of content), not just potential harm, and involves direct AI system development practices. Hence, it meets the criteria for an AI Incident rather than a hazard or complementary information.

Confirmed! Pirated books by Haruki Murakami and Stephen King became training data; no AI giant is spared

2023-08-22

凤凰网（凤凰新媒体）

Why's our monitor labelling this an incident or hazard?

The article explicitly details how AI companies have used unauthorized copyrighted works, including pirated books, to train large language models, leading to lawsuits and claims of copyright infringement. This constitutes a violation of intellectual property rights (harm category c) directly linked to the development and use of AI systems. The harm is realized as legal actions and potential penalties are underway, making this an AI Incident rather than a hazard or complementary information. The involvement of AI systems in the infringement and the resulting legal and rights harms meet the criteria for an AI Incident.

The New York Times blocks OpenAI's web crawler, prohibiting use of its content for AI training

2023-08-22

163.com

Why's our monitor labelling this an incident or hazard?

The event involves the use of AI systems (OpenAI's AI models) trained on content scraped from The New York Times without permission, which is a breach of intellectual property rights. The blocking of the crawler and potential legal action indicate that harm related to intellectual property rights has occurred or is ongoing. Therefore, this qualifies as an AI Incident due to violation of intellectual property rights caused by the AI system's development and use.

Confirmed! Pirated books by Haruki Murakami and Stephen King became training data; no AI giant is spared

2023-08-22

163.com

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (large language models) trained on unauthorized copyrighted materials, which constitutes a breach of intellectual property rights, a form of harm under the AI Incident definition (c). The article describes realized harm through ongoing lawsuits and legal claims, indicating direct consequences from the AI systems' development and use. The involvement of AI in the infringement is explicit, and the harm is materialized, not merely potential. Therefore, this is classified as an AI Incident.

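To make sure you receive the daily high-quality financial content from "阿尔法工场", go to the official account's home page, tap the "..." icon in the upper right corner, and tap "Set as starred" in the first row.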

2023-08-24

证券之星

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (large language models) trained on datasets that allegedly include unauthorized copyrighted works, leading to lawsuits and legal challenges. This constitutes a violation of intellectual property rights (a breach of applicable law protecting such rights), which is a form of harm as defined under AI Incident category (c). The lawsuits and public outcry indicate that harm has already occurred or is ongoing due to the unauthorized use of copyrighted materials in AI training. Therefore, this situation qualifies as an AI Incident rather than a hazard or complementary information, as the harm is realized and the AI system's development and use are directly implicated in the infringement.

Pirated books by Haruki Murakami and Stephen King became training data; no AI giant is spared

2023-08-22

cnBeta.COM

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (large language models) trained on datasets containing unauthorized and pirated copyrighted books, which constitutes a violation of intellectual property rights (a breach of obligations under applicable law). The use of these datasets in AI training has directly led to legal actions and recognized harm to authors and content creators. Therefore, this qualifies as an AI Incident because the development and use of AI systems have directly led to violations of intellectual property rights and legal harms. The article also discusses ongoing lawsuits and advocacy efforts, but the primary focus is on the realized harm from unauthorized AI training data use, not just potential or complementary information.

The New York Times bans OpenAI's web crawler in its access rules

2023-08-22

cnBeta.COM

Why's our monitor labelling this an incident or hazard?

The event involves the use of an AI system (OpenAI's AI models trained on web-scraped data) and concerns about the development and use of AI systems with respect to intellectual property rights. The blocking of the crawler and the potential lawsuits relate to violations of intellectual property rights, which is a form of harm under the AI Incident definition. Since the event describes actual actions taken (blocking the crawler) and ongoing legal disputes over past data use, it indicates realized harm or at least ongoing infringement issues, not just potential future harm. Therefore, this qualifies as an AI Incident due to violations of intellectual property rights caused by the AI system's development and use.