Anthropic Secretly Scanned Millions of Books for AI Training, Leading to Copyright Settlement


The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

AI start-up Anthropic secretly acquired and destructively scanned millions of books to train its AI models, including the chatbot Claude. This unauthorized use of copyrighted works led to a major lawsuit by authors, resulting in Anthropic agreeing to a $1.5 billion settlement for copyright infringement. [AI generated]

Why's our monitor labelling this an incident or hazard?

The article details how AI companies, including Anthropic, used AI systems trained on large datasets of books, some obtained through unauthorized or pirated sources, leading to copyright infringement claims. The involvement of AI in processing these materials to train language models is explicit. The harm is realized in the form of violations of intellectual property rights and legal actions taken by authors and publishers. Although some courts have ruled on fair use, the companies faced legal consequences for how they acquired the data. This direct link between AI system development/use and harm to rights holders qualifies the event as an AI Incident under the framework. [AI generated]
AI principles
Accountability
Transparency & explainability

Industries
Media, social platforms, and marketing

Affected stakeholders
Workers
Business

Harm types
Economic/Property

Severity
AI incident

Business function:
Research and development

AI system task:
Content generation


Articles about this incident or hazard


Inside one company's secret plan to 'destructively scan every book in the world'

2026-01-27
Washington Post
Why's our monitor labelling this an incident or hazard?
The article details how AI companies, including Anthropic, used AI systems trained on large datasets of books, some obtained through unauthorized or pirated sources, leading to copyright infringement claims. The involvement of AI in processing these materials to train language models is explicit. The harm is realized in the form of violations of intellectual property rights and legal actions taken by authors and publishers. Although some courts have ruled on fair use, the companies faced legal consequences for how they acquired the data. This direct link between AI system development/use and harm to rights holders qualifies the event as an AI Incident under the framework.

Anthropic Scans Purchased Books for AI Training

2026-01-28
Chosun.com
Why's our monitor labelling this an incident or hazard?
The event explicitly involves AI system development through training on large book datasets, including purchased and scanned books. The legal dispute and settlement relate to intellectual property rights, which is a recognized harm category. However, the court ruled the use of purchased books as fair use, and the settlement resolved the dispute, indicating no ongoing or realized violation causing harm. There is no indication of injury, disruption, or other harms directly or indirectly caused by the AI system's use. The main focus is on the legal and governance aspects surrounding AI training data, making this a case of Complementary Information rather than an AI Incident or Hazard.

How Silicon Valley built AI: buying, scanning and discarding millions of books

2026-01-28
Irish Independent
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (large language models and chatbots) trained on massive datasets of books, including unauthorized and pirated copies. The development and use of these AI systems directly led to copyright infringement harms, which are violations of intellectual property rights. The article details realized harm through lawsuits, settlements, and legal findings, confirming that the AI systems' development and use caused these harms. Hence, this qualifies as an AI Incident rather than a hazard or complementary information.

Inside a tech company's secretive plan to destroy millions of books

2026-01-27
The Boston Globe
Why's our monitor labelling this an incident or hazard?
The event explicitly involves AI systems (large language models like Anthropic's Claude) trained on extensive book datasets. The development and use of these AI systems directly led to copyright infringement harms, as authors' works were used without permission, violating intellectual property rights. The legal filings and settlement confirm that harm occurred, even if the AI models' outputs are transformative. The involvement of AI in the development and use of these models is central, and the harm is materialized through copyright violations and legal consequences. Hence, this is classified as an AI Incident.

How Silicon Valley built AI: Buying, scanning and discarding millions of books

2026-01-28
Anchorage Daily News
Why's our monitor labelling this an incident or hazard?
The article explicitly involves AI systems (large language models trained on book data) and describes how their development and use have led to copyright infringement harms, which constitute violations of intellectual property rights under applicable law. The lawsuits and settlements confirm that harm has occurred. Although the article also discusses the companies' efforts to comply with legal rulings and the transformative nature of AI training, the core event is the unauthorized use of copyrighted material causing harm to authors and publishers. This fits the definition of an AI Incident because the AI systems' development and use directly led to a breach of intellectual property rights.

Washington Post: "Inside an AI Start-Up's Plan to Scan and Dispose of Millions of Books"

2026-01-28
LJ infoDOCKET
Why's our monitor labelling this an incident or hazard?
Anthropic's Project Panama involved acquiring and destructively scanning millions of books to train AI models, which is an AI system development activity. The unauthorized use of copyrighted books for training constitutes a breach of intellectual property rights, a recognized harm under the AI Incident framework. The legal action and settlement further confirm that harm occurred. Hence, this event meets the criteria for an AI Incident because the AI system's development directly led to a violation of rights.

Start Up No.2597: how Anthropic got its book haul, what sort of AI bubble is this?, China's biotech advances, and more

2026-01-28
The Overspill: when there's more that I want to say
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (Anthropic's AI models) developed and trained using scanned books, which is explicitly described. The legal case and settlement indicate that the AI system's development and use directly led to a violation of intellectual property rights, a recognized harm under the framework. The court rulings clarify the legality of transformative use but also highlight infringement due to pirated book downloads. This constitutes a direct AI Incident as the AI system's development and use caused a breach of copyright law. The article does not merely discuss potential harm or general AI ecosystem updates but reports on a concrete legal incident involving AI and harm to authors' rights.

Inside an AI start-up's plan to scan and dispose of millions of books

2026-01-28
DNYUZ
Why's our monitor labelling this an incident or hazard?
The event explicitly involves AI systems (large language models) trained on vast datasets of books. The development and use of these AI systems directly led to copyright infringement harms, as the companies acquired and used copyrighted works without permission, including pirated copies. This constitutes a violation of intellectual property rights, a recognized harm under the AI Incident definition. The legal cases and settlement confirm that harm occurred. Although the AI models' training process was found transformative, the acquisition methods caused harm. Hence, this is an AI Incident due to realized harm linked to AI system development and use.

The quest to 'destructively scan' all the world's books - The Washington Post

2026-01-29
Washington Post
Why's our monitor labelling this an incident or hazard?
Anthropic's AI system was trained using a massive dataset obtained by destructively scanning books, which led to a copyright infringement lawsuit by authors. The legal action and settlement demonstrate that the AI system's development and use directly caused a breach of intellectual property rights, a form of harm under the AI Incident definition. The involvement of AI in the development and use of the system is explicit, and the harm (violation of rights) has materialized, making this an AI Incident rather than a hazard or complementary information.

Book Reviews News | Slashdot

2026-01-29
Slashdot
Why's our monitor labelling this an incident or hazard?
The AI system (Claude chatbot) was trained using scanned books obtained through a project that involved buying and physically scanning books, as well as downloading pirated books. This use of copyrighted material without proper authorization breaches intellectual property rights, which is a harm under the AI Incident definition (c). The lawsuit and settlement confirm that harm has occurred. Therefore, this event qualifies as an AI Incident due to the violation of intellectual property rights caused by the AI system's development and use.