Microsoft Blog Promotes AI Training on Pirated Harry Potter Books, Sparks Copyright Backlash

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Microsoft published and later deleted a blog post instructing developers to train AI models using pirated copies of the Harry Potter books, sourced from a mislabeled Kaggle dataset. The incident, involving a senior product manager, led to copyright infringement concerns and highlighted ethical issues in AI training practices. [AI generated]

Why's our monitor labelling this an incident or hazard?

The event explicitly involves the use of AI systems (LLMs) trained on pirated copyrighted material (Harry Potter books) to generate AI outputs, including fan fiction and Q&A systems. This use constitutes a violation of intellectual property rights, a recognized harm under the AI Incident framework. Microsoft's blog post encouraged this use and linked to the infringing dataset, making the AI system's development and use a direct factor in the harm. The removal of the blog post is a response but does not negate the fact that the incident occurred. Hence, the event meets the criteria for an AI Incident rather than a hazard or complementary information. [AI generated]

AI principles
Accountability
Privacy & data governance

Industries
Media, social platforms, and marketing

Affected stakeholders
Business
Other

Harm types
Economic/Property

Severity
AI incident

Business function
Research and development

AI system task
Content generation


Articles about this incident or hazard

Microsoft removes guide on how to train LLMs on pirated Harry Potter books

2026-02-20
Ars Technica
Why's our monitor labelling this an incident or hazard?
The event explicitly involves the use of AI systems (LLMs) trained on pirated copyrighted material (Harry Potter books) to generate AI outputs, including fan fiction and Q&A systems. This use constitutes a violation of intellectual property rights, a recognized harm under the AI Incident framework. Microsoft's blog post encouraged this use and linked to the infringing dataset, making the AI system's development and use a direct factor in the harm. The removal of the blog post is a response but does not negate the fact that the incident occurred. Hence, the event meets the criteria for an AI Incident rather than a hazard or complementary information.

Accio Lawyers! Microsoft manager trained AI on pirated Potter books

2026-02-20
PCWorld
Why's our monitor labelling this an incident or hazard?
The event explicitly involves an AI system (a generative AI trained on text data) and the use of pirated copyrighted material for training, which is a violation of intellectual property rights. The harm (copyright infringement) has already occurred as the AI was trained on unauthorized data and publicly promoted. This fits the definition of an AI Incident under violations of intellectual property rights. Although the blog post and dataset have been removed, the incident of unauthorized use and promotion of pirated content for AI training has already taken place, constituting realized harm.

Microsoft blogger suggests you train AI on pirated Harry Potter

2026-02-21
Boing Boing
Why's our monitor labelling this an incident or hazard?
The event explicitly involves the use of an AI system trained on pirated copyrighted material, which constitutes a violation of intellectual property rights, a breach of applicable law protecting such rights. The AI system's development and use in this context directly relate to the unauthorized use of copyrighted content. Although no physical harm or injury is described, the violation of intellectual property rights is a recognized harm under the framework. Therefore, this qualifies as an AI Incident due to the realized harm of copyright infringement through AI training and use.

Microsoft's AI Tutorial Told Users to Train Models on Pirated Harry Potter Books -- Then Quietly Vanished

2026-02-21
WebProNews
Why's our monitor labelling this an incident or hazard?
The event explicitly involves AI system development (training AI models) and the use of pirated copyrighted material, which constitutes a violation of intellectual property rights. The tutorial's publication and circulation directly led to harm by promoting illegal activity related to AI training data. This meets the criteria for an AI Incident because the AI system's development process is implicated in a breach of legal and ethical standards protecting intellectual property. The incident is not merely a potential risk or complementary information but a realized harm linked to AI system development practices.

Book Reviews News | Slashdot

2026-02-21
Slashdot
Why's our monitor labelling this an incident or hazard?
The event describes the use of an AI system trained on copyrighted content without permission, which is a breach of intellectual property rights. This harm has materialized because the AI models were trained and promoted using the copyrighted Harry Potter books. Therefore, this qualifies as an AI Incident due to violation of intellectual property rights caused by the AI system's development and use.