
The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.
Microsoft published, and later deleted, a blog post instructing developers to train AI models on pirated copies of the Harry Potter books, sourced from a mislabeled Kaggle dataset. The incident, which involved a senior product manager, raised copyright infringement concerns and highlighted ethical issues in AI training practices.[AI generated]
Why is our monitor labelling this an incident or hazard?
The event explicitly involves AI systems (LLMs) trained on pirated copyrighted material (the Harry Potter books) and used to build applications such as fan fiction generators and question-answering (Q&A) systems. This use constitutes a violation of intellectual property rights, a recognized harm under the AI Incident framework. Microsoft's blog post encouraged this use and linked to the infringing dataset, making the AI system's development and use a direct factor in the harm. Removing the blog post was a response, but it does not negate the fact that the incident occurred. Hence, the event meets the criteria for an AI Incident rather than a hazard or complementary information.[AI generated]