Meta and Mistral AI Cofounder Implicated in Massive Copyright Infringement for AI Training

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Guillaume Lample, cofounder of Mistral AI and former Meta researcher, is accused of orchestrating the use of millions of pirated books from sites like LibGen to train Meta's LLaMA AI models. Internal documents and legal proceedings confirm large-scale copyright violations, raising concerns about AI training practices. [AI generated]

Why's our monitor labelling this an incident or hazard?

The article explicitly mentions the use of unauthorized copyrighted data (Library Genesis) for training AI models, which is a violation of intellectual property rights, a recognized harm under the AI Incident definition. The involvement of AI systems in this infringement is direct, as the data was used to develop and deploy AI language models. The harm is realized, not just potential, as the data was downloaded and used in model training. Hence, this event meets the criteria for an AI Incident rather than a hazard or complementary information. [AI generated]

AI principles
Accountability, Transparency & explainability, Privacy & data governance

Industries
Media, social platforms, and marketing

Affected stakeholders
Business

Harm types
Economic/Property

Severity
AI incident

Business function
Research and development

AI system task
Content generation


Articles about this incident or hazard

"Everyone does it, so we do too": the pirate methods of Mistral AI's cofounder revealed

2025-12-24
Clubic.com
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions the use of unauthorized copyrighted data (Library Genesis) for training AI models, which is a violation of intellectual property rights, a recognized harm under the AI Incident definition. The involvement of AI systems in this infringement is direct, as the data was used to develop and deploy AI language models. The harm is realized, not just potential, as the data was downloaded and used in model training. Hence, this event meets the criteria for an AI Incident rather than a hazard or complementary information.

Mistral AI: the French unicorn accused of stealing 70 TB of copyrighted books to feed its AI

2025-12-23
Les Numériques
Why's our monitor labelling this an incident or hazard?
The event involves the development and use of AI systems trained on illegally obtained copyrighted materials, which is a clear violation of intellectual property rights. This harm has already occurred, as the AI models have been trained and deployed using these datasets. The involvement of AI systems is explicit, and the harm is directly linked to the AI system's development. Therefore, this qualifies as an AI Incident under the framework, specifically under category (c): violations of human rights or a breach of obligations under applicable law protecting intellectual property rights.

Mistral AI: its cofounder at the heart of a massive piracy scandal at Meta

2025-12-24
Génération-NT
Why's our monitor labelling this an incident or hazard?
The event explicitly involves AI systems (large language models like LLaMA and potentially Mistral 7B) whose training data was sourced through illegal means, violating intellectual property rights. The involvement of the AI system's development (data sourcing) directly led to a breach of legal protections for authors, which is a recognized form of AI harm under the framework. The article documents realized harm (copyright infringement) and potential ongoing harm (legal risks to Mistral AI). Hence, this is an AI Incident rather than a hazard or complementary information.

According to Mediapart, Meta allegedly used millions of pirated books to train its AI models - Siècle Digital

2025-12-24
Siècle Digital
Why's our monitor labelling this an incident or hazard?
The event involves the development of an AI system (LLaMA) using pirated copyrighted books, which is a clear breach of intellectual property rights, a category of harm under the AI Incident definition. The use of unauthorized data for training is directly linked to the AI system's development and deployment. The article provides evidence from internal documents and legal proceedings, confirming the occurrence of this harm. Hence, this is not merely a potential risk or complementary information but a realized AI Incident involving legal rights violations.

Mistral AI: its cofounder accused of taking part in a massive piracy operation

2025-12-23
Le Jour Guinée, actualités des banques en ligne
Why's our monitor labelling this an incident or hazard?
The event explicitly involves AI systems (large language models) trained on pirated copyrighted content, which is a direct violation of intellectual property rights, a recognized harm under the AI Incident definition. The involvement of Guillaume Lample, a key AI researcher, in the unauthorized data acquisition for AI training, and the ongoing legal proceedings confirm the realized harm. Although the article speculates about possible future harm at Mistral AI, the primary incident concerns the actual past misuse at Meta. Hence, this is classified as an AI Incident rather than a hazard or complementary information.

1

2025-12-23
Next
Why's our monitor labelling this an incident or hazard?
The event explicitly involves AI systems (large language models like LLaMA) trained using unauthorized copyrighted content from LibGen, which is a violation of intellectual property rights. The involvement of Guillaume Lample and Meta in knowingly using this data confirms that AI systems were developed and used in a way that breaches legal protections. Since the harm (violation of intellectual property rights) has already occurred through the training and deployment of these models, this qualifies as an AI Incident under the framework.