AI Training Dataset Violates Children's Privacy

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Human Rights Watch reports that photos of Australian children, including Indigenous children and children who are personally identifiable, have been used without consent in the LAION-5B AI training dataset. This raises privacy concerns and potential legal breaches, prompting calls for urgent legal reform to protect children's rights and prevent misuse of AI tools.[AI generated]

Why's our monitor labelling this an incident or hazard?

Researchers from Human Rights Watch found that the LAION-5B dataset, used to train widely deployed AI models (e.g., Stable Diffusion, Midjourney), scraped private and semi-private images of minors without consent, including detailed personal metadata. This data misuse is already realized, breaches fundamental privacy and human rights, and poses actual harms (e.g., potential for hyper-realistic deepfakes of children). Thus it meets the criteria for an AI Incident.[AI generated]
AI principles
Privacy & data governance, Respect of human rights, Transparency & explainability, Accountability, Fairness, Human wellbeing

Industries
Media, social platforms, and marketing
Government, security, and defence

Affected stakeholders
Children

Harm types
Human or fundamental rights

Severity
AI incident

Business function
Research and development

AI system task
Content generation
Recognition/object detection


Articles about this incident or hazard

AI Tools Trained With Photos of Australian Babies, Children Without Consent

2024-07-03
www.theepochtimes.com
Why's our monitor labelling this an incident or hazard?
The article reports on the ongoing non-consensual collection of children’s images for AI training and highlights the credible risk of future harm through deepfake generation. There is no report of an actual exploitation incident, and the focus is on the potential misuse and need for regulatory action. Therefore, this scenario constitutes an AI Hazard.
The personal images behind the public content - how AI is breaching your privacy

2024-07-02
Australian Broadcasting Corporation
Why's our monitor labelling this an incident or hazard?
Researchers from Human Rights Watch found that the LAION-5B dataset, used to train widely deployed AI models (e.g., Stable Diffusion, Midjourney), scraped private and semi-private images of minors without consent, including detailed personal metadata. This data misuse is already realized, breaches fundamental privacy and human rights, and poses actual harms (e.g., potential for hyper-realistic deepfakes of children). Thus it meets the criteria for an AI Incident.
Children's personal photos used to train AI models without consent: Report

2024-07-03
The Hindu
Why's our monitor labelling this an incident or hazard?
The report describes an AI system (LAION-5B) trained on scraped personal images of children without consent, directly infringing on their privacy and fundamental rights. This constitutes realized harm (unauthorized use of personal data), categorizing it as an AI Incident.
HRW Finds Australian Children's Photos Misused in AI Training, Raises Privacy Concerns

2024-07-03
Tasnim News Agency
Why's our monitor labelling this an incident or hazard?
This describes a realized privacy violation through AI dataset scraping: the development and use of an AI training dataset that has directly infringed on children's privacy rights, constituting an AI Incident.
Photos of Australian kids have been found in a massive AI training data set. What can we do?

2024-07-03
The Conversation
Why's our monitor labelling this an incident or hazard?
The unauthorized collection and use of minors’ images in the training of AI systems is a direct violation of privacy and data protection laws, representing a breach of human rights. The harm has already occurred through the deployment of these images in AI, so this is classified as an AI Incident.
AI trains on kids' photos even when parents use strict privacy settings

2024-07-02
Ars Technica
Why's our monitor labelling this an incident or hazard?
The article describes an AI system (image generators trained on LAION-5B) that directly led to harms—unauthorized use of children’s images, privacy breaches, and actual circulation of sexually explicit deepfakes of minors—thus meeting the criteria for an AI Incident.
Photos of Aussie kids scraped, used by popular AI tool

2024-07-03
Perth Now
Why's our monitor labelling this an incident or hazard?
The LAION-5B dataset—a training corpus for generative AI tools like Midjourney and Stable Diffusion—was found to include almost 200 images of Australian children scraped without consent from online sources, including unlisted school websites and YouTube videos. This unauthorized data collection and subsequent use in AI training constitutes a direct violation of privacy and children’s rights, with clear potential for deepfake creation and other harms. These are realized human rights breaches stemming from the AI system’s development and use.
Photos of Australian children used in dataset to train AI, human rights group says

2024-07-02
Yahoo! Finance
Why's our monitor labelling this an incident or hazard?
The unauthorized inclusion of identifiable children's images in the LAION-5B dataset is a direct violation of privacy and fundamental rights. The dataset's use in free deepfake and "nudify" apps has already generated non-consensual sexualized images of minors, demonstrating real harm to individuals. This event is therefore an AI Incident involving human rights violations facilitated by AI.
Photos of Australian children used in dataset to train AI, human rights group says

2024-07-02
The Guardian
Why's our monitor labelling this an incident or hazard?
The unauthorized scraping and use of children’s personal images in an AI training dataset constitutes a violation of privacy and human rights (right to personal data protection). This is an actual harm enabled by the development and use of AI systems, fitting the definition of an AI Incident under violations of human rights.
Creeps: AI Giants Are Training Systems on Pictures of Children Without Consent

2024-07-03
Breitbart
Why's our monitor labelling this an incident or hazard?
Human Rights Watch’s findings show that AI developers have harvested thousands of images of children without consent and used them to train generative models. This unauthorized use of personal data is an infringement of privacy and can lead to real-world harms (identity exposure, deepfake abuse). The event describes actual past and ongoing misuse leading to rights violations, fitting the definition of an AI Incident.
Photos of Australian children found in AI training dataset, create deepfake risk

2024-07-03
Biometric Update
Why's our monitor labelling this an incident or hazard?
The event explicitly involves AI systems trained on a dataset containing personal images of children collected without consent, which generative AI models have then used to create synthetic images that could constitute child sexual abuse material. This is a direct violation of children's rights and causes harm to them and their communities. Because the harm is realized rather than merely potential, the event is classified as an AI Incident.
Report says photos of kids posted online, even with privacy settings, are being used to train AI

2024-07-03
KSBY
Why's our monitor labelling this an incident or hazard?
The involvement of AI systems is clear: the LAION-5B dataset is used to train AI models. Scraping images without consent, including images protected by privacy settings, violates rights (privacy and potentially intellectual property), and the possibility of generating deepfakes of children represents a significant further harm. Because the privacy breaches have already occurred, this qualifies as an AI Incident under the framework.
Australian Kids' Photos Found in Major AI Training Set

2024-07-03
Mirage News
Why's our monitor labelling this an incident or hazard?
The event involves generative AI models trained on LAION-5B whose development and use have directly violated the privacy rights of Australian children, breaching applicable law that protects fundamental rights. The presence of children's photos in the training data without consent is itself a harm to individuals, and the article's discussion of legal implications and enforcement challenges confirms that harm has occurred. This fits the definition of an AI Incident.
Photos of Australian children used illicitly to train AI tools, HRW reports

2024-07-04
JURIST
Why's our monitor labelling this an incident or hazard?
The report explicitly states that AI tools trained on these illicitly obtained images have been used to generate harmful deepfakes, including explicit imagery of children, which is a direct violation of human rights and privacy. The AI system's development and use have caused realized harm, including exploitation and potential psychological harm to children and their families. The involvement of AI in generating manipulated content that harms individuals and communities meets the criteria for an AI Incident rather than a hazard or complementary information.
HRW Reveals AI Sector's Unauthorized Use Of Australian Children's Photos For Model Training

2024-07-03
International Business Times AU
Why's our monitor labelling this an incident or hazard?
The event involves advanced AI models trained on LAION-5B whose development entailed the unauthorized use of children's personal data, breaching obligations under applicable law that protect fundamental privacy rights. The harm is realized, not merely potential: the dataset contains identifiable children's photos and personal information used without consent. The event is therefore a clear AI Incident rather than a hazard or complementary information.
AI Giants are Training Systems on Pictures of Children Without Consent

2024-07-03
The Minnesota Sun
Why's our monitor labelling this an incident or hazard?
The article describes AI systems trained on images of children without consent, a breach of privacy and potentially other rights. The presence of these images in training datasets directly implicates AI development in the violation, and the potential to create realistic deepfakes of children amplifies the risk of further harm to these individuals and their communities. Because the use of the images has already happened, this is an AI Incident rather than a mere hazard or complementary information.
Photos of Australian children found in AI training dataset, create deepfake risk

2024-07-03
freedomsphoenix.com
Why's our monitor labelling this an incident or hazard?
AI system involvement is clear: the images were used to train generative AI models. Their use without consent violates rights, specifically privacy and potentially intellectual property, and their presence in generative AI services creates a plausible risk of further harm through deepfakes targeting the children depicted and their communities. The realized privacy violation, combined with this deepfake risk, qualifies the event as an AI Incident.
No matter how careful you are, AI will use your photos to train itself

2024-07-04
Softonic
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions that photos of minors were used without consent to train AI models, which is a violation of privacy and potentially other legal protections for children. The AI system's development process (training on these images) directly led to this harm. The harm is realized, not just potential, as the images are already in the training dataset and identifiable information has been traced. This fits the definition of an AI Incident under violations of human rights or breach of applicable law protecting fundamental rights.
Human Rights Watch finds photos of Aussie kids in AI training dataset

2024-07-03
cyberdaily.au
Why's our monitor labelling this an incident or hazard?
The event explicitly involves AI tools trained on the LAION-5B dataset whose development and use have directly violated children's rights and privacy, including through the creation of harmful content. Scraping and using children's images without consent breaches fundamental rights and legal protections, and the report documents actual harm and misuse, not just potential risk. Hence this is an AI Incident under the framework.