OpenAI Faces Scrutiny Over Sora AI Video Generator's Data Transparency and Safety

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

OpenAI's upcoming text-to-video AI, Sora, is under legal and ethical scrutiny due to the CTO's vague responses about its training data sources, raising concerns over copyright and privacy compliance. Sora remains in testing, with public release planned for 2024, pending safety and transparency assurances. [AI generated]

Why's our monitor labelling this an incident or hazard?

Sora is an AI system for text-to-video generation. The article does not report any realized harm caused by Sora yet but outlines credible concerns and potential harms such as non-consensual deepfake pornography, misinformation, and cybercrime that could plausibly arise from its use. The discussion of public opinion and expert warnings about the need for regulation and safeguards further supports the classification as a potential risk. Therefore, this event fits the definition of an AI Hazard, as it plausibly could lead to AI Incidents involving violations of rights, harm to communities, and other significant harms if misused or unregulated. [AI generated]
AI principles
Privacy & data governance; Transparency & explainability; Accountability; Respect of human rights

Industries
Media, social platforms, and marketing

Affected stakeholders
Business; General public

Harm types
Human or fundamental rights; Economic/Property

Severity
AI hazard

Business function
Research and development

AI system task
Content generation


Articles about this incident or hazard

OpenAI embroiled in major controversy again? Sora's training data alleged to be illegal, CTO interview goes badly off the rails

2024-03-16
凤凰网 (Phoenix New Media)
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (Sora) and discusses its training data, which is alleged to include copyrighted materials used without proper authorization. The CTO's inability to clearly confirm data sources and the mention of ongoing lawsuits indicate that the AI system's development and use have directly led to legal disputes over intellectual property rights violations. This fits the definition of an AI Incident, as there is a breach of obligations under applicable law protecting intellectual property rights caused by the AI system's development and use. The harm is realized (lawsuits, potential financial penalties, reputational damage), not merely potential, so it is not an AI Hazard. The article is not merely complementary information because it centers on the controversy and potential legal harm, not just updates or responses.

OpenAI embroiled in major controversy again? Sora's training data alleged to be illegal, CTO interview goes badly off the rails

2024-03-16
华尔街见闻 (Wallstreetcn)
Why's our monitor labelling this an incident or hazard?
The event centers on the development and use of an AI system (Sora, an AI video-generation model) with training data that may have been obtained or used illegally, leading to violations of intellectual property rights. The ongoing lawsuits and the CTO's inability to clearly confirm the legality of the data usage indicate that harm has already occurred or is occurring, specifically violations of intellectual property rights (a form of harm under the AI Incident definition). Therefore, this qualifies as an AI Incident due to the direct link between the AI system's development and use and the legal and ethical harms arising from unauthorized data use.

OpenAI CTO: Not sure where Sora's training data comes from

2024-03-18
163.com
Why's our monitor labelling this an incident or hazard?
The article focuses on the ambiguity and controversy around the training data used for an AI system, without describing any direct or indirect harm caused by the AI system's development or use. There is no indication of injury, rights violations, operational disruption, or other harms occurring or imminent. The discussion is about transparency and potential legal risks, which is a governance and societal response topic. Therefore, this qualifies as Complementary Information rather than an AI Incident or AI Hazard.

OpenAI embroiled in major controversy again? Sora's training data alleged to be illegal, CTO interview goes badly off the rails - cnBeta.COM (mobile edition)

2024-03-16
cnBeta.COM
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system, Sora, trained on data that is the subject of copyright-infringement disputes. The CTO's evasive answers and the ongoing lawsuits indicate that the system's development and use have directly led to violations of intellectual property rights, a breach of applicable law protecting those rights. This constitutes harm under AI Incident definition (c). The article describes realized harm (lawsuits, legal disputes) rather than merely potential harm, so it is not simply a hazard. The detailed discussion of the controversy and legal challenges confirms the classification as an AI Incident rather than complementary information or unrelated news.

Shall we go back to square one? The CTO of OpenAI claims to not know what data Sora has been trained with - Softonic

2024-03-18
Softonic
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (Sora) and its development, specifically the data used for training. While there are concerns about the data sources, no direct or indirect harm has been reported or can be reasonably inferred from the article. The article focuses on the CTO's statements and the transparency issue, which is informative but does not describe an AI Incident or AI Hazard. Therefore, this is best classified as Complementary Information, as it provides context and insight into AI development practices and transparency without describing a specific harm or plausible future harm.

'They Stole Everything:' Elon Musk's War With OpenAI Heats Up After Sora's Data Sourcing Doubts Emerge In CTO Mira Murati's Interview By Benzinga

2024-03-15
Investing.com UK
Why's our monitor labelling this an incident or hazard?
While the article involves AI systems (OpenAI's models) and touches on data sourcing ethics and legal disputes, it does not report any realized harm (such as violations of rights, health harm, or disruption) or a credible risk of harm stemming from AI system use or malfunction. The content mainly provides context on ongoing controversies and legal actions, which fits the category of Complementary Information rather than an Incident or Hazard.

An OpenAI mystery: YouTube videos, Google throttling, and AI training data.

2024-03-18
Business Insider
Why's our monitor labelling this an incident or hazard?
The article centers on the practice of acquiring large volumes of online content, including YouTube videos, for AI training by OpenAI, with potential implications for intellectual property rights and terms of service compliance. However, it does not document any actual harm, legal rulings, or incidents resulting from this practice. The concerns are about possible legal and ethical issues and the lack of transparency, which constitute a broader ecosystem context rather than a direct or indirect harm event. Therefore, it fits the definition of Complementary Information, as it enhances understanding of AI data sourcing challenges and ongoing debates without describing a specific AI Incident or AI Hazard.

OpenAI's video generator Sora might allow nudity. Experts are worried

2024-03-17
Quartz
Why's our monitor labelling this an incident or hazard?
Sora is an AI system for text-to-video generation. The article does not report any realized harm caused by Sora yet but outlines credible concerns and potential harms such as non-consensual deepfake pornography, misinformation, and cybercrime that could plausibly arise from its use. The discussion of public opinion and expert warnings about the need for regulation and safeguards further supports the classification as a potential risk. Therefore, this event fits the definition of an AI Hazard, as it plausibly could lead to AI Incidents involving violations of rights, harm to communities, and other significant harms if misused or unregulated.

OpenAI's CTO Dodges Questions on Sora AI's Training Data Origins - EconoTimes

2024-03-17
EconoTimes
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (Sora, a generative AI video model) and concerns about its training data, specifically the use of copyrighted material without explicit permission. This raises potential violations of intellectual property rights, which is a recognized harm category. However, the article does not describe any actual incident of harm or legal violation having occurred; it mainly reports on the CTO's guarded stance and the ethical debate. Therefore, this situation represents a plausible risk of harm related to AI development and use, fitting the definition of an AI Hazard rather than an AI Incident or Complementary Information. It is not unrelated because it clearly involves an AI system and potential harm.

Is the OpenAI Controversy Deepening Amidst Sora's Data Sourcing Doubts? Elon Musk Raises Questions | Cryptopolitan

2024-03-15
Cryptopolitan
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (OpenAI's Sora) and concerns about its data sourcing practices, which relate to the development phase of the AI system. However, the article does not describe any actual harm or violations that have occurred due to these practices. Instead, it focuses on ethical concerns, disputes, and potential regulatory implications. Therefore, this situation fits the definition of an AI Hazard, as the development and use of the AI system could plausibly lead to incidents involving ethical breaches or violations of rights if the data sourcing issues are not resolved.

Is OpenAI's Sora Trained on YouTube Videos? A Question of Ethics and Licensing

2024-03-18
CineD
Why's our monitor labelling this an incident or hazard?
The article centers on the potential misuse of copyrighted video content in training an AI system, which could plausibly lead to violations of intellectual property rights (an AI Incident category). However, since no confirmed harm or legal action has occurred yet, and the concerns are about possible future consequences, this fits the definition of an AI Hazard. The AI system (Sora) is explicitly involved, and the discussion is about the plausible future harm related to copyright infringement and licensing issues. Therefore, the event is best classified as an AI Hazard.

OpenAI's Sora Model Raises Questions on Data Sources

2024-03-16
COINTURK NEWS
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions lawsuits against OpenAI and Microsoft alleging copyright infringement and unauthorized data use in training AI models, which are violations of intellectual property and privacy rights. These harms have already materialized as legal actions and public criticism. The Sora model's unclear data sourcing raises further concerns about compliance and transparency. Since the AI systems' development and use have directly led to these harms, this qualifies as an AI Incident under the framework, specifically under violations of human rights or breach of legal obligations protecting intellectual property and privacy rights.

Have We Reached Peak AI?

2024-03-18
wheresyoured.at
Why's our monitor labelling this an incident or hazard?
The article does not report any concrete AI Incident or AI Hazard. There is no description of realized harm or a specific event where AI caused or could plausibly cause injury, rights violations, infrastructure disruption, or other significant harms. The concerns about potential legal challenges related to copyright and the economic risks of an AI bubble are speculative and forward-looking, fitting the nature of Complementary Information. The piece mainly critiques the communication and hype around AI, providing broader context and analysis rather than reporting a new incident or hazard. Therefore, the appropriate classification is Complementary Information.

OpenAI's Mira Murati is "not sure" where Sora's training data comes from

2024-03-16
TradingView
Why's our monitor labelling this an incident or hazard?
The article centers on OpenAI's CTO expressing uncertainty about the exact data sources used to train Sora and mentions lawsuits alleging copyright infringement and privacy violations related to AI training data. While these lawsuits imply potential or ongoing legal harm, the article does not describe a concrete AI Incident such as a breach of rights or harm caused by the AI system's outputs or malfunction. Nor does it describe a plausible future harm scenario distinct from these ongoing legal disputes. Therefore, the content fits best as Complementary Information, providing context and updates on AI development, legal challenges, and governance issues rather than reporting a new AI Incident or AI Hazard.

OpenAI's Murati is "not sure" where Sora's training data comes from

2024-03-16
Cointelegraph
Why's our monitor labelling this an incident or hazard?
The article centers on the unclear provenance of training data for an AI system and mentions lawsuits alleging copyright violations and privacy breaches related to AI training data. While these lawsuits indicate possible violations of intellectual property and privacy rights, the article does not describe a concrete AI Incident where harm has been directly or indirectly caused by the AI system's development or use. Nor does it present a new plausible future harm scenario beyond existing legal disputes. The main focus is on transparency issues and legal challenges, which constitute complementary information about the AI ecosystem and ongoing governance and societal responses rather than a new AI Incident or AI Hazard.

Sora AI: when will the app that makes realistic videos be released?

2024-03-18
El Tiempo
Why's our monitor labelling this an incident or hazard?
Sora is an AI system capable of generating realistic videos, which can plausibly lead to harms such as misinformation, privacy violations, or other adverse effects if misused. The article states that the release is postponed to address these risks, indicating awareness of potential hazards. Since no harm has yet occurred, and the focus is on preventing possible future harms, this event qualifies as an AI Hazard rather than an Incident or Complementary Information.

OpenAI's video generator Sora could allow nudity. Experts are worried - Notiulti

2024-03-17
Notiulti
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (Sora) that is not yet publicly released but is expected to be soon. The article focuses on the plausible future harms that could arise from its use, such as generating deepfake pornography, misinformation, and cybercrime, which are credible risks consistent with AI capabilities. Since no actual harm has been reported yet, but the potential for significant harm is clearly articulated, this qualifies as an AI Hazard. The article also includes complementary information about public opinion and regulatory needs, but the primary focus is on the plausible risks posed by Sora's release.

Today's Cache | EU's AI Rules get lawmaker approval; OpenAI's Sora to become available this year; U.S. moves closer towards TikTok ban

2024-03-14
The Hindu
Why's our monitor labelling this an incident or hazard?
The article does not describe any realized harm or direct/indirect AI-related incident. The EU AI Act approval is a governance response aimed at mitigating AI risks, thus fitting Complementary Information. OpenAI's Sora release is a product announcement without mention of harm or plausible harm. The TikTok ban discussion relates to data security and algorithmic concerns but does not specify an AI Incident or Hazard; it is a policy development reflecting societal/governance response. Therefore, the article is best classified as Complementary Information.

Is OpenAI's Viral AI Video Generator Sora Trained On YouTube And Instagram? CTO Mira Murati Is 'Not Sure' By Benzinga

2024-03-14
Investing.com UK
Why's our monitor labelling this an incident or hazard?
The article discusses OpenAI's data sourcing practices for training an AI system (Sora), highlighting regulatory findings and legal concerns about data protection and privacy law violations. While these concerns imply potential harm related to privacy and data rights, the article does not report a concrete incident where harm has directly or indirectly occurred due to the AI system's use or malfunction. Instead, it reports on ongoing investigations, regulatory decisions, and public scrutiny, which are responses to broader AI ecosystem issues. Thus, it fits the definition of Complementary Information rather than an AI Incident or AI Hazard.

Council Post: Deepfakes And The Erosion Of Digital Trust: Zero-Trust Strategies In The Age Of AI-Generated Content

2024-03-14
Forbes
Why's our monitor labelling this an incident or hazard?
The article centers on the potential risks and future harms posed by the AI system Sora, which can generate highly realistic deepfake videos. It does not describe any actual incident of harm or misuse but warns about the credible threat of misinformation and cybersecurity risks that could arise from the use of such AI-generated content. The discussion of zero-trust strategies is a recommended response to these plausible threats. Therefore, the event qualifies as an AI Hazard because it involves an AI system whose use could plausibly lead to significant harms such as misinformation and security breaches, but no realized harm is reported in the article.

OpenAI's Mira Murati Quizzed About Data Used To Train Sora AI Model: Here's What She Said - News18

2024-03-15
News18
Why's our monitor labelling this an incident or hazard?
The article centers on questions about data transparency and ethical concerns related to AI training data, without describing any realized harm or direct incident involving the AI system. There is no indication of injury, rights violations, or other harms caused by the AI's development or use. The concerns raised are about potential legitimacy and credibility issues, which are important but do not constitute a direct or indirect AI Incident or a clear AI Hazard. Therefore, this is best classified as Complementary Information, providing context and societal response to AI development practices.

OpenAI's Sora will one day add audio, editing, and may allow nudity in content

2024-03-13
TechRadar
Why's our monitor labelling this an incident or hazard?
The article describes an AI system under development with planned safety guardrails to prevent misuse and misinformation. It mentions potential future features and acknowledges current errors in output but does not report any realized harm or incidents caused by the AI. The discussion of possible misuse and the company's efforts to mitigate risks indicate plausible future harm, but no direct or indirect harm has materialized. Therefore, this event qualifies as an AI Hazard, as the AI system's use could plausibly lead to harm if not properly managed, but no incident has yet occurred.

How OpenAI's text-to-video tool Sora could change science - and society

2024-03-12
Nature
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions the AI system Sora and its ability to generate photorealistic videos from text prompts, confirming AI system involvement. It focuses on the potential misuse of this technology to create misinformation and fake videos that could disrupt political processes and societal trust, which are harms to communities and potentially violations of rights. However, the article does not report any actual incidents of harm caused by Sora but rather concerns and warnings about plausible future misuse. This aligns with the definition of an AI Hazard, where the AI system's use could plausibly lead to harm but no direct or indirect harm has yet materialized. The discussion of mitigation strategies like watermarking further supports the hazard classification rather than an incident or complementary information.

OpenAI CTO dodges questions around training data for text-to-video generator Sora

2024-03-16
The Hindu
Why's our monitor labelling this an incident or hazard?
The article involves an AI system (OpenAI's text-to-video generator Sora) and discusses its development, specifically the training data used. Although no direct harm or incident is reported, the uncertainty and opacity about data sources, combined with ongoing lawsuits against OpenAI for unauthorized use of copyrighted material, indicate a credible risk that the AI system's development could lead to violations of intellectual property rights or other legal harms. Therefore, this situation constitutes an AI Hazard rather than an Incident, as the harm is plausible but not confirmed or realized in this report.

The Best OpenAI Sora TikToks So Far

2024-03-14
Gizmodo
Why's our monitor labelling this an incident or hazard?
The article describes an AI system (Sora) and its capabilities, including concerns about training data and potential job displacement. However, it does not report any realized harm or incidents caused by the AI system. The concerns are about plausible future harms such as intellectual property violations and job impacts, but these are speculative at this stage. Therefore, the event qualifies as an AI Hazard due to the plausible future harms related to training data use and economic impact, but not an AI Incident since no harm has yet occurred.

OpenAI's Sora Could Generate AI Nude Videos This Year

2024-03-13
Gizmodo
Why's our monitor labelling this an incident or hazard?
The article describes an AI system (OpenAI's Sora) that is under development and will soon be released. It explicitly mentions the possibility of generating AI nude videos, which raises credible concerns about potential harms such as privacy violations, non-consensual deepfake pornography, and related social harms. Since no actual harm has yet occurred but there is a plausible risk of significant harm once the system is deployed, this qualifies as an AI Hazard. The article does not describe any realized harm or incident, nor does it focus on responses or governance measures, so it is not an AI Incident or Complementary Information.

Will Sora draw more copyright troubles for OpenAI? - Republic World

2024-03-16
Republic World
Why's our monitor labelling this an incident or hazard?
The article centers on the legal disputes and copyright concerns regarding the data used to train OpenAI's AI systems, which could plausibly lead to harms such as intellectual property rights violations and financial damages if lawsuits succeed. However, no direct or indirect harm from the AI system's use or malfunction is reported as having occurred yet. The discussion is about potential future legal consequences and regulatory challenges, fitting the definition of an AI Hazard rather than an AI Incident or Complementary Information. It is not unrelated because it clearly involves AI systems and their development/use.

OpenAI tech chief admits election misinformation is a concern ahead of Sora launch

2024-03-15
Fortune
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (Sora) capable of generating realistic videos that could be used maliciously to spread misinformation, especially during elections. Although no actual harm has been reported yet, the article highlights the plausible risk of significant harm to communities and democratic processes if the technology is misused. This fits the definition of an AI Hazard, as the development and potential use of Sora could plausibly lead to an AI Incident involving misinformation and election interference. The article does not describe a realized harm or incident but focuses on the potential risks and considerations before release.

In Cringe Video, OpenAI CTO Says She Doesn't Know Where Sora's Training Data Came From

2024-03-15
Futurism
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (Sora, a text-to-video generative AI) and concerns the development phase, specifically the sourcing of training data. The CTO's inability to specify data sources introduces uncertainty about compliance with copyright and privacy laws, which are fundamental rights protections. While no direct harm is reported, the vague data sourcing could plausibly lead to violations of intellectual property rights or privacy breaches, constituting potential future harm. Since no actual harm has materialized or been documented in the article, it does not meet the criteria for an AI Incident. The focus on potential risks and lack of concrete harm aligns with the definition of an AI Hazard.

OpenAI's CTO Won't Discuss Training Data for AI Video Generator Sora

2024-03-14
PetaPixel
Why's our monitor labelling this an incident or hazard?
The article centers on the development and use of an AI system (Sora) and the potential legal and ethical issues related to copyright infringement in its training data. While it strongly implies that copyright violations have occurred or are likely, it does not document a concrete AI Incident such as a legal ruling, a complaint leading to harm, or a direct violation causing harm. Nor does it describe a specific event where the AI system caused injury, rights violations, or other harms. The discussion is primarily about the potential for harm and the ethical concerns around training data use, which aligns with an AI Hazard or Complementary Information. Given that the article mainly provides context, ongoing concerns, and industry practices without reporting a new incident or imminent risk event, it best fits as Complementary Information.

OpenAI will avail Sora to the public within 2024 and it could be just "a few months" - Gizmochina

2024-03-14
Gizmochina
Why's our monitor labelling this an incident or hazard?
The article describes the development and planned public release of an AI system (Sora) capable of generating realistic videos. However, it does not report any actual harm or incidents caused by the AI system at this time. The focus is on the preparation, safety measures, and intended use of the tool. Since the AI system could plausibly lead to harms such as misinformation, deepfakes, or other societal impacts once widely available, this situation fits the definition of an AI Hazard rather than an AI Incident or Complementary Information. It is not unrelated because it clearly involves an AI system with potential for harm.

OpenAI CTO's Evasive Response Raises Questions About Sora's Data Sources

2024-03-15
CCN - Capital & Celeb News
Why's our monitor labelling this an incident or hazard?
The article describes an ongoing investigation and legal scrutiny regarding the training data of an AI system, which could lead to violations of rights or legal obligations if confirmed. However, no actual harm or incident caused by the AI system is reported as having occurred yet. The focus is on potential legal and regulatory consequences and transparency issues, which aligns with the definition of Complementary Information as it provides context and updates on the AI ecosystem and responses to concerns rather than describing a new AI Incident or AI Hazard.

OpenAI says Sora public availability will happen later this year

2024-03-13
Android Headlines
Why's our monitor labelling this an incident or hazard?
The article describes an AI system (Sora) under development and testing, with no reported harm or incidents resulting from its use so far. The mention of red teaming and watermarks indicates proactive safety measures to prevent misuse. There is no indication of direct or indirect harm occurring or imminent. Therefore, this is not an AI Incident or AI Hazard. It is not unrelated because it concerns an AI system and its development. The main focus is on the development status and safety considerations, which fits the definition of Complementary Information.

Will Sora, the new AI from OpenAI, be safe? This is what the company says - Softonic

2024-03-14
Softonic
Why's our monitor labelling this an incident or hazard?
The article does not describe any realized harm or incident caused by Sora or any AI system. Instead, it focuses on the intended use, safety features, and mitigation strategies to prevent potential misuse and harm. Therefore, it fits the category of Complementary Information, as it provides context and updates about the AI system's development and governance without reporting an AI Incident or AI Hazard.

Mira Murati admits election misinformation is a concern ahead of OpenAI's Sora launch

2024-03-15
Yahoo! Finance
Why's our monitor labelling this an incident or hazard?
The article clearly involves an AI system (Sora, a generative AI video tool) and discusses its potential misuse to spread misinformation during elections, which could harm communities and democratic processes. Since no actual misinformation incidents caused by Sora have been reported yet, but the risk is credible and acknowledged by OpenAI, this fits the definition of an AI Hazard. The mention of lawsuits about training data is background context and does not indicate a new incident. Hence, the classification is AI Hazard due to the plausible future harm from misuse of the AI system.