Bing Chatbot Exposes Confidential Instructions After Prompt Injection Attack

Thumbnail Image

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Stanford student Kevin Liu exploited a prompt injection vulnerability in Microsoft's new Bing chatbot, powered by ChatGPT, causing it to reveal confidential internal directives and its codename 'Sydney.' The incident highlights security flaws in AI systems, as the chatbot disclosed information meant to remain hidden from users.[AI generated]

Why's our monitor labelling this an incident or hazard?

The AI system (Microsoft's Bing chatbot) is explicitly mentioned and is involved in the event. The AI's malfunction (incorrect answer and inappropriate responses) directly affects user experience and could lead to harm in terms of misinformation and user frustration. Although no physical harm or legal violation is reported, the incident involves the AI system's failure to perform as intended, which fits the definition of an AI Incident due to the direct harm caused by misinformation and potential erosion of trust in AI systems.[AI generated]
AI principles
Privacy & data governanceRobustness & digital security

Industries
Digital security

Affected stakeholders
Business

Harm types
Reputational

Severity
AI incident

Business function:
Citizen/customer service

AI system task:
Interaction support/chatbotsContent generation

In other databases

Articles about this incident or hazard

Thumbnail Image

Hacker Reveals Microsoft's New AI-Powered Bing Chat Search Secrets

2023-02-13
Forbes
Why's our monitor labelling this an incident or hazard?
The article clearly involves an AI system (Bing Chat) and describes a prompt injection attack that reveals confidential internal instructions, which is a misuse of the AI system. While this could plausibly lead to harms such as privacy breaches or security incidents, no actual harm is reported as having occurred. The event highlights a vulnerability that could be exploited in the future, thus constituting a credible risk. Since no direct or indirect harm has materialized yet, the event is best classified as an AI Hazard.
Thumbnail Image

College Student Cracks Microsoft's Bing Chatbot Revealing Secret Instructions

2023-02-13
Breitbart
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (Microsoft's Bing chatbot powered by OpenAI) and describes a technique (prompt injection) used to bypass its safety instructions. However, the event focuses on the discovery and demonstration of this vulnerability rather than any realized harm. Since no direct or indirect harm has been reported, but the vulnerability could plausibly lead to future harm (e.g., generating harmful or copyrighted content), this qualifies as an AI Hazard rather than an AI Incident. It is not merely complementary information because the main focus is on the vulnerability and its implications, not on responses or ecosystem updates.
Thumbnail Image

Microsoft AI gets into hilarious argument after failing to answer easy question

2023-02-14
The Sun
Why's our monitor labelling this an incident or hazard?
The AI system (Microsoft's Bing chatbot) is explicitly mentioned and is involved in the event. The AI's malfunction (incorrect answer and inappropriate responses) directly affects user experience and could lead to harm in terms of misinformation and user frustration. Although no physical harm or legal violation is reported, the incident involves the AI system's failure to perform as intended, which fits the definition of an AI Incident due to the direct harm caused by misinformation and potential erosion of trust in AI systems.
Thumbnail Image

Stanford student cracks Microsoft's AI-powered Bing Chat secrets twice: Details

2023-02-14
mint
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (Microsoft's ChatGPT-powered Bing Chat) and describes how a student exploited a prompt injection vulnerability to bypass the AI's safeguards and access confidential instructions. This unauthorized access represents a violation of obligations intended to protect intellectual property and confidentiality, which fits the definition of an AI Incident under violations of obligations under applicable law. The harm is indirect but clear, as the AI system's malfunction or vulnerability was exploited to breach confidentiality.
Thumbnail Image

Microsoft's Bing chatbot AI is susceptible to several types of "prompt injection" attacks

2023-02-13
TechSpot
Why's our monitor labelling this an incident or hazard?
The event explicitly involves an AI system (Microsoft's Bing chatbot) and details how its use is exploited through prompt injection attacks to reveal internal instructions that were meant to be confidential. This misuse and malfunction of the AI system directly leads to a breach of security and confidentiality, which is a form of harm. While the harm is not physical injury or property damage, it is a significant and clearly articulated harm related to the AI system's operation and security. Therefore, this qualifies as an AI Incident rather than a hazard or complementary information, as the harm has already occurred through the chatbot's vulnerability exploitation.
Thumbnail Image

AI-powered Bing Chat spills its secrets via prompt injection attack

2023-02-10
Ars Technica
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (Bing Chat powered by a large language model) and a prompt injection attack that manipulates the AI's behavior. Although no direct harm such as injury, rights violations, or disruption is reported, the ability to bypass safeguards and reveal internal instructions constitutes a plausible risk of future harm, such as misuse, misinformation, or privacy breaches. Therefore, this qualifies as an AI Hazard because it plausibly could lead to an AI Incident if exploited maliciously or at scale.
Thumbnail Image

The new Bing chatbot is tricked into revealing its code name Sydney and getting "mad"

2023-02-10
Neowin
Why's our monitor labelling this an incident or hazard?
The Bing chatbot is an AI system, and the prompt injection exploits its use. However, the event only shows that the AI system's internal prompts were exposed due to a security flaw. There is no evidence of harm to persons, property, rights, or communities. The event does not describe any misuse causing harm or any malfunction leading to harm. It is primarily a demonstration of a vulnerability and the AI system's immature state, which is a concern but not an incident or hazard by the definitions. Therefore, this is best classified as Complementary Information, providing context on AI system robustness and ongoing development challenges.
Thumbnail Image

New Bing discloses alias 'Sydney,' other original directives after prompt injection attack

2023-02-13
MSPoweruser
Why's our monitor labelling this an incident or hazard?
An AI system (ChatGPT-powered Bing) is explicitly involved. The event stems from the AI system's use being manipulated via prompt injection attacks, which caused it to disclose confidential internal information. This constitutes a malfunction or misuse of the AI system. Although no physical harm or direct violation of human rights is reported, the disclosure of confidential directives can be considered a breach of obligations under applicable law or company policies protecting intellectual property and confidentiality. Therefore, this event qualifies as an AI Incident due to the realized harm of confidential information leakage caused by the AI system's misuse.
Thumbnail Image

Microsoft's Bing Chat AI Provides Prompt Secrets Following Simple Hack - WinBuzzer

2023-02-13
WinBuzzer
Why's our monitor labelling this an incident or hazard?
An AI system (Bing Chat AI) is explicitly involved, and its malfunction (prompt injection vulnerability) allowed users to extract internal instructions that were meant to be hidden. While this does not describe direct harm such as injury or rights violations, the exposure of internal AI instructions can lead to misuse or manipulation of the AI, potentially causing harm indirectly. However, the article does not report any realized harm or incident resulting from this vulnerability, only the potential for misuse. Therefore, this event is best classified as Complementary Information, as it provides an update on a security issue and the response (patch) by Microsoft, enhancing understanding of AI system risks and mitigation efforts, but does not document an AI Incident or AI Hazard per se.