Google Gemini for Workspace Exposed to Indirect Prompt Injection Attacks

Thumbnail Image

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Researchers found Google's Gemini for Workspace AI assistant vulnerable to indirect prompt injection, allowing attackers to embed malicious instructions in emails or documents. This can manipulate AI outputs, potentially leading to phishing attacks or misleading messages. Google was notified but dismissed the issue as intended behavior, leaving users at risk.[AI generated]

Why's our monitor labelling this an incident or hazard?

The event involves an AI system (Google Gemini for Workspace) that is susceptible to indirect prompt injection attacks, which manipulate the AI's outputs in ways that could mislead users or facilitate phishing attacks. This constitutes a direct or indirect cause of harm to users' security and privacy, fitting the definition of an AI Incident due to violations of user rights and potential harm to individuals. The fact that the vulnerability has been exploited in proofs-of-concept and reported to Google, with no remediation, confirms the realized risk rather than a mere potential hazard.[AI generated]
AI principles
AccountabilityRobustness & digital securitySafetyTransparency & explainabilityPrivacy & data governanceRespect of human rightsDemocracy & human autonomyHuman wellbeing

Industries
Digital securityIT infrastructure and hostingBusiness processes and support servicesMedia, social platforms, and marketing

Affected stakeholders
WorkersBusiness

Harm types
Economic/PropertyReputationalPsychologicalHuman or fundamental rights

Severity
AI incident

Business function:
ICT management and information securityCitizen/customer serviceMarketing and advertisement

AI system task:
Interaction support/chatbotsContent generation


Articles about this incident or hazard

Thumbnail Image

Google's Gmail Update Decision -- 'Significant Risk' Warning For Millions Of Users

2024-09-28
Forbes
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (Google's Gemini large language model integrated into Gmail and Workspace) and discusses a specific vulnerability (indirect prompt injection) that could be exploited by attackers to cause harm such as phishing attacks and data leaks. While no actual harm is reported as having occurred, the described vulnerability and proof-of-concept attacks demonstrate a credible risk of harm to millions of users. This fits the definition of an AI Hazard, as the AI system's use could plausibly lead to an AI Incident involving harm to users. The article also mentions Google's ongoing efforts to mitigate these risks, but the threat remains significant. Since no realized harm is described, it is not an AI Incident. The focus is on the potential risk rather than a response or update to a past incident, so it is not Complementary Information. It is clearly related to AI and its risks, so it is not Unrelated.
Thumbnail Image

Google Gemini for Workspace Vulnerable to Indirect Prompt Injection

2024-09-30
ChannelE2E
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (Google Gemini for Workspace) that is susceptible to indirect prompt injection attacks, which manipulate the AI's outputs in ways that could mislead users or facilitate phishing attacks. This constitutes a direct or indirect cause of harm to users' security and privacy, fitting the definition of an AI Incident due to violations of user rights and potential harm to individuals. The fact that the vulnerability has been exploited in proofs-of-concept and reported to Google, with no remediation, confirms the realized risk rather than a mere potential hazard.
Thumbnail Image

Gemini for Workspace susceptible to indirect prompt injection, researchers say

2024-09-27
SC Media
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (Gemini for Workspace, a large language model integrated into Google Workspace) and describes how its use is vulnerable to indirect prompt injection attacks. These attacks have been demonstrated to cause the AI to produce misleading or harmful outputs, such as phishing messages that could lead to credential theft, which is a violation of user security and privacy (harm to persons). The harm is realized or at least clearly demonstrated through proof-of-concept attacks. Hence, this is an AI Incident as the AI system's use has directly or indirectly led to harm.