Hackers Exploit Prompt Injection to Corrupt Google Gemini's Memory

Thumbnail Image

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Researchers, including Johann Rehberger, demonstrated a new prompt injection method that permanently corrupts Google Gemini's long-term memory. The hack uses indirect and delayed tool invocation techniques to implant false data, raising concerns about security and potential harm from persistent inaccurate AI behavior across sessions.[AI generated]

Why's our monitor labelling this an incident or hazard?

The event centers on a new hack—indirect and delayed prompt injections—that can override Gemini’s defenses and permanently implant malicious instructions or false user data in the chatbot’s memory. Although demonstrated only in controlled research, it highlights a clear, plausible pathway to user misinformation, persistent behavioral manipulation, and data theft. Because this exploit poses a potential future harm rather than reporting actual widespread damage, it qualifies as an AI Hazard.[AI generated]
AI principles
AccountabilityRobustness & digital securitySafetyTransparency & explainabilityPrivacy & data governanceDemocracy & human autonomy

Industries
Digital securityIT infrastructure and hostingMedia, social platforms, and marketingConsumer services

Affected stakeholders
Consumers

Harm types
ReputationalEconomic/PropertyPublic interest

Severity
AI hazard

Business function:
ICT management and information securityMonitoring and quality control

AI system task:
Interaction support/chatbotsContent generationReasoning with knowledge structures/planning


Articles about this incident or hazard

Thumbnail Image

New hack uses prompt injection to corrupt Gemini's long-term memory

2025-02-11
Ars Technica
Why's our monitor labelling this an incident or hazard?
The event centers on a new hack—indirect and delayed prompt injections—that can override Gemini’s defenses and permanently implant malicious instructions or false user data in the chatbot’s memory. Although demonstrated only in controlled research, it highlights a clear, plausible pathway to user misinformation, persistent behavioral manipulation, and data theft. Because this exploit poses a potential future harm rather than reporting actual widespread damage, it qualifies as an AI Hazard.
Thumbnail Image

Hackers Exploit Prompt Injection to Tamper with Gemini AI's Long-Term Memory

2025-02-12
Cyber Security News
Why's our monitor labelling this an incident or hazard?
This is a realized attack on an AI system—hackers tampered with Gemini’s generative memory feature using indirect prompt injection, causing persistent misinformation. The AI’s misuse directly leads to harm (misleading users and eroding trust), fitting the definition of an AI Incident.
Thumbnail Image

Google Gemini's Long-Term Memory Safeguards Are Easy To Hack - WinBuzzer

2025-02-12
WinBuzzer
Why's our monitor labelling this an incident or hazard?
This report details a security flaw in an AI system that introduces a credible risk of future harm (biased or false outputs, misuse of sensitive data) but does not document any realized damage. Therefore, it is an AI Hazard rather than an Incident or Complementary Information.
Thumbnail Image

New hack uses prompt injection to corrupt Gemini's long - term memory

2025-02-12
blog.quintarelli.it
Why's our monitor labelling this an incident or hazard?
The piece details a security vulnerability in an AI system but does not report any actual harm having occurred. Instead, it shows how the flaw could be exploited for persistent manipulation or data theft. This constitutes a plausible future risk rather than a realized incident.
Thumbnail Image

AI chatbots vulnerable to indirect prompt injection attacks, researcher warns

2025-02-13
The Hindu
Why's our monitor labelling this an incident or hazard?
This report details a novel exploit technique demonstrated in a proof-of-concept, but does not document any realized harm. It highlights a plausible pathway by which attackers could compromise AI chatbots, fitting the definition of an AI Hazard rather than an Incident or mere Complementary Information.