Hackers Exploit Prompt Injection to Corrupt Google Gemini's Memory

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Researchers, including Johann Rehberger, demonstrated a new prompt injection method that permanently corrupts Google Gemini's long-term memory. The hack uses indirect and delayed tool invocation techniques to implant false data, raising concerns about security and potential harm from persistent inaccurate AI behavior across sessions.[AI generated]

Why's our monitor labelling this an incident or hazard?

The event centers on a new hack—indirect and delayed prompt injections—that can override Gemini’s defenses and permanently implant malicious instructions or false user data in the chatbot’s memory. Although demonstrated only in controlled research, it highlights a clear, plausible pathway to user misinformation, persistent behavioral manipulation, and data theft. Because this exploit poses a potential future harm rather than reporting actual widespread damage, it qualifies as an AI Hazard.[AI generated]

AI principles

AccountabilityRobustness & digital securitySafetyTransparency & explainabilityPrivacy & data governanceDemocracy & human autonomy

Industries

Digital securityIT infrastructure and hostingMedia, social platforms, and marketingConsumer services

Affected stakeholders

Consumers

Harm types

ReputationalEconomic/PropertyPublic interest

Severity

AI hazard

Business function:

ICT management and information securityMonitoring and quality control

AI system task:

Interaction support/chatbotsContent generationReasoning with knowledge structures/planning