North Korea-Linked Gaslight Malware Uses Prompt Injection to Evade AI Analysis


The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Researchers discovered macOS.Gaslight, a North Korea-linked malware that uses prompt injection to deceive AI-assisted malware analysis tools. By embedding fabricated system messages, the malware manipulates large language model-based triage agents into misinterpreting or aborting their analysis, which helps it evade detection while it steals data and compromises systems. The incident highlights how AI systems used in cybersecurity can themselves be vulnerable to adversarial input. [AI generated]

Why's our monitor labelling this an incident or hazard?

The event explicitly involves an AI system, namely the AI-assisted triage tools and LLMs used in malware analysis. The malware's prompt injection is designed to manipulate these systems into halting their analysis, which indirectly leads to harm by letting the malware operate undetected. This constitutes an AI Incident because the AI system's malfunction (being misled by adversarial input) contributes to harm (security breaches, data theft). The article reports malware that was actually discovered exploiting AI systems, not just a potential risk, so it is not merely a hazard or complementary information. [AI generated]
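
As an illustration of the adversarial input described above, the following is a minimal sketch of one possible pre-filter that a triage pipeline might run over strings extracted from a sample before handing them to an LLM. The patterns, function names, and sample strings are hypothetical assumptions chosen for illustration; they are not the actual payloads found in macOS.Gaslight, and they do not describe any specific vendor's defence.

```python
import re

# Hypothetical heuristics: these patterns are illustrative assumptions,
# not the actual strings embedded in macOS.Gaslight.
SUSPICIOUS_PATTERNS = [
    # A line that poses as a chat role marker, e.g. "SYSTEM: ..."
    re.compile(r"^\s*(system|assistant)\s*:", re.IGNORECASE),
    # Classic instruction-override phrasing
    re.compile(r"ignore\s+(all\s+)?(previous|prior)\s+instructions", re.IGNORECASE),
    # Text that tells the reader to give up on the analysis
    re.compile(r"(stop|abort|terminate)\s+(the\s+)?analysis", re.IGNORECASE),
]


def flag_injection_candidates(strings):
    """Return the extracted strings that match any suspicious pattern."""
    return [s for s in strings if any(p.search(s) for p in SUSPICIOUS_PATTERNS)]


def quarantine(strings):
    """Split strings into (clean, flagged), preserving input order.

    Flagged strings can then be withheld for human review or passed to
    the model only as clearly quoted, untrusted data.
    """
    flagged = flag_injection_candidates(strings)
    flagged_set = set(flagged)
    clean = [s for s in strings if s not in flagged_set]
    return clean, flagged
```

Pattern matching of this kind is easy to evade and is only a first line of defence; the broader design point is that anything extracted from a sample is untrusted data and should never be interpreted by the model as instructions.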
AI principles
Robustness & digital security, Safety

Industries
Digital security

Affected stakeholders
Business

Harm types
Economic/Property, Human or fundamental rights

Severity
AI incident

Business function
ICT management and information security

AI system task
Event/anomaly detection


Articles about this incident or hazard


This macOS malware can avoid AI analysis with gaslighting prompts hidden inside its architecture

2026-06-26
TechRadar
Why's our monitor labelling this an incident or hazard?
The event explicitly involves an AI system, namely the AI-assisted triage tools and LLMs used in malware analysis. The malware's prompt injection is designed to manipulate these systems into halting their analysis, which indirectly leads to harm by letting the malware operate undetected. This constitutes an AI Incident because the AI system's malfunction (being misled by adversarial input) contributes to harm (security breaches, data theft). The article reports malware that was actually discovered exploiting AI systems, not just a potential risk, so it is not merely a hazard or complementary information.

macOS Backdoor Uses Prompt Injection to Evade AI Triage

2026-06-24
Infosecurity Magazine
Why's our monitor labelling this an incident or hazard?
The event explicitly involves an AI system, namely the AI-assisted malware triage tools used by analysts. The malware's prompt injection is designed to manipulate these systems into failing or aborting their analysis, a malfunction induced by the malware. This indirectly leads to harm by allowing the malware to remain undetected and continue its malicious activities, including data theft and backdoor access. The event therefore meets the criteria for an AI Incident because the AI system's malfunction contributes to harm (to property, privacy, and security).

Gaslight macOS Malware Is a Warning Shot at the AI Security Stack

2026-06-26
Latest Hacking News
Why's our monitor labelling this an incident or hazard?
The event involves an AI system explicitly: AI-assisted malware analysis platforms using LLMs for triage. The malware's design deliberately targets these AI systems' outputs to evade detection, which is a use-related adversarial attack on AI. Although no direct harm has yet occurred (the malware did not bypass production AI platforms), the technique's development and iteration indicate a credible risk that future versions could successfully evade AI detection, leading to harm. Therefore, this is an AI Hazard, as it plausibly could lead to an AI Incident in the future if the AI system fails to detect malware due to such adversarial inputs. The article focuses on the potential threat and the need to harden AI analysis pipelines before reliable evasion occurs, fitting the definition of an AI Hazard rather than an Incident or Complementary Information.

macOS.Gaslight: North Korea-Linked Malware That Tries to Gaslight the Analyst

2026-06-26
Security Affairs
Why's our monitor labelling this an incident or hazard?
The malware explicitly targets AI systems (LLM-assisted triage agents), using prompt injection payloads to mislead and disrupt their analysis. Its deployment leads to unauthorized data theft and system compromise, harming individuals and organizations. The manipulation of the AI system is central to the malware's operation and the resulting harm, fulfilling the criteria for an AI Incident. The event is not merely a potential risk or a complementary update but a concrete case of AI system misuse causing harm.

Gaslight: New macOS malware tries to deceive AI-based analysis systems

2026-06-26
NEWS.am TECH - Innovations and science
Why's our monitor labelling this an incident or hazard?
The Gaslight malware explicitly targets AI systems used for malware analysis by employing prompt injection attacks against large language model-based tools. The prompt injection is built into the malware's design to deceive these analysis tools, which qualify as AI systems as per the definition. The malware also steals data from infected devices, causing harm to property and potentially to individuals. The malfunction or manipulation of the AI system is central to the harm caused. The event therefore meets the criteria for an AI Incident, with harm arising from the AI system's malfunction and misuse in a cybersecurity context.

North Korea macOS Malware Targets AI Analyst Tools: Gaslight Embeds 38 Fake Error Messages

2026-06-27
Tech Times
Why's our monitor labelling this an incident or hazard?
The malware explicitly targets AI systems (AI-assisted malware analysis tools) through prompt injection attacks, a form of AI system manipulation. Its use leads to unauthorized data collection and remote access, harming users and organizations. The AI system's involvement is central to the malware's evasion strategy, and the harm is realized through the malware's operation. This fits the definition of an AI Incident because the AI system's malfunction or manipulation contributes to harm (privacy violations, security breaches).

North Korea macOS Malware Gaslight Manipulates AI Triage Tools, Not the Sandbox

2026-06-27
Tech Times
Why's our monitor labelling this an incident or hazard?
The event explicitly involves an AI system (the transformer-based large language models used in AI triage tools) and describes how the malware manipulates these systems into aborting analysis, enabling the malware to evade detection and steal credentials. The harm includes violations of privacy and security (human rights and property harm), and the AI system's malfunction, induced by prompt injection, is a contributing factor. The malware's deployment and impact are realized, not hypothetical, which fulfils the criteria for an AI Incident rather than a hazard or complementary information. The detailed description of the malware's operation, its attribution, and the harm caused supports this classification.