Mainstream AI Models Hijacked for Cybercrime via WormGPT Variants

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Threat actors have adapted commercial AI language models, including xAI's Grok and Mistral AI's Mixtral, to create jailbroken tools known as WormGPT variants. These tools bypass safety guardrails to generate malicious content, such as phishing emails and malware, facilitating cybercrime and lowering barriers for attackers.[AI generated]

Why's our monitor labelling this an incident or hazard?

The article explicitly mentions AI systems (Grok and Mixtral) being repurposed by cybercriminals to generate harmful content, including phishing emails and credential-stealing scripts, which directly leads to harm to individuals and communities. The AI systems are used maliciously, bypassing ethical guardrails, and the harm is ongoing as these tools are active and accessible on underground forums. This fits the definition of an AI Incident because the AI's use has directly led to violations of rights and harm to communities through cybercrime.[AI generated]
AI principles
Safety; Robustness & digital security; Accountability; Privacy & data governance; Transparency & explainability

Industries
Digital security; Financial and insurance services; IT infrastructure and hosting; Government, security, and defence

Affected stakeholders
General public

Harm types
Economic/Property; Reputational; Public interest; Human or fundamental rights

Severity
AI incident

AI system task
Content generation


Articles about this incident or hazard

Two WormGPT Clones That Use Grok and Mixtral Found in Underground Forum

2025-06-18
TechRepublic
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions AI systems (Grok and Mixtral) being repurposed by cybercriminals to generate harmful content, including phishing emails and credential-stealing scripts, which directly leads to harm to individuals and communities. The AI systems are used maliciously, bypassing ethical guardrails, and the harm is ongoing as these tools are active and accessible on underground forums. This fits the definition of an AI Incident because the AI's use has directly led to violations of rights and harm to communities through cybercrime.

New WormGPT Variants Use Grok and Mixtral for Cybercrime

2025-06-18
TechNadu
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions AI systems (LLMs) being used maliciously to create phishing emails and malicious code, which are forms of cybercrime causing harm to people and communities. The AI's role is pivotal as it enables sophisticated attacks that would be harder to execute otherwise. This constitutes an AI Incident because the AI system's use has directly led to realized harm through cybercrime activities. The presence of AI is clear, the harm is occurring, and the malicious use of AI is central to the event.

Researchers say AI hacking tools sold online were powered by Grok, Mixtral

2025-06-17
CyberScoop
Why's our monitor labelling this an incident or hazard?
The article explicitly identifies AI systems (Grok and Mixtral large language models) as the underlying technology powering hacking tools that generate malicious content and code. The use of these AI systems in this manner has directly led to harms associated with cybercrime, including the creation of malware and phishing attacks, which harm property and communities. Therefore, this qualifies as an AI Incident due to the direct link between AI system use and realized harm.

WormGPT Clones Persist by Hijacking Mainstream AI Models

2025-06-18
DataBreachToday
Why's our monitor labelling this an incident or hazard?
The article explicitly discusses AI systems (large language models) being manipulated and used maliciously to generate harmful content and facilitate cybercrime. This constitutes direct involvement of AI systems in causing harm, fulfilling the criteria for an AI Incident. The harms include enabling illegal activities and lowering barriers for cybercriminals, which impact communities and violate laws. The article does not merely warn of potential harm but describes ongoing malicious use, so it is not an AI Hazard or Complementary Information. It is not unrelated because AI systems are central to the described harms.

Grok and Mixtral AI Models Hijacked by WormGPT Clones via Prompt Jailbreaks

2025-06-18
WinBuzzer
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions AI systems (Grok and Mixtral large language models) being manipulated through prompt jailbreaks to bypass safety guardrails and generate harmful content. This misuse directly facilitates cyberattacks such as phishing and malware creation, which are clear harms to individuals and organizations (harm to health, property, and communities). The AI systems' development and use are central to the incident, as the threat actors adapt legitimate AI services for malicious purposes. The harm is realized and ongoing, not merely potential, fulfilling the criteria for an AI Incident rather than a hazard or complementary information.

AI hacking tools developed via commercial LLMs, report finds

2025-06-18
SC Media
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions AI systems (commercial LLMs Mixtral and Grok) being leveraged to create jailbroken AI hacking tools that provide detailed instructions for cyberattacks and phishing. The use of these AI tools directly leads to harm by enabling cybercrime, which constitutes harm to property and communities. Therefore, this qualifies as an AI Incident due to the direct involvement of AI systems in causing realized harm through malicious use.

Cyberweapon WormGPT turns out to be a wrapper for jailbroken Grok or Mixtral

2025-06-19
Cybernews
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions AI systems (Grok and Mixtral LLMs) being jailbroken and used by cybercriminals to generate phishing emails, malware, and fraudulent content. These outputs cause harm to individuals and communities by enabling cyberattacks and fraud, which are violations of law and fundamental rights. The AI systems' use in this malicious context directly leads to realized harm, fulfilling the criteria for an AI Incident. The event is not merely a potential risk or a general update but documents active malicious use causing harm.

Watch out, AI fans - cybercriminals are using jailbroken Mistral and Grok tools to build powerful new malware

2025-06-24
TechRadar
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions AI systems (LLMs) being jailbroken and used by cybercriminals to generate malicious code and social engineering attacks, which are forms of harm to individuals and communities. The AI systems' misuse is directly linked to the creation and dissemination of malware and phishing attacks, fulfilling the criteria for an AI Incident. The harm is realized and ongoing, not merely potential, and the AI system's role is pivotal in enabling these cybercrimes.

Cybercriminal abuse of large language models

2025-06-25
blog.talosintelligence.com
Why's our monitor labelling this an incident or hazard?
The article explicitly involves AI systems (LLMs) being used by cybercriminals to generate harmful outputs like phishing emails, malware code, and other offensive tools, which directly contribute to cybercrime and associated harms. The presence of jailbreaking techniques to bypass safety guardrails and the distribution of uncensored criminal LLMs further demonstrate misuse of AI leading to harm. Additionally, vulnerabilities in model distribution leading to malware infections represent direct harm linked to AI system use. These factors meet the criteria for an AI Incident as the AI systems' use has directly or indirectly led to significant harms including violations of law, harm to individuals, and harm to communities.

WormGPT Variants Powered by Grok and Mixtral Have Emerged

2025-06-23
Security Boulevard
Why's our monitor labelling this an incident or hazard?
The article explicitly discusses AI systems (large language models) being used maliciously by threat actors to conduct offensive cyber operations. The AI systems are adapted and jailbroken to remove safety guardrails, enabling harmful outputs. The subscription model lowers barriers for criminals, increasing the scale of harm. The misuse of these AI systems directly leads to harms associated with cybercrime and potentially other violations. Hence, the event meets the criteria for an AI Incident as the AI system's use has directly led to harm.

Researchers say cybercriminals are using jailbroken AI tools from Mistral and xAI

2025-06-23
therecord.media
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions AI systems (large language models Grok and Mixtral) being jailbroken and repurposed by cybercriminals to create harmful content such as phishing emails and malicious code. This misuse directly leads to harm to individuals and communities by facilitating cyberattacks and fraud. The AI systems' development and use are central to the incident, with the malfunction being the circumvention of safety guardrails via system prompt manipulation. The harm is realized and ongoing, not merely potential. Therefore, this qualifies as an AI Incident under the framework, as the AI systems' misuse has directly led to significant harm.

Cybercriminals sell modified AI on BreachForums

2025-06-25
SC Media
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions AI systems (LLMs) being modified and used maliciously, leading to direct harm through cybercrime activities like phishing and hacking. The harm is realized and ongoing, not just potential. The AI system's misuse is a direct contributing factor to these harms. Therefore, this qualifies as an AI Incident under the framework, as it involves the use of AI systems leading directly to significant harm to individuals and communities through cybercrime.

The AI Arms Race: When Attackers Leverage Cutting-Edge Tech

2025-06-26
IT Security News
Why's our monitor labelling this an incident or hazard?
The article explicitly mentions AI systems (LLMs like WormGPT and Xanthorox AI) being used by attackers to carry out cyber attacks. These attacks represent a direct harm to security and potentially to critical infrastructure or individuals, fulfilling the criteria for an AI Incident. The AI's role is pivotal as it enables the creation and execution of more effective cyber attacks, which are realized harms rather than hypothetical risks. Hence, the event is classified as an AI Incident.