Microsoft's AI Agents in Simulated Marketplace Fall for Scams and Show Vulnerabilities

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Microsoft and Arizona State University tested hundreds of AI agents in a simulated online marketplace, revealing that the agents were easily manipulated, fell for scams, and failed at basic tasks like comparison shopping. The findings highlight significant risks and immaturity in deploying autonomous AI agents for real-world commerce. [AI generated]

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (autonomous shopping agents) whose use and behavior have been studied, revealing significant weaknesses, such as susceptibility to manipulation and unreliable autonomous decision-making, that could lead to harm. Although no direct harm has been reported yet, the identified vulnerabilities could plausibly lead to incidents such as consumers being misled or financially harmed if these agents are deployed widely without safeguards. This therefore constitutes an AI Hazard: the AI systems' current limitations could plausibly lead to harm in the future, even though no actual harm has been documented in the article. [AI generated]
AI principles
Robustness & digital security; Safety

Industries
Consumer services; Digital security

Affected stakeholders
Consumers; Business

Harm types
Economic/Property; Reputational

Severity
AI hazard

Business function
Procurement

AI system task
Goal-driven organisation; Reasoning with knowledge structures/planning


Articles about this incident or hazard

Microsoft research 'exposes' how AI shopping agents can be easily fooled - The Times of India

2025-11-06
The Times of India
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (autonomous shopping agents) whose use and behavior have been studied, revealing significant weaknesses, such as susceptibility to manipulation and unreliable autonomous decision-making, that could lead to harm. Although no direct harm has been reported yet, the identified vulnerabilities could plausibly lead to incidents such as consumers being misled or financially harmed if these agents are deployed widely without safeguards. This therefore constitutes an AI Hazard: the AI systems' current limitations could plausibly lead to harm in the future, even though no actual harm has been documented in the article.

Microsoft's 'Magentic Marketplace' reveals surprising weaknesses in AI agents

2025-11-06
MoneyControl
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (agentic AI models) and their development and use within a simulated environment. However, no harm has occurred or is reported to have occurred as a result of these AI agents' behavior. The article focuses on revealing weaknesses and limitations in AI agents through controlled experiments, which could inform future improvements but does not describe any realized or imminent harm. Therefore, this event does not qualify as an AI Incident or AI Hazard. It is best classified as Complementary Information because it provides important contextual and research insights into AI agent behavior and challenges, contributing to the broader understanding and governance of AI systems.

Microsoft built a fake marketplace to test AI agents -- they failed in surprising ways | TechCrunch

2025-11-05
TechCrunch
Why's our monitor labelling this an incident or hazard?
The event involves AI systems and their development and testing, but it does not describe any realized harm or incident resulting from their use or malfunction. The research findings point to vulnerabilities and potential future issues but do not document any direct or indirect harm to people, infrastructure, rights, property, or communities. Therefore, this is not an AI Incident or AI Hazard. Instead, it is complementary information providing context and understanding about AI agent capabilities and limitations, which can inform future risk assessment and management.

Microsoft researchers tried to manipulate AI agents - and only one resisted all attempts

2025-11-06
ZDNet
Why's our monitor labelling this an incident or hazard?
The event involves AI systems explicitly (agentic AI tools) and their use in a marketplace simulation. The research identifies vulnerabilities such as susceptibility to manipulation and biases that could lead to unfair market outcomes and economic disruption. Although the harms are not realized in the real world yet, the article clearly indicates that these AI agents could plausibly lead to significant economic harms if deployed broadly. Hence, this qualifies as an AI Hazard rather than an Incident or Complementary Information, as the focus is on potential future harm rather than actual harm or responses to past harm.

Microsoft: Don't let AI agents near your credit card yet

2025-11-06
TheRegister.com
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (shopping bots/agents) and their use in simulated transactions, showing vulnerabilities and potential risks. However, no actual harm or incident has occurred; the harms described are potential and based on simulation results. The article focuses on understanding risks and the need for oversight before deployment, which aligns with an AI Hazard rather than an AI Incident. It is not merely complementary information because it reports new findings about plausible risks, even though no realized harm has occurred. Therefore, the classification is AI Hazard.

Microsoft Identifies AI Agent Vulnerabilities Following Extensive Testing | ForkLog

2025-11-06
ForkLog
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (AI agents like GPT-4o, GPT-5, Gemini 2.5 Flash) and their development and testing. The vulnerabilities discovered could plausibly lead to harm if exploited or if the AI agents fail in critical tasks, but no actual harm or incident has occurred yet. Therefore, this qualifies as an AI Hazard because it identifies credible risks inherent in AI agents' behavior that could lead to incidents in the future. It is not Complementary Information because the article's main focus is on the vulnerabilities themselves, not on responses or updates to past incidents. It is not an AI Incident since no harm has materialized.

Agentic AI Is Still Prone to Human-Type Mistakes

2025-11-06
PaymentsJournal
Why's our monitor labelling this an incident or hazard?
The presence of AI systems (agentic AI) is clear, and their use in simulated environments is described. However, the article does not report any harm or risk of harm caused or plausibly caused by these AI systems. The issues discussed relate to performance limitations and susceptibility to manipulation, but these are observed in a controlled experiment without real-world consequences. There is no mention of injury, rights violations, or other harms, nor credible warnings of plausible future harm. The main focus is on the evaluation and understanding of AI capabilities and consumer attitudes, which fits the definition of Complementary Information rather than an Incident or Hazard.

Microsoft built a fake marketplace to test AI agents -- they failed in surprising ways - RocketNews

2025-11-05
RocketNews
Why's our monitor labelling this an incident or hazard?
The event involves the development and use of AI systems (agentic models like GPT-4o, GPT-5, Gemini-2.5-Flash) in a controlled simulation environment to study their behavior and vulnerabilities. While no direct harm has occurred, the findings highlight plausible future risks such as manipulation of AI agents and inefficiencies that could lead to harm if these agents are deployed unsupervised in real-world settings. Therefore, this qualifies as an AI Hazard because it concerns plausible future harm stemming from AI system use and development, but no realized harm or incident is reported.

Microsoft Gave AI Agents Fake Money to Buy Things Online. They Spent It All on Scams

2025-11-07
Yahoo Tech
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (autonomous AI agents) whose use in a simulated economy directly led to financial harm (scams) within the simulation, demonstrating realized harm from AI malfunction or misuse. The AI agents' vulnerability to manipulation and failure to perform basic tasks indicate direct or indirect harm caused by the AI systems' operation. Although the harm is within a simulation, the research explicitly warns about the risks of deploying such autonomous agents in real-world commerce, implying plausible real-world harm. Therefore, this qualifies as an AI Incident due to the direct link between AI system use and harm (financial loss and potential consumer harm).

AI Not Smart Enough To Handle Your Money Yet - Microsoft Warns - The News Chronicle

2025-11-07
The News Chronicle
Why's our monitor labelling this an incident or hazard?
The article explicitly involves AI systems (AI agents) tested in financial decision-making tasks, showing their susceptibility to errors and manipulation. Although no actual harm has occurred yet, the findings indicate a credible risk that deploying such AI agents without oversight could lead to financial harm to users. The event focuses on the potential for harm and the need for caution and oversight, fitting the definition of an AI Hazard. It is not an AI Incident because no realized harm is reported, nor is it merely Complementary Information or Unrelated, as the study directly addresses plausible future harm from AI use in financial contexts.

Microsoft Magentic Marketplace shows AI can't truly operate independently

2025-11-08
TechRadar
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (AI agents using models like GPT-4o, GPT-5, Gemini-2.5-Flash) in a research setting. The study focuses on their behavior and limitations in unsupervised environments, showing that AI agents are not yet reliable for independent operation. However, there is no indication that these AI agents caused any actual harm or that a plausible future harm event occurred. The article emphasizes the need for human guidance and safeguards, but does not report any incident or credible imminent risk. Thus, it fits the definition of Complementary Information, providing research findings and insights that inform understanding of AI system capabilities and risks without describing a specific harm or hazard event.

Microsoft's AI Agents Fail at Basic Shopping Tasks in Marketplace Test

2025-11-08
ProPakistani
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (autonomous shopping agents) whose development and use in a simulated environment reveal significant vulnerabilities and performance failures. Although no direct harm has occurred, the findings indicate plausible future harm such as financial fraud or degraded customer experience if these AI agents operate unsupervised in real marketplaces. The article focuses on the potential risks and challenges, making it an AI Hazard rather than an AI Incident. It is not merely complementary information because the main content centers on the risks and failures of AI agents, not on responses or ecosystem updates. Hence, the classification is AI Hazard.