AI Chatbots Frequently Misdiagnose Medical Cases, Study Finds


The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

A study by Mass General Brigham found that AI chatbots, including ChatGPT and Gemini, gave incorrect medical diagnoses in over 80% of cases when provided with incomplete patient information. Even with full data, error rates remained high, raising concerns about the reliability of AI in medical diagnostics.[AI generated]

Why's our monitor labelling this an incident or hazard?

The AI systems involved are large language models used as medical diagnostic chatbots, which clearly qualify as AI systems. The study shows that their use leads to a high rate of diagnostic errors, which can harm patients' health by misdirecting treatment decisions. This constitutes direct harm to health (harm category a) caused by the AI systems' outputs. Therefore, this event qualifies as an AI Incident due to the realized harm from the AI systems' use in medical diagnosis.[AI generated]
AI principles
Safety; Robustness & digital security

Industries
Healthcare, drugs, and biotechnology

Affected stakeholders
Consumers

Harm types
Physical (injury)

Severity
AI incident

AI system task
Interaction support/chatbots


Articles about this incident or hazard


Study: ChatGPT and Gemini get up to 80% of medical diagnoses wrong

2026-04-14
Stiri pe surse

Chatbots can get the diagnosis wrong in over 80% of cases, a study shows. Why they can lead users down a dangerous path

2026-04-14
REALITATEA.NET
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (chatbots using large language models) whose use in medical diagnosis has directly led to a high rate of incorrect outputs. These incorrect diagnoses can plausibly cause injury or harm to users' health if acted upon, fulfilling the criteria for an AI Incident. The study documents realized errors and their implications, not just potential risks, indicating actual harm or at least a high likelihood of harm occurring from reliance on these AI systems in medical contexts.

Warning! ChatGPT and Gemini give a wrong diagnosis in 80% of medical cases

2026-04-14
Doctorul Zilei
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (large language models/chatbots) used for medical diagnosis, which qualify as AI systems under the definition. The study shows that these AI systems' use leads to a high rate of incorrect diagnoses, which can cause harm to health (harm category a). The article highlights realized harm in the form of misleading or incorrect medical advice, which can directly or indirectly harm patients. Therefore, this qualifies as an AI Incident because the AI systems' use has directly or indirectly led to harm to health through diagnostic errors and hallucinations.

Study: Chatbots give a wrong diagnosis in 80% of initial medical cases

2026-04-14
News.ro
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (chatbots based on large language models) whose use in medical diagnosis has directly led to a high rate of incorrect diagnoses, which can cause harm to patients by misleading them about their health conditions. This fits the definition of an AI Incident because the AI's use has directly led to harm to health (harm category a). The study documents realized harm through diagnostic errors, not just potential harm, so it is not merely a hazard or complementary information.

Don't take medical advice from ChatGPT: It will give you a wrong diagnosis in 80% of cases

2026-04-14
spotmedia.ro
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (large language models) used in medical diagnosis, which qualify as AI systems by definition. The study demonstrates that these AI systems' use leads to a high rate of incorrect diagnoses, which can directly or indirectly cause harm to individuals' health. The AI systems' use has therefore led to realized harm (diagnostic errors), or at least a very high risk of harm. Since the article describes actual diagnostic errors occurring in practice, this qualifies as an AI Incident under the framework, specifically harm to the health of persons due to AI system use.

Artificial intelligence gets it massively wrong. What happens when we consult AI chatbots instead of doctors - B1TV.ro

2026-04-14
B1TV.ro
Why's our monitor labelling this an incident or hazard?
The AI systems involved are large language models (chatbots) used for medical diagnosis, which qualify as AI systems by definition. The study demonstrates that their use leads to a high rate of diagnostic errors, which can directly harm patients' health if they follow incorrect advice. Therefore, the event describes an AI Incident because the AI systems' use has directly or indirectly led to harm to health (harm category a).

Chatbots give a wrong diagnosis in 80% of initial medical cases - study

2026-04-14
Profit.ro
Why's our monitor labelling this an incident or hazard?
The AI systems (chatbots based on large language models) are explicitly used to provide medical diagnoses. The study shows that these systems frequently give incorrect diagnoses (over 80% error rate in some cases), which can directly or indirectly lead to harm to patients' health if these outputs are used in real medical decision-making. The presence of hallucinations (fabricated information) further increases the risk of harm. Therefore, this event qualifies as an AI Incident because the AI systems' use has directly or indirectly led to significant harm to health (or at least a high risk thereof, as demonstrated by the study).

Don't take medical advice from ChatGPT: It will give you a wrong diagnosis in 80% of cases - Stiripesurse.md

2026-04-15
Stiripesurse.md
Why's our monitor labelling this an incident or hazard?
The article discusses the performance limitations and risks of AI language models in medical diagnosis based on a controlled study. While it clearly indicates a plausible risk of harm (misdiagnosis leading to potential health harm), it does not describe an actual event where such harm has occurred. Therefore, it fits the definition of an AI Hazard, as it plausibly could lead to harm but no specific incident of harm is reported. It is not Complementary Information because it is not an update or response to a previously known incident, nor is it unrelated since it involves AI systems and their potential impact on health.
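
Read together, the monitor's rationales on this page apply one consistent decision rule: an event involving an AI system is an AI Incident if harm has been realized, an AI Hazard if harm is only plausible, and Complementary Information if it merely updates a previously known event. Below is a minimal sketch of that rule in Python; the Event fields, Label names, and triage function are hypothetical illustrations of the criteria described above, not the monitor's actual implementation.

from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    INCIDENT = "AI incident"                     # harm has occurred
    HAZARD = "AI hazard"                         # harm is plausible but not realized
    COMPLEMENTARY = "Complementary information"  # update to a known event
    UNRELATED = "Not AI-related"


@dataclass
class Event:
    involves_ai_system: bool   # e.g. an LLM chatbot used for diagnosis
    updates_known_event: bool  # follow-up to a previously logged incident
    harm_realized: bool        # harm to health (category a) has occurred
    harm_plausible: bool       # harm could plausibly follow from the outputs


def triage(event: Event) -> Label:
    # Apply the checks in the order the rationales above describe them.
    if not event.involves_ai_system:
        return Label.UNRELATED
    if event.updates_known_event:
        return Label.COMPLEMENTARY
    if event.harm_realized:
        return Label.INCIDENT
    if event.harm_plausible:
        return Label.HAZARD
    return Label.UNRELATED


# The Stiripesurse.md entry: a controlled study, plausible harm, none realized.
print(triage(Event(involves_ai_system=True, updates_known_event=False,
                   harm_realized=False, harm_plausible=True)).value)
# -> AI hazard

Under this reading, the ordering of the checks matters: the Stiripesurse.md article lands on AI Hazard only because no realized harm is reported, whereas the other articles, which the monitor reads as documenting realized diagnostic errors, land on AI Incident.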