ChatGPT misdiagnosed 8 in 10 pediatric cases, study warns


The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Researchers at Cohen Children’s Medical Center evaluated ChatGPT (v3.5 and v4) on 100 pediatric cases from JAMA Pediatrics and NEJM. The AI misdiagnosed 83% of cases (72 outright errors and 11 overly broad diagnoses), highlighting serious patient safety risks and prompting calls for more targeted training before clinical use.[AI generated]

Why's our monitor labelling this an incident or hazard?

The event involves an AI system (ChatGPT) used for medical diagnosis, a task clearly involving AI systems as defined. The study found that the AI system produced incorrect or incomplete diagnoses in over 80% of pediatric cases tested, which could lead to injury or harm to patients if these diagnoses were used in clinical decision-making. This constitutes harm to health (a): even if the article does not report actual patient harm, the demonstrated diagnostic errors imply a direct risk of harm. The AI system's use and its diagnostic errors are central to the event, fulfilling the criteria for an AI Incident rather than a hazard or complementary information. The study's recommendation for further research and caution underscores the significance of the harm potential.[AI generated]
AI principles
Safety, Robustness & digital security, Transparency & explainability, Accountability, Human wellbeing

Industries
Healthcare, drugs, and biotechnology

Affected stakeholders
Children

Harm types
Physical (injury), Psychological, Reputational

Severity
AI incident

Business function:
Research and development

AI system task:
Interaction support/chatbots, Reasoning with knowledge structures/planning, Content generation


Articles about this incident or hazard


Don't Call Doctor ChatGPT: AI Fails Pediatric Diagnosis Test Miserably with 17% Accuracy Rating

2024-01-05
Breitbart
Why's our monitor labelling this an incident or hazard?
ChatGPT-4 is an AI system used for generating diagnostic suggestions based on input data. The article explicitly discusses its use in pediatric diagnosis and its poor performance, which could plausibly lead to harm (injury or harm to health) if the AI's outputs were used in real medical decision-making. Although no actual harm is reported, the AI's failure in diagnosis represents a credible risk of harm in clinical practice. Therefore, this event qualifies as an AI Hazard rather than an AI Incident, as the harm is potential and plausible but not yet realized.

ChatGPT incorrectly diagnosed more than 8 in 10 pediatric case studies, research finds

2024-01-03
The Hill
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (ChatGPT) used for medical diagnosis, a task clearly involving AI systems as defined. The study found that the AI system produced incorrect or incomplete diagnoses in over 80% of pediatric cases tested, which could lead to injury or harm to patients if these diagnoses were used in clinical decision-making. This constitutes harm to health (a): even if the article does not report actual patient harm, the demonstrated diagnostic errors imply a direct risk of harm. The AI system's use and its diagnostic errors are central to the event, fulfilling the criteria for an AI Incident rather than a hazard or complementary information. The study's recommendation for further research and caution underscores the significance of the harm potential.

ChatGPT fails at diagnosing child medical cases. It's wrong 83 percent of the time.

2024-01-04
Mashable
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (ChatGPT, a large language model) used for medical diagnosis. The study shows the AI's diagnostic errors, which could plausibly lead to harm (misdiagnosis, incorrect treatment) if the system were used clinically without adequate safeguards. No actual harm or incidents are reported, so it is not an AI Incident. The article focuses on the potential risks and limitations, fitting the definition of an AI Hazard, as the AI's use could plausibly lead to harm in the future.

ChatGPT Fails At Diagnosing Child Medical Cases. It's Wrong 83 Percent Of The Time.

2024-01-05
Mashable India
Why's our monitor labelling this an incident or hazard?
The article explicitly discusses the use of an AI system (ChatGPT) in diagnosing pediatric medical cases and documents a high rate of incorrect diagnoses. This is a direct use of the AI system leading to harm to health, as misdiagnosis can cause injury or harm to patients. The study's findings demonstrate that the AI system's outputs are unreliable for clinical diagnosis, which is a clear example of harm caused by the AI system's use. Hence, the event meets the criteria for an AI Incident.

Don't use ChatGPT to diagnose your kid's illness -- study finds 83% error rate

2024-01-03
Ars Technica
Why's our monitor labelling this an incident or hazard?
The event involves the use of an AI system (ChatGPT-4) in diagnosing pediatric medical cases, which is a clear AI system involvement. The study assesses the AI's diagnostic errors, which could plausibly lead to harm if used in real clinical settings without proper oversight. However, the article does not report any actual incidents of harm or injury caused by the AI's diagnostic errors. Therefore, this situation represents a potential risk or hazard rather than a realized incident. The article primarily provides research findings and suggestions for improvement, without describing any direct or indirect harm caused by the AI system. Hence, it fits the definition of an AI Hazard.

Diagnostic Accuracy of a Large Language Model in Pediatric Case Studies

2024-01-05
jamanetwork.com
Why's our monitor labelling this an incident or hazard?
The event involves the use of an AI system (a large language model) in a medical diagnostic context, but the article only reports on its diagnostic accuracy in a research setting. There is no indication of harm to patients, violation of rights, or any incident caused by the AI system. The study's findings suggest potential future applications but do not describe any actual or potential harm. Therefore, this is best classified as Complementary Information, providing context and understanding of AI capabilities in healthcare without describing an incident or hazard.

Accuracy of a Generative Artificial Intelligence Model in a Complex Diagnostic Challenge

2024-01-05
jamanetwork.com
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (GPT-4) used for diagnostic reasoning, which fits the definition of an AI system. However, the study is an evaluation of the AI's diagnostic accuracy in a controlled, retrospective setting using published clinical cases. There is no mention of any injury, rights violation, disruption, or other harm caused by the AI system's use. Nor does the article suggest a plausible future harm or hazard from the AI system. Instead, it provides data on performance and discusses potential future research directions. This aligns with Complementary Information, as it supports understanding of AI's role and capabilities without reporting an incident or hazard.

ChatGPT misdiagnosed 8 in 10 pediatric cases, study finds

2024-01-04
NewsNation
Why's our monitor labelling this an incident or hazard?
The study explicitly involves an AI system (ChatGPT) used for medical diagnosis, a task clearly within the scope of AI systems as defined. The high error rate in diagnoses implies a direct risk of harm to patient health if the AI's outputs are used in clinical decision-making. Although the study itself is an evaluation and does not report actual patient harm, the diagnostic errors show that the potential for harm in a critical domain (health) is concrete rather than hypothetical. The researchers' recommendation for further analysis underscores the significance of these errors. Hence, this event meets the criteria for an AI Incident due to the AI system's use leading to diagnostic errors that constitute harm to health.

ChatGPT found to have very low success rate in diagnosing pediatric case studies

2024-01-04
Medical Xpress - Medical and Health News
Why's our monitor labelling this an incident or hazard?
The article clearly involves an AI system (ChatGPT) being used in a diagnostic task, which fits the definition of an AI system. The study assesses the AI's diagnostic accuracy and finds it lacking, but there is no indication that the AI's use has directly or indirectly caused harm to patients or others. The researchers explicitly state that ChatGPT is not ready for diagnostic use, implying no deployment in clinical settings causing harm. Therefore, this is not an AI Incident. It also does not describe a plausible future harm scenario or risk of harm from the AI's use, as the AI was tested in a controlled research context without causing harm, so it is not an AI Hazard. The article provides information about the AI system's limitations and potential future applications, which enhances understanding of AI capabilities and limitations in healthcare. This fits the definition of Complementary Information, as it supports ongoing assessment of AI impacts without reporting a new harm or risk event.

ChatGPT missed 8 in 10 pediatric diagnoses, study finds

2024-01-03
Hospital Review
Why's our monitor labelling this an incident or hazard?
ChatGPT is an AI system (a large language model) used here for pediatric diagnosis tasks. The study shows that its use led to a high rate of misdiagnoses, which could directly or indirectly cause harm to patients' health if such AI outputs were used in real clinical decision-making. Although the article does not report actual patient harm occurring, the demonstrated misdiagnosis rate indicates a significant risk of harm from using this AI system in pediatric diagnosis. Therefore, this event qualifies as an AI Incident due to the direct link between AI use and potential harm to health, as the AI system's outputs failed to meet clinical accuracy requirements, posing a real risk to patient safety.

ChatGPT Misdiagnosed Most Pediatric Cases

2024-01-02
MedPage Today
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (ChatGPT 3.5, a large language model-based chatbot) used for medical diagnosis. The AI system's use directly led to incorrect diagnoses in 83% of pediatric cases tested, which constitutes a direct or indirect harm to health (harm category a). The study's findings demonstrate the AI system's malfunction or inadequacy in this context, which could lead to injury or harm if relied upon clinically. Hence, this event meets the criteria for an AI Incident as the AI system's use has directly led to harm (or at least a high risk of harm) in health diagnosis.

ChatGPT Struggles with Pediatric Diagnoses, Revealing Limitations in AI Healthcare - EconoTimes

2024-01-03
EconoTimes
Why's our monitor labelling this an incident or hazard?
The event explicitly involves an AI system (ChatGPT, a large language model) used for medical diagnosis. The study shows that the AI system's outputs were often incorrect, which can directly harm patients by leading to misdiagnosis and inappropriate treatment. This fits the definition of an AI Incident because the AI system's use has directly led to harm to health (harm category a). Although the article does not describe specific patient outcomes, the misdiagnoses imply a significant risk of harm. Therefore, this is an AI Incident due to the realized harm from the AI system's malfunction or limitations in clinical use.

ChatGPT is a disaster for diagnosing diseases in children

2024-01-04
Bullfrag
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (ChatGPT) used for medical diagnosis, a task clearly within AI capabilities. The study shows that the AI system's outputs were incorrect or incomplete in the majority of cases, which if used in real clinical practice could lead to injury or harm to patients. This meets the definition of an AI Incident because the AI system's use directly led to harm (diagnostic errors with potential health consequences). The article also references warnings from WHO about AI risks in healthcare, reinforcing the concern about harm. Therefore, this event is classified as an AI Incident.

Study reveals ChatGPT's errors when diagnosing illnesses in children

2024-01-05
El Tiempo
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (ChatGPT) used for medical diagnosis, a high-stakes application affecting health. The study demonstrates that ChatGPT failed in 83% of cases, indicating a significant risk of misdiagnosis if used in practice. While no actual patient harm is reported, the potential for harm is credible and significant, meeting the definition of an AI Hazard. The article also references WHO concerns about AI in healthcare, reinforcing the plausibility of future harm. Since harm is not reported as having occurred, this is not an AI Incident. The event is not merely complementary information because it reports new findings about AI performance and associated risks, not just updates or governance responses.

ARTIFICIAL INTELLIGENCE: ChatGPT is not ready to diagnose illnesses in children and has an 83% error rate

2024-01-07
La Razón
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (ChatGPT-4) used for medical diagnosis, which is a clear AI system by definition. The study shows that the AI system's outputs are highly inaccurate, which could plausibly lead to harm (misdiagnosis) if used in real clinical settings. Although no direct harm is reported, the potential for harm is credible and significant, especially in sensitive pediatric cases. The article also references warnings from WHO about AI-generated clinical misinformation. Since no actual harm is documented but the risk is clear and plausible, the event fits the definition of an AI Hazard rather than an AI Incident or Complementary Information.

ChatGPT is a disaster at diagnosing illnesses in children

2024-01-04
Hipertextual
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (ChatGPT) used for medical diagnosis, a task clearly within AI capabilities. The study shows that the AI system failed in the majority of cases, which directly implies harm to health if such diagnoses were used in practice. The harm is realized and not hypothetical, as incorrect diagnoses can lead to mistreatment or delayed treatment. The involvement is in the use of the AI system for diagnosis, and the harm is injury or harm to health (a). Hence, this is an AI Incident.

Nothing is perfect: ChatGPT still has many problems detecting certain illnesses

2024-01-05
FayerWayer
Why's our monitor labelling this an incident or hazard?
The article involves an AI system (ChatGPT) used in a medical diagnostic context. The study shows that ChatGPT often fails to provide correct diagnoses, which could plausibly lead to harm to patients if relied upon for medical decisions. However, there is no indication that any injury or harm has actually occurred yet. The article also references WHO warnings about AI risks in healthcare. Therefore, this event fits the definition of an AI Hazard, as the AI system's use could plausibly lead to harm, but no direct or indirect harm has been reported as having occurred.

Scientists discover errors in ChatGPT's pediatric diagnoses - Diario Primicia

2024-01-06
Diario Primicia
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (ChatGPT, a large language model) used in medical diagnosis, which is an AI system by definition. However, the study reports diagnostic errors without evidence of actual harm or injury to patients. The article emphasizes the importance of clinical expertise and suggests AI as a complementary tool, not a replacement. There is no mention of realized harm, disruption, or rights violations. The focus is on reporting research findings and potential future uses, which aligns with Complementary Information. Hence, the classification is Complementary Information rather than Incident or Hazard.

Study reveals ChatGPT's errors when diagnosing illnesses in children

2024-01-07
El Economista
Why's our monitor labelling this an incident or hazard?
ChatGPT, an AI system, was used to generate medical diagnoses for children, and the study found it failed in 83% of cases, indicating a direct link between AI use and potential harm to patient health. The AI's diagnostic errors represent a failure in its use that could lead to injury or harm to persons, fulfilling the criteria for an AI Incident. The article does not describe hypothetical or potential harm only, but actual poor performance with implications for health risks. Therefore, this is an AI Incident rather than a hazard or complementary information.

ChatGPT found to misdiagnose most pediatric cases - EL PAÍS VALLENATO

2024-01-04
ElPaisVallenato.com
Why's our monitor labelling this an incident or hazard?
ChatGPT is an AI system (a large language model-based chatbot) used here for medical diagnosis. The study shows that its diagnostic outputs are mostly incorrect, which constitutes a malfunction or misuse of the AI system in a critical health context. This misdiagnosis can lead to injury or harm to patients (harm to health), fulfilling the criteria for an AI Incident. Although the article suggests the AI should be used only as a complementary tool, the high error rate implies realized harm or at least a significant risk of harm in clinical use, thus qualifying as an AI Incident rather than a hazard or complementary information.

ChatGPT found to have a very low success rate in diagnosing pediatric case studies - Notiulti

2024-01-04
Notiulti
Why's our monitor labelling this an incident or hazard?
The article explicitly involves an AI system (ChatGPT, a large language model) used in a medical diagnostic context. The study found that ChatGPT's diagnostic accuracy was low, which implies a risk of harm if used for diagnosis. However, the article does not report any actual harm occurring from ChatGPT's use, only the potential for harm due to poor diagnostic performance. Therefore, this constitutes an AI Hazard, as the AI system's use could plausibly lead to harm (misdiagnosis) but no incident has yet occurred. The article also suggests potential future improvements and alternative uses, but these do not change the classification.

AI chatbot ChatGPT misdiagnosed 8 in 10 pediatric health cases, study finds - Notiulti

2024-01-04
Notiulti
Why's our monitor labelling this an incident or hazard?
The event involves the use of an AI system (ChatGPT) in a medical diagnostic context, where its outputs (diagnoses) were incorrect in a large majority of cases. This constitutes a malfunction or misuse of the AI system leading to potential harm to health (harm category a). Although the study is experimental and does not report direct patient harm, the high error rate in diagnostic outputs implies a significant risk of harm if such AI outputs were relied upon in clinical settings. Therefore, this qualifies as an AI Incident due to the direct link between the AI system's malfunction (incorrect diagnoses) and potential harm to health.

ChatGPT is not ready for medical 'tests', as it gets 83% of cases wrong - Notiulti

2024-01-07
Notiulti
Why's our monitor labelling this an incident or hazard?
ChatGPT-4 is an AI system (a large language model) used here for medical diagnosis. The study shows that its use led to a high rate of incorrect or incomplete diagnoses, which can cause harm to patients' health if relied upon. This is a direct link between the AI system's use and harm to health, fulfilling the criteria for an AI Incident. The article also notes potential complementary use, but the main focus is on the realized harm from poor diagnostic performance.

ChatGPT gets more than 80% of pediatric diagnoses wrong in study

2024-01-04
Tecnologia
Why's our monitor labelling this an incident or hazard?
The article describes a study assessing the AI system's diagnostic performance, revealing poor accuracy in pediatric diagnosis. However, it does not report any realized harm, injury, or violation resulting from the AI's use. The AI's limitations and potential risks are discussed, but no incident of harm is documented. Therefore, this is not an AI Incident. It also does not describe a plausible future harm event or credible risk scenario beyond general performance limitations, so it is not an AI Hazard. The article provides information that contextualizes the AI system's capabilities and limitations, which fits the definition of Complementary Information.

ChatGPT gets more than 80% of pediatric diagnoses wrong in study

2024-01-04
Canaltech
Why's our monitor labelling this an incident or hazard?
The article explicitly discusses the performance of an AI system (ChatGPT GPT-4) in pediatric diagnosis, confirming AI system involvement. However, it does not report any direct or indirect harm resulting from the AI's use, only poor diagnostic accuracy. There is no mention of actual injury, mismanagement, or rights violations caused by the AI. The article also highlights that AI is intended as a complementary tool, not a replacement for medical professionals. Thus, it does not meet the criteria for an AI Incident or AI Hazard. Instead, it provides important complementary information about AI limitations and potential risks in clinical settings, which is valuable for understanding the AI ecosystem and guiding future risk management.

ChatGPT gets 83% of diagnoses of childhood illnesses wrong - Tecnoblog

2024-01-07
Tecnoblog
Why's our monitor labelling this an incident or hazard?
The event involves an AI system (ChatGPT) used for medical diagnosis, and the study reveals significant diagnostic errors. However, there is no indication that these errors have directly or indirectly caused harm to patients or led to any adverse outcomes. The article emphasizes the AI's current limitations and the need for improvement rather than reporting an incident of harm. Therefore, this is not an AI Incident. It also does not describe a plausible future harm scenario beyond the current findings, so it is not an AI Hazard. The article primarily provides research findings and context about AI performance in healthcare, which fits the definition of Complementary Information.

If ChatGPT were a pediatrician, it would be wrong 83% of the time when diagnosing children | Exame

2024-01-08
Exame
Why's our monitor labelling this an incident or hazard?
ChatGPT-4 is an AI system (a large language model) used here for medical diagnosis. The study shows that its use led to a high rate of incorrect diagnoses, which constitutes harm to health (a). Although the article does not describe actual patient harm occurring, the direct use of the AI system for diagnosis with such low accuracy implies a direct link to potential or actual harm. Given the AI system's diagnostic errors, this qualifies as an AI Incident due to the direct or indirect harm to health from its use in clinical diagnosis.

ChatGPT misdiagnoses 87% of pediatric cases analyzed, new study reveals

2024-01-08
Folha - PE
Why's our monitor labelling this an incident or hazard?
The event involves the use of an AI system (ChatGPT, a large language model) in a medical diagnostic context. The study shows that the AI system made incorrect diagnoses in 83% of pediatric cases tested, which directly relates to potential injury or harm to patients' health (harm category a). The AI system's outputs were evaluated against expert diagnoses, confirming the AI's errors. Although the article does not describe actual patient harm occurring, the AI's diagnostic errors constitute a direct or indirect cause of harm if used clinically. This meets the definition of an AI Incident, as the AI system's use has directly led to significant harm (diagnostic errors) or at least a clear risk of harm. The article also discusses the limitations and potential future improvements but does not frame this as a mere hazard or complementary information. Hence, the classification is AI Incident.