
Multiple studies led by US and Canadian researchers found that popular AI chatbots, including ChatGPT, Gemini, and Grok, frequently provide inaccurate or incomplete medical information. Roughly half of the chatbots' responses to health-related queries were problematic, raising concerns about harm to users who rely on these systems for medical advice.[AI generated]
Why is our monitor labelling this an incident or hazard?
The article explicitly involves AI systems (chatbots powered by large language models) whose use has directly led to problematic medical advice capable of injuring users or harming their health, meeting the criteria for an AI Incident. The harm is realized or ongoing: the chatbots are widely used by adults for health queries, and the studies document a high rate of problematic responses that could mislead users. The event is not merely a potential risk (a hazard) or a response or update to a prior event (complementary information), but a case in which AI use has caused, or is causing, harm, which justifies classifying it as an AI Incident.[AI generated]
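The rationale above applies a simple decision rule: realized or ongoing harm yields an Incident, plausible but unrealized harm yields a Hazard, and a response or update to a prior event is complementary information. The sketch below illustrates that rule in Python; the enum values, field names, and `classify` function are illustrative assumptions, not the monitor's actual implementation or the OECD's official criteria.

```python
# Minimal sketch of the triage logic described in the rationale above.
# All names here are hypothetical, not the monitor's real code.
from dataclasses import dataclass
from enum import Enum


class Label(Enum):
    INCIDENT = "AI Incident"                      # harm is realized or ongoing
    HAZARD = "AI Hazard"                          # harm is plausible but not yet realized
    COMPLEMENTARY = "Complementary information"   # response or update to a prior event


@dataclass
class Event:
    involves_ai_system: bool  # an AI system is directly implicated
    harm_realized: bool       # injury or harm has occurred or is ongoing
    harm_potential: bool      # harm is plausible but not yet documented
    is_followup: bool         # the item responds to or updates a prior event


def classify(event: Event) -> Label | None:
    """Apply the incident/hazard distinction from the rationale above."""
    if not event.involves_ai_system:
        return None  # out of scope for the monitor
    if event.is_followup:
        return Label.COMPLEMENTARY
    if event.harm_realized:
        return Label.INCIDENT
    if event.harm_potential:
        return Label.HAZARD
    return None


# The chatbot case above: harm is documented and ongoing, so it is an Incident.
case = Event(involves_ai_system=True, harm_realized=True,
             harm_potential=True, is_followup=False)
assert classify(case) is Label.INCIDENT
```

Note that the checks are ordered: a follow-up item is classified as complementary information even if it describes harm, and realized harm takes precedence over merely potential harm, mirroring the reasoning in the rationale.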