
The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.
A study by LLYC found that major AI language models, including ChatGPT, Gemini, Grok, Mistral, and Llama, systematically reinforce gender stereotypes. The AI systems label young women as "fragile," recommend external validation, and steer their aspirations toward traditional roles, perpetuating inequality and harming self-perception among women aged 16-25 in 12 countries.[AI generated]
Why's our monitor labelling this an incident or hazard?
The event involves AI systems (algorithms and large language models) whose use has directly led to harm by validating and amplifying gender biases and stereotypes, negatively affecting young women and broader society. This constitutes harm to communities and a violation of rights, fitting the definition of an AI Incident. The article provides evidence of realized harm through AI outputs influencing social attitudes and behaviors, not just potential harm. Therefore, it is classified as an AI Incident rather than a hazard or complementary information.[AI generated]