
The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.
Anthropic researchers have identified a vulnerability called 'many-shot jailbreaking' that exploits large context windows in advanced language models, allowing attackers to bypass safety filters and elicit harmful outputs, such as instructions for making weapons. This technique affects models from Anthropic, OpenAI, and Google DeepMind, posing significant safety risks.[AI generated]



































