Researchers Expose Major Flaws in AI Chatbot Safety Guardrails

Thumbnail Image

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

Researchers from Carnegie Mellon University and the Center for AI Safety demonstrated that safety measures in leading AI chatbots (ChatGPT, Bard, Claude) can be easily bypassed using automated adversarial attacks. These methods enable the chatbots to generate harmful content, including misinformation and dangerous instructions, revealing significant vulnerabilities in current AI safety systems.[AI generated]

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (ChatGPT and Bard) and their safety mechanisms being circumvented, leading to the generation of harmful content. This constitutes a direct or indirect harm to communities through dissemination of hate speech and disinformation, fitting the definition of an AI Incident. The researchers' findings reveal an existing vulnerability that has already been exploited or can be exploited, causing realized harm or imminent risk of harm.[AI generated]

AI principles

Robustness & digital securitySafetyTransparency & explainabilityAccountabilityHuman wellbeingDemocracy & human autonomy

Industries

Media, social platforms, and marketingDigital securityIT infrastructure and hostingGeneral or personal use

Affected stakeholders

ConsumersGeneral public

Harm types

Public interestPhysical (injury)Reputational

Severity

AI incident

Business function:

Citizen/customer service

AI system task:

Interaction support/chatbotsContent generation

Articles about this incident or hazard

Thumbnail Image

AI researchers say they've found a way to jailbreak Bard and ChatGPT By Cointelegraph

2023-07-28

Investing.com

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (ChatGPT and Bard) and their safety mechanisms being circumvented, leading to the generation of harmful content. This constitutes a direct or indirect harm to communities through dissemination of hate speech and disinformation, fitting the definition of an AI Incident. The researchers' findings reveal an existing vulnerability that has already been exploited or can be exploited, causing realized harm or imminent risk of harm.

Thumbnail Image

Researchers poke holes in safety controls of ChatGPT and other chatbots

2023-07-28

Economic Times

Why's our monitor labelling this an incident or hazard?

The event involves the use and malfunction (circumvention) of AI systems' safety mechanisms, directly leading to the generation and potential dissemination of harmful content, which constitutes harm to communities and public safety. The researchers' demonstration shows realized vulnerabilities that have already been exploited to produce harmful outputs, fulfilling the criteria for an AI Incident. The involvement of AI systems is explicit, and the harms (disinformation, hate speech, instructions for dangerous acts) are clearly articulated and ongoing risks. Therefore, this qualifies as an AI Incident rather than a hazard or complementary information.

Thumbnail Image

Researchers poke holes in safety controls of ChatGPT and other chatbots

2023-07-27

Economic Times

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (chatbots) whose safety controls have been bypassed, resulting in the generation of harmful content such as disinformation and instructions for dangerous acts. This constitutes direct harm to communities and public safety through the dissemination of toxic and false information. The researchers' findings reveal realized harms due to the AI systems' outputs, not just potential risks. Although mitigation efforts are underway, the harm is ongoing and the AI systems' malfunction (inadequate guardrails) is pivotal. Therefore, this qualifies as an AI Incident rather than a hazard or complementary information.

Thumbnail Image

Experts poke holes in safety controls of ChatGPT, chatbots - Times of India

2023-07-28

The Times of India

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (chatbots using large language models) and their safety guardrails. The researchers' method allows bypassing these guardrails to generate harmful content, which could plausibly lead to harms such as spreading disinformation, hate speech, and instructions for dangerous activities. No actual harm or incident is reported as having occurred yet; the focus is on the potential for harm and the vulnerability of the systems. This fits the definition of an AI Hazard, as the event plausibly could lead to an AI Incident but has not yet done so. The companies' responses and ongoing efforts to improve guardrails constitute complementary information but do not change the classification of the main event as a hazard.

Thumbnail Image

AI researchers say they've found 'virtually unlimited' ways to bypass Bard and ChatGPT's safety rules

2023-07-28

Business Insider

Why's our monitor labelling this an incident or hazard?

The researchers have identified automated adversarial attacks that can circumvent AI safety measures, potentially leading to the generation of harmful content. This represents a credible risk of harm (e.g., misinformation, hate speech) that could materialize if exploited, even though no direct harm is documented in the article. Therefore, this situation fits the definition of an AI Hazard, as it plausibly could lead to an AI Incident involving harm to communities or individuals through misuse of AI systems.

Thumbnail Image

AI researchers have found a way to jailbreak Bard and ChatGPT

2023-07-28

MoneyControl

Why's our monitor labelling this an incident or hazard?

The article explicitly mentions AI systems (Bard and ChatGPT) and details how their safety mechanisms can be circumvented via automated adversarial attacks, leading to the production of harmful or misleading content. This directly relates to harm to communities through misinformation and potentially harmful outputs. Since the exploit is demonstrated and the harm is realizable, this qualifies as an AI Incident rather than a mere hazard or complementary information.

Thumbnail Image

Researchers prove ChatGPT and other big bots can - and will - go to the dark side

2023-07-28

TechRadar

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (LLMs) and their vulnerabilities that allow harmful outputs to be generated, which can lead to harm to communities through misinformation and hate speech. The researchers demonstrated that these AI systems can be manipulated to bypass safety filters, which is a malfunction or misuse of the AI systems leading to potential harm. Since harmful content generation is already occurring or can occur due to these vulnerabilities, this constitutes an AI Incident. The article also notes that the companies are working on mitigation, but the harm or risk of harm is materialized and central to the report, not just a future possibility or complementary information.

Thumbnail Image

Researchers find multiple ways to bypass AI chatbot safety rules

2023-07-29

The Hill

Why's our monitor labelling this an incident or hazard?

The event involves AI systems explicitly (AI chatbots like ChatGPT, Bard, Claude) and their use and misuse. The research shows that the AI systems can be manipulated to produce harmful content, which constitutes harm to communities and individuals (harm category d). The harm is realized or highly likely given the examples provided (e.g., instructions on making bombs, identity theft). Therefore, this qualifies as an AI Incident because the AI systems' malfunction or misuse has directly led to harm or the facilitation of harm. The event is not merely a potential risk (hazard) or a complementary information update but a documented capability to cause harm through AI misuse.

Thumbnail Image

AI researchers say they've found a way to jailbreak Bard and ChatGPT

2023-07-28

Cointelegraph

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (large language models like ChatGPT and Bard) and discusses a discovered method to circumvent their safety features, enabling harmful content generation. While no actual harm is reported as having occurred, the researchers emphasize the ease and scalability of these adversarial attacks and the inability to fully prevent them, indicating a credible risk of future harm. The potential harms include spreading hate speech, disinformation, and toxic material, which align with harms to communities and violations of rights. Since the harm is plausible but not yet realized, the event fits the definition of an AI Hazard rather than an AI Incident. The article also includes responses from AI developers acknowledging the issue, but the main focus remains on the risk posed by the vulnerability rather than on mitigation or remediation, so it is not Complementary Information.

Thumbnail Image

How researchers broke ChatGPT and what it could mean for future AI development

2023-07-27

ZDNet

Why's our monitor labelling this an incident or hazard?

The article describes a research study where adversarial attacks were used to bypass AI chatbots' safety filters, leading to the generation of harmful content. The AI systems (ChatGPT, Bard, Claude) are explicitly involved, and the event concerns their use and the discovery of vulnerabilities. Although no direct harm is reported as having occurred yet, the demonstrated ability to generate harmful content indicates a plausible risk of future harm, such as misinformation spread and hate speech, which are harms to communities. Therefore, this qualifies as an AI Hazard rather than an AI Incident, since the harm is potential and not yet realized.

Thumbnail Image

Researchers poke holes in safety controls of ChatGPT and other chatbots

2023-07-28

The Straits Times

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (chatbots using large language models) and their safety controls. The researchers' findings reveal a method to bypass these controls, enabling the generation of harmful content. This directly relates to the use and malfunction (inability to fully prevent harmful outputs) of AI systems leading to potential harm. Since the harmful content generation is demonstrated and the risk of flooding the internet with false and dangerous information is highlighted, this constitutes an AI Incident due to realized harm potential and demonstrated exploitation of AI safety failures.

Thumbnail Image

Google: Researchers suggest ChatGPT, Google Bard can answer 'dangerous' questions despite safety measures

2023-07-28

Gadget Now

Why's our monitor labelling this an incident or hazard?

The AI systems involved are explicitly mentioned (ChatGPT, Google Bard, Claude) and their use is central to the event. The researchers' demonstration shows that the AI systems' safety measures can be circumvented, leading to the generation of harmful content that can cause real-world harm (e.g., instructions for making bombs). This meets the definition of an AI Incident because the AI systems' use has directly led to harm or the facilitation of harm through the generation of dangerous information. The event is not merely a potential risk (hazard) but documents realized vulnerabilities that have already been exploited in testing, indicating an existing incident of harm potential that is materialized in the AI systems' outputs. The responses from the companies confirm the issue and ongoing mitigation efforts, but the core event is the demonstrated bypass of safety measures leading to harmful outputs.

Thumbnail Image

New security flaw in A.I. chatbot spells big trouble for the A.I. boom

2023-07-28

Fortune

Why's our monitor labelling this an incident or hazard?

The event involves the use and malfunction of AI systems (large language models) that have directly led to a security flaw enabling harmful outputs. The harm includes the potential for injury to communities through toxic language, misinformation, and malicious actions facilitated by AI-generated content. The researchers demonstrated that the AI systems can be manipulated to override safety guardrails, which constitutes a direct AI Incident as the harm is realized or highly likely to be realized given the demonstrated exploit. The article also discusses the broader implications and risks, but the core event is the discovery and demonstration of a security vulnerability causing or enabling harm through AI misuse.

Thumbnail Image

Researchers poke holes in safety controls of ChatGPT and other

2023-07-28

Deccan Herald

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (large language model chatbots) and details how their safety mechanisms can be bypassed to produce harmful outputs. This misuse directly leads to the generation of harmful content, which constitutes harm to communities and potentially violates rights by spreading disinformation and toxic material. The researchers' findings reveal realized vulnerabilities that have already been exploited to generate harmful content, not just potential risks. Therefore, this qualifies as an AI Incident because the AI systems' malfunction or misuse has directly led to harm through the generation of dangerous and false information.

Thumbnail Image

It's Shockingly Easy to Get Around AI Chatbot Guardrails, Researchers Find

2023-07-28

Futurism

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (ChatGPT, Bard, Claude) and their use. The researchers' demonstration shows how the AI systems' guardrails can be circumvented to produce harmful outputs, which could plausibly lead to harms such as misinformation dissemination and enabling illegal activities. Although no actual harm is reported as having occurred yet, the demonstrated ease and scalability of these attacks present a credible risk of future AI Incidents. The event is not merely general AI news or a product update, nor is it a response or governance action, so it is not Complementary Information. Hence, the classification as an AI Hazard is appropriate.

Thumbnail Image

You can make top LLMs break their own rules with gibberish

2023-07-28

TheRegister.com

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (LLMs like ChatGPT, Bard, Claude) and their development and use. The adversarial attacks described cause these AI systems to produce harmful outputs that they are designed to avoid, thus directly leading to potential harm (e.g., instructions for illegal or dangerous acts). The researchers provide evidence of these attacks working on multiple models, including commercial ones, and the article discusses the risks of widespread misuse. This meets the criteria for an AI Incident because the AI system's malfunction or misuse has directly led or could lead to harm to people and communities. The event is not merely a potential hazard or complementary information, but a documented failure with real-world implications.

Thumbnail Image

US Researchers Highlight How ChatGPT's Safety Measures Are at Risk

2023-07-28

BeInCrypto

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (ChatGPT and other LLMs) and their safety mechanisms. The researchers demonstrated that adversarial attacks can induce objectionable and potentially harmful outputs, indicating a plausible risk of harm if such misuse occurs. This fits the definition of an AI Hazard, as the event describes circumstances where AI system misuse could plausibly lead to significant harm, but no actual harm has yet occurred. The article also mentions governance responses, but the main focus is on the research findings about vulnerabilities and the potential for misuse, not on the responses themselves.

Thumbnail Image

Researchers Disable AI Chatbot Safeguards | Silicon UK Tech

2023-07-31

Silicon UK

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (ChatGPT, Bard) and details how their safeguards can be bypassed to produce harmful content. The researchers' automated jailbreaking method directly enables the AI to generate harmful outputs, which can cause injury, misinformation, or societal harm. The harm is realized or highly likely given the ability to produce dangerous instructions and misinformation. Therefore, this constitutes an AI Incident due to the direct link between AI system misuse and harm.

Thumbnail Image

ChatGPT to Bard, 'Unlimited' ways to override AI chatbots safety measures exposed

2023-07-30

HT Tech

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (ChatGPT, Bard, Claude) whose use has directly led to the generation of harmful content due to safety guardrail circumvention. This constitutes an AI Incident because the AI systems' malfunction or exploitation has caused harm to communities through misinformation and hate speech. The study's findings and company responses provide complementary context but do not negate the realized harm. Therefore, the event is best classified as an AI Incident.

Thumbnail Image

US Researchers Highlight How ChatGPT's Safety Measures Are at Risk

2023-07-28

TradingView

Why's our monitor labelling this an incident or hazard?

The researchers' discovery that ChatGPT can be forced to generate harmful content by bypassing safety measures shows a credible risk of harm, fulfilling the criteria for an AI Hazard. Since no actual harm or incident has been reported, and the main focus is on the potential for misuse and ongoing mitigation efforts, this event is best classified as an AI Hazard with complementary governance context.

Thumbnail Image

AI researchers say they've found a way to jailbreak Bard and ChatGPT

2023-07-28

TradingView

Why's our monitor labelling this an incident or hazard?

The researchers have identified a method to circumvent AI safety measures, which could plausibly lead to the generation and spread of harmful content by AI chatbots. The AI systems involved are explicitly mentioned (ChatGPT, Bard, Claude), and the vulnerability relates to their use. While no direct harm is reported as having occurred, the potential for harm is credible and significant, including hate speech, disinformation, and toxic material dissemination. This fits the definition of an AI Hazard, as the event describes circumstances where AI system use could plausibly lead to harm but does not confirm that harm has yet materialized.

Thumbnail Image

Europe and the U.S. Will Probably Regulate A.I. Differently. That Will Have Long-term Consequences for the Global Art Market | Artnet News

2023-07-26

artnet News

Why's our monitor labelling this an incident or hazard?

The article centers on policy and regulatory developments concerning AI, particularly contrasting U.S. voluntary self-regulation with the E.U.'s more stringent legislative approach. It does not report any incident where AI has directly or indirectly caused harm, nor does it describe a plausible imminent harm event. The content is primarily about governance responses and their potential long-term consequences, which fits the definition of Complementary Information rather than an AI Incident or AI Hazard.

Thumbnail Image

Researchers Poke Holes in Safety Controls of ChatGPT and Other Chatbots

2023-07-27

The New York Times

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (large language model chatbots) whose safety controls have been bypassed to produce harmful content, directly leading to the risk of harm to communities through disinformation and toxic material. The researchers' findings reveal realized vulnerabilities that have already been exploited to generate harmful outputs, constituting an AI Incident. The article describes actual misuse and malfunction of AI safety features, not just potential future harm or general AI news. Therefore, this qualifies as an AI Incident due to the direct link between AI system misuse and harm.

Thumbnail Image

Will Biden's Meetings with A.I. Companies Make Any Difference?

2023-07-24

The New Yorker

Why's our monitor labelling this an incident or hazard?

The article centers on the announcement of voluntary safety commitments by AI companies and the broader regulatory landscape, which constitutes a governance and societal response to AI risks. There is no description of an AI system causing direct or indirect harm, nor is there a specific event where AI malfunction or misuse led to harm. While it discusses potential risks and the need for regulation, it does not describe a credible imminent threat or near-miss incident. Therefore, it fits the definition of Complementary Information, as it provides context and updates on AI governance and safety efforts without reporting a new AI Incident or AI Hazard.

Thumbnail Image

Are those White House A.I. pledges worth anything?

2023-07-25

Fortune

Why's our monitor labelling this an incident or hazard?

The article does not report a concrete AI Incident or AI Hazard but rather discusses the broader context of AI governance, voluntary commitments by companies, and the challenges of ensuring AI safety and preventing harm. It includes reflections on potential future risks and the difficulty of regulating AI, but no direct or indirect harm or plausible imminent harm is described. Therefore, it fits the definition of Complementary Information as it provides supporting data and context about AI systems and governance without reporting a new incident or hazard.

Thumbnail Image

Palantir's Alex Karp promotes the A.I. military-industrial complex -- but there are many reasons to be cautious

2023-07-26

Fortune

Why's our monitor labelling this an incident or hazard?

The article centers on advocacy and strategic perspectives regarding AI's role in military applications and the tech industry's involvement with defense contracts. It references historical and current policy moves but does not report any realized harm or a specific event where AI caused or nearly caused harm. The discussion of risks is speculative and contextual rather than describing a concrete AI Incident or AI Hazard. Therefore, the content is best classified as Complementary Information, providing context and insight into the evolving AI ecosystem and governance issues related to military AI development.

Thumbnail Image

Generative A.I. could upend the workforce, McKinsey says, forcing 12 million people to switch jobs and automating away 30% of hours worked in the U.S. economy by 2030

2023-07-27

Fortune

Why's our monitor labelling this an incident or hazard?

The article does not describe any realized harm or incident caused by AI systems but rather presents a forecast of plausible future impacts of AI on employment and workforce dynamics. It highlights potential job displacement and career shifts due to AI automation and integration, which could plausibly lead to significant societal and economic changes. However, since these are projections and no actual harm or incident has yet occurred, this qualifies as an AI Hazard rather than an AI Incident. The article does not focus on responses, mitigation, or governance actions, so it is not Complementary Information. It is not unrelated because it clearly involves AI systems and their potential impact.

Thumbnail Image

Seven Artificial Intelligence Companies Agree To Biden's Standards For Safeguarding Emerging Technology - Towleroad Gay News

2023-07-25

Towleroad Gay News

Why's our monitor labelling this an incident or hazard?

The article focuses on the announcement of voluntary safeguards and standards agreed upon by leading AI companies in collaboration with the Biden administration. There is no description of any specific AI system causing harm or any incident or hazard occurring. Instead, it is about proactive governance and risk management measures, which fits the definition of Complementary Information.

Thumbnail Image

The White House Struck a Deal With A.I. Companies to Manage the Technology's Risks. Artists Say It 'Does Nothing' to Protect Them | Artnet News

2023-07-25

artnet News

Why's our monitor labelling this an incident or hazard?

The article centers on a policy agreement and the reactions from affected stakeholders, particularly artists, regarding AI's impact on intellectual property and creative industries. It references existing lawsuits and ongoing harms caused by AI systems trained on copyrighted works without consent, which constitute AI Incidents. However, the article itself does not report a new incident or hazard but rather discusses the policy response and critiques thereof. This fits the definition of Complementary Information, as it provides updates and context on AI governance and societal responses to known AI-related harms, rather than describing a new incident or plausible future harm.

Thumbnail Image

OpenAI Abruptly Shuts Down ChatGPT Plagiarism Detector -- And Educators Are Worried

2023-07-26

Observer

Why's our monitor labelling this an incident or hazard?

The AI Classifier is an AI system used for detecting AI-generated text, but its discontinuation is due to poor performance rather than causing harm. The article highlights concerns about false positives and the ineffectiveness of current AI plagiarism detectors, which is a significant issue for education but does not constitute an AI Incident or AI Hazard as no harm has been caused or is plausibly imminent. The event is best classified as Complementary Information because it provides context and updates on the state of AI detection tools and their limitations, informing stakeholders about challenges in AI governance and use in education.

Thumbnail Image

Does White House Agreement Really Safeguard Us as A.I. Threatens Our Privacy?

2023-07-26

CBN.com - The Christian Broadcasting Network

Why's our monitor labelling this an incident or hazard?

The article centers on policy and governance discussions about AI risks and safeguards without describing any realized harm or specific incident involving AI systems. It reports on voluntary commitments and calls for transparency and regulation but does not document an AI Incident or an AI Hazard. Therefore, it fits the category of Complementary Information, providing context and updates on societal and governance responses to AI-related concerns.

President Biden Pressures AI Companies to Safeguard Developments -

2023-07-24

mxdwn Music

Why's our monitor labelling this an incident or hazard?

The article discusses voluntary commitments by major AI companies to adopt safety and security standards, prompted by government pressure. There is no mention of any realized harm, malfunction, or misuse of AI systems. The event is about governance and risk mitigation efforts in response to public concerns and potential future risks, making it complementary information rather than an incident or hazard.

Thumbnail Image

Researchers Poke Holes In Safety Controls Of ChatGPT And Other Chatbots | RMOL

2023-07-27

RMOL

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (ChatGPT, Claude, Google Bard) and their safety controls. The researchers' findings reveal vulnerabilities that allow the generation of harmful outputs, indicating a plausible risk of harm to communities through misinformation and toxic content. Although no specific harm is reported as having occurred yet, the demonstrated ability to circumvent safety measures constitutes a credible potential for harm, fitting the definition of an AI Hazard.

Thumbnail Image

ChatGPT: cientistas desbloqueiam travas de segurança da IA - 02/08/2023 - Tec - Folha

2023-08-02

Folha de S.Paulo

Why's our monitor labelling this an incident or hazard?

The event involves the use and testing of AI systems (large language models) and reveals a security vulnerability that allows the AI to produce harmful content, which is a direct violation of safety protocols designed to prevent harm. The AI's malfunction or exploitation leads to the generation of dangerous instructions, which constitutes a direct harm potential to communities and possibly human rights. Although no actual harm (e.g., physical injury) is reported as having occurred yet, the demonstrated ability to bypass safety measures and produce harmful content is a clear and present risk. This fits the definition of an AI Incident because the AI system's malfunction or exploitation has directly led to the generation of harmful outputs, which can cause harm if acted upon. The article also mentions that the companies have been informed and are working on mitigation, but the core event is the discovery and demonstration of the vulnerability and its exploitation potential.

Thumbnail Image

OpenAI admite que desempenho do GPT-4 pode piorar em algumas tarefas - SAPO Tek

2023-07-31

SAPO Tek

Why's our monitor labelling this an incident or hazard?

The event involves an AI system (GPT-4) and its development/use, specifically performance changes. However, the article does not describe any realized harm or incident caused by these changes, nor does it indicate plausible future harm. It is primarily an update on the AI system's performance and OpenAI's response, which fits the definition of Complementary Information rather than an Incident or Hazard.

Thumbnail Image

There are 'virtually unlimited' ways to bypass Bard and ChatGPT's safety rule, AI researchers say, and they're not sure how to fix it

2023-08-02

Business Insider

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (large language models powering chatbots) and discusses a vulnerability in their safety mechanisms that could plausibly lead to harms such as generation of hateful or illegal content. Since no actual harm has been reported but the potential for harm is credible and significant, this qualifies as an AI Hazard. The researchers' warning about 'virtually unlimited' bypasses and the lack of a known fix underscores the plausible risk of future AI incidents stemming from these vulnerabilities.

Thumbnail Image

A New Attack Impacts ChatGPT -- and No One Knows How to Stop It

2023-08-01

Wired

Why's our monitor labelling this an incident or hazard?

The event involves the use and malfunction of AI systems (chatbots) where adversarial attacks cause the AI to produce harmful outputs, directly leading to potential harm such as dissemination of illegal or dangerous information. This fits the definition of an AI Incident because the AI system's malfunction (being tricked into generating harmful content) has directly led to harm or the risk of harm. The harm is realized in the generation of disallowed content, which can cause injury to individuals or communities (harm to health, safety, or societal harm). The researchers' findings and the companies' partial mitigations confirm the AI systems' role in causing this harm. Therefore, this event is best classified as an AI Incident.

Thumbnail Image

Researchers poke holes in safety controls of ChatGPT and other chatbots

2023-08-05

The Denver Post

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (large language model chatbots) whose use has directly led to the generation of harmful content when their safety controls are bypassed. This constitutes harm to communities through the spread of disinformation and toxic material, fulfilling the criteria for an AI Incident. The article describes realized vulnerabilities and demonstrated misuse rather than just potential risks, so it is not merely a hazard or complementary information. Therefore, this is classified as an AI Incident.

Thumbnail Image

Researchers figure out how to make AI misbehave, serve up prohibited content

2023-08-02

Ars Technica

Why's our monitor labelling this an incident or hazard?

The event involves the use and malfunction of AI systems (large language models/chatbots) that have been manipulated via adversarial prompts to produce harmful content. This manipulation directly leads to violations of content policies and could result in harm to individuals or communities if such harmful outputs are used maliciously. The researchers' demonstration shows realized harm potential through the AI's outputs, and the companies' responses indicate ongoing risk. Therefore, this qualifies as an AI Incident due to the direct link between AI misuse and the generation of harmful content.

Thumbnail Image

Researchers Discover New AI Attacks Can Make ChatGPT, Other AI Allow Harmful Prompts

2023-08-01

Tech Times

Why's our monitor labelling this an incident or hazard?

The event involves AI systems explicitly (ChatGPT, Bard, Claude) and their use is directly implicated in a security vulnerability that can be exploited to bypass safety constraints. While no actual harm is reported as having occurred, the described adversarial attacks plausibly could lead to significant harms such as promoting hate speech or illegal instructions, which fall under violations of rights and harm to communities. The researchers' findings and the lack of a current fix indicate a credible risk of future incidents. Hence, this is best classified as an AI Hazard rather than an Incident or Complementary Information.

Thumbnail Image

Adversarial Attacks: The Achilles Heel Of AI Chatbots

2023-08-03

Ubergizmo

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (large language models powering chatbots) whose use has directly led to harmful outputs due to adversarial attacks. The generation of harmful or illegal instructions constitutes harm to communities and potentially violates legal and ethical norms. The researchers demonstrated actual exploitation causing harmful outputs, not just a theoretical risk, so this is an AI Incident rather than a mere hazard. The article also notes ongoing challenges in fully mitigating this vulnerability, underscoring the significance of the harm and the AI system's pivotal role.

Thumbnail Image

ChatGPT vulnerable to attacks - The Bobr Times

2023-08-03

bobrtimes.com

Why's our monitor labelling this an incident or hazard?

The article explicitly involves AI systems (large language models) and discusses a vulnerability in their use that can be exploited to produce harmful outputs, including misinformation and biased results. While no specific incident of harm is reported as having occurred, the described adversarial attacks pose a credible risk of causing harm to communities through misinformation and biased content dissemination. The study's findings indicate a fundamental weakness that complicates securing these AI systems, making the risk plausible and significant. Hence, this event fits the definition of an AI Hazard rather than an AI Incident or Complementary Information.

Thumbnail Image

The Rise of ChatGPT and Its Cohorts: A Menace to Cybersecurity - Softonic

2023-08-04

Softonic

Why's our monitor labelling this an incident or hazard?

The event involves AI systems explicitly (generative AI chatbots) and their use and development. The adversarial attacks described could plausibly lead to harm to users' security and privacy, which falls under harm to persons or communities. Since the harm is potential but credible and the article warns about the difficulty of patching these vulnerabilities, this constitutes an AI Hazard rather than an AI Incident. There is no indication that harm has already occurred, so it is not an Incident. The article is not merely complementary information as it focuses on the threat itself rather than responses or updates.

Thumbnail Image

AI chatbots like ChatGPT could be security nightmares - and experts are trying to contain the chaos

2023-08-04

TechRadar

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (generative AI chatbots) and their use, specifically focusing on security vulnerabilities that could be exploited via automated adversarial attacks. Although no direct harm has yet been reported, the article clearly outlines a credible risk that these attacks could lead to harms including privacy violations and unsafe outputs. This fits the definition of an AI Hazard, as the development and use of these AI systems could plausibly lead to an AI Incident. The article also includes responses from AI developers, but the main focus is on the potential threat rather than on mitigation or governance responses, so it is not Complementary Information. Therefore, the classification is AI Hazard.

Thumbnail Image

۱۰ شغلی که بیشتر در معرض خطر قرار دارند

2023-08-06

خبرگزاری باشگاه خبرنگاران | آخرین اخبار ایران و جهان | YJC

Why's our monitor labelling this an incident or hazard?

The article focuses on the potential future impact of AI on jobs, based on data analysis and expert assessment. It does not describe any actual event where AI has caused harm or disruption, nor does it report on a malfunction or misuse of AI systems. Therefore, it fits the definition of an AI Hazard, as it highlights credible risks of AI leading to harm (job displacement) in the future, but no current incident has occurred.

Thumbnail Image

۱۰ شغلی که بیشتر در معرض خطر قرار دارند

2023-08-06

خبرگردون | پایگاه خبری تحلیلی - خبرگردون

Why's our monitor labelling this an incident or hazard?

The article does not describe any specific AI system causing harm or malfunction, nor does it report any realized harm or incident. Instead, it provides an analysis or forecast of which jobs are more likely to be affected by AI in the future, implying a plausible risk but no current incident. Therefore, it fits the definition of an AI Hazard, as it highlights plausible future harm due to AI's impact on employment.

Thumbnail Image

هوش مصنوعی برای فقرا

2023-08-09

روزنامه دنیای اقتصاد

Why's our monitor labelling this an incident or hazard?

The article does not report any specific AI Incident or AI Hazard. It does not describe any realized harm caused by AI systems, nor does it identify a particular event or circumstance where AI use or malfunction could plausibly lead to harm. Instead, it provides a comprehensive discussion on the societal implications, opportunities, and challenges of AI in poorer countries, including regulatory and infrastructural considerations. This fits the definition of Complementary Information, as it enhances understanding of AI's broader impacts and governance issues without reporting a new incident or hazard.

Thumbnail Image

دانشگاهی که برای استفاده از هوش مصنوعی قانون تعیین کرد

2023-08-08

ایسنا

Why's our monitor labelling this an incident or hazard?

The event involves AI systems in the context of their use and governance, but no direct or indirect harm has occurred. The university's guidelines aim to mitigate potential risks and promote responsible use, which is a governance and societal response to AI-related challenges. Therefore, this is Complementary Information as it provides context and updates on AI governance without reporting a specific AI Incident or AI Hazard.

Thumbnail Image

افزایش نگرانی هنرمندان از جایگزین شدن با "هوش مصنوعی"؛ ایجاد کارگروه ویژه در دیزنی

2023-08-08

صدای آمریکا فارسی

Why's our monitor labelling this an incident or hazard?

The article involves AI systems in the context of their use and potential impact on employment and creative industries. While it describes concerns about AI potentially replacing human artists and the formation of a task force to explore AI use, it does not report any realized harm or incident caused by AI. The concerns and negotiations reflect plausible future risks but no direct or indirect harm has yet occurred as per the article. Therefore, this is best classified as Complementary Information, providing context on societal and governance responses to AI's impact in the entertainment industry.

Thumbnail Image

خطرساخت بمب و موادمخدر با هوش‌مصنوعی

2023-08-07

همشهری آنلاین

Why's our monitor labelling this an incident or hazard?

The event involves the use and potential misuse of AI systems (large language models) to generate harmful content that can lead to significant harm, including threats to public safety (e.g., bomb and drug manufacturing instructions). Although no specific incident of harm is reported, the demonstrated ability to bypass safeguards and produce dangerous outputs indicates a credible risk of harm occurring. Therefore, this qualifies as an AI Hazard because the AI systems' misuse could plausibly lead to harm, but no actual harm is described as having occurred yet.

Thumbnail Image

۱۰ شغلی که احتمالا در آینده نابود می‌شوند

2023-08-07

آفتاب

Why's our monitor labelling this an incident or hazard?

The article clearly involves AI systems as it discusses AI's ability to perform tasks such as data analysis and information processing that could replace human jobs. However, it only addresses the potential future impact of AI on employment, without any direct or indirect harm having occurred yet. Therefore, it fits the definition of an AI Hazard, as it plausibly could lead to harm (job loss) in the future but does not describe an actual incident or realized harm. It is not complementary information because it is not updating or responding to a past incident, nor is it unrelated as it directly concerns AI's impact on jobs.

Thumbnail Image

تعیین مقررات برای استفاده از هوش مصنوعی درفیلیپین

2023-08-08

جوان‌آنلاين

Why's our monitor labelling this an incident or hazard?

The article centers on policy discussion and recommendations for responsible AI use, addressing potential risks and societal impacts without describing any realized harm or a specific event involving AI malfunction or misuse. Therefore, it does not qualify as an AI Incident or AI Hazard. It is best classified as Complementary Information because it provides context and governance-related insights about AI's societal implications and regulatory approaches in the Philippines.

Thumbnail Image

چگونه "شبیه‌سازی صدا با هوش مصنوعی" تبدیل به یک چالش جدی شده است؟

2023-08-09

قدس آنلاین | پایگاه خبری - تحلیلی

Why's our monitor labelling this an incident or hazard?

The article explicitly mentions the use of AI voice cloning (an AI system) to impersonate individuals and deceive victims into transferring money, which directly caused financial harm. The harm is realized and documented, including specific examples and widespread occurrences. The AI system's development and use are central to the incident, fulfilling the criteria for an AI Incident under the OECD framework.

Thumbnail Image

مسدود کردن ربات وب OpenAI امکانپذیر شد - تک ناک - اخبار دنیای تکنولوژی

2023-08-08

تک ناک - اخبار دنیای تکنولوژی

Why's our monitor labelling this an incident or hazard?

The article centers on OpenAI's introduction of a feature to block its GPTBot crawler and the broader discussion about data usage for AI training. There is no direct or indirect harm reported, nor a plausible immediate risk of harm from the described event. The content is primarily informational and contextual, relating to governance and user control over AI data collection. Therefore, it fits the definition of Complementary Information, as it enhances understanding of AI ecosystem developments and governance responses without describing a new AI Incident or AI Hazard.

Thumbnail Image

پای هوش مصنوعی به روزنامه‌نگاری هم باز می‌شود؟ - تیکنا

2023-08-08

tiecna.com

Why's our monitor labelling this an incident or hazard?

The article describes the deployment and implications of AI systems in journalism but does not document any realized harm (such as misinformation causing community harm, privacy violations with legal complaints, or job losses directly attributed to AI misuse) or a specific incident of malfunction or misuse. It also does not present a credible imminent risk of harm from AI use. Therefore, it does not meet the criteria for an AI Incident or AI Hazard. Instead, it offers contextual and analytical information about AI's role and impact in journalism, fitting the definition of Complementary Information.

Thumbnail Image

به مناسبت روز خبرنگارپای هوش مصنوعی به روزنامه‌نگاری هم باز می‌شود؟

2023-08-08

rahemardomonline.ir

Why's our monitor labelling this an incident or hazard?

The article describes the use and influence of AI systems in news organizations but does not report any concrete incident of harm or a specific hazard event. It focuses on the general transformation AI brings to journalism, ethical considerations, and future challenges without detailing any realized or imminent harm. Therefore, it fits the category of Complementary Information as it provides contextual and analytical information about AI's impact on journalism rather than reporting an AI Incident or AI Hazard.

Thumbnail Image

محققان چت‌بات‌های هوش مصنوعی را به ارائه آموزش ساخت بمب و مواد مخدر وادار کردند

2023-08-06

دیجیاتو

Why's our monitor labelling this an incident or hazard?

The event involves AI systems (chatbots like ChatGPT and Bard) whose use has directly led to the generation of harmful content (instructions for bomb and drug production). This constitutes a direct link between AI system use and potential harm to health and safety of individuals and communities. The researchers' demonstration of how to bypass safeguards shows the AI systems' misuse leading to harm, fulfilling the criteria for an AI Incident. The harm is realized in the sense that the AI systems are producing dangerous instructions, which can be used to cause injury or harm, even if the researchers withheld the exact outputs for ethical reasons.

Thumbnail Image

۷ نکته مهم برای استفاده از هوش مصنوعی در سیستم آموزشی - تک ناک - اخبار دنیای تکنولوژی

2023-08-06

تک ناک - اخبار دنیای تکنولوژی

Why's our monitor labelling this an incident or hazard?

The article does not describe any event where AI use has directly or indirectly caused harm or violation of rights. It also does not indicate any plausible future harm or risk stemming from AI use. Instead, it provides guidance and positive framing of AI tools in education, which fits the definition of Complementary Information as it supports understanding and responsible use of AI without reporting an incident or hazard.

Thumbnail Image

هشدار آخرالزمانی و ترسناک درباره ماشین کشتار هوش مصنوعی

2023-08-06

خبرگزاری برنا

Why's our monitor labelling this an incident or hazard?

The article discusses expert warnings about the potential dangers of AI, including the possibility of AI systems acting as 'machines of destruction' in the future and societal impacts like protests due to job losses. These are credible concerns about future risks but no actual harm or incident has occurred yet. Therefore, the event qualifies as an AI Hazard because it highlights plausible future harms from AI systems rather than reporting a realized AI Incident or a response to one. It is not merely general AI news or product updates, so it is not Unrelated, nor is it Complementary Information since it does not update or respond to a past incident.

Thumbnail Image

2023-08-21

Машки Магазин

Why's our monitor labelling this an incident or hazard?

The event explicitly involves AI systems (large language models/chatbots) and their use and malfunction (bypassing safety controls). The researchers' method directly undermines the protective measures designed to prevent harmful outputs, which could lead to real harm such as spreading hate speech, dangerous instructions, and other malicious content. This constitutes an AI Incident because the AI systems' malfunction or misuse has directly led to or enables harm to communities and violations of rights. The harm is not just potential but is demonstrated by the ability to coerce the AI into producing harmful content, which is a realized risk and impact.

Thumbnail Image

Што пишува во нацрт-законот за вештачка интелигенција на ЕУ?

2023-08-21

fakulteti.mk

Why's our monitor labelling this an incident or hazard?

The article discusses the proposed regulatory framework for AI in the EU, including risk classifications and compliance requirements. It does not report any actual or realized harm caused by AI systems, nor does it describe a specific event where AI use or malfunction led to harm. Instead, it provides complementary information about governance responses and future regulatory measures to manage AI risks. Therefore, it fits the definition of Complementary Information rather than an AI Incident or AI Hazard.

Thumbnail Image

Ќе се одржи глобален самит за вештачка интелигенција - Иновативност

2023-08-24

Иновативност

Why's our monitor labelling this an incident or hazard?

The article describes an upcoming AI summit where stakeholders will discuss potential risks and how to mitigate them. There is no mention of any actual AI system causing harm or malfunction, nor any realized incident. The focus is on addressing possible future risks and governance, which fits the definition of Complementary Information as it provides context and societal response to AI-related issues without reporting a specific AI Incident or Hazard.

Thumbnail Image

Хакери покажаа колку е лесно да се манипулира со вештачка интелигенција - USB.mk

2023-08-24

USB.mk

Why's our monitor labelling this an incident or hazard?

The event involves AI systems explicitly and their use in a hacking competition to expose vulnerabilities and biases. While the AI systems produced harmful or false outputs, these were generated in a controlled environment for testing purposes, with the goal of improving AI safety. No actual harm to persons, property, or rights is reported as having occurred. The event highlights plausible risks and vulnerabilities in AI systems that could lead to harm if exploited maliciously in the future. Therefore, it constitutes an AI Hazard, as it plausibly could lead to AI Incidents if such vulnerabilities are exploited outside controlled settings. The event also includes elements of complementary information about AI safety practices but primarily focuses on the plausible risks revealed by the hacking attempts.