Twitter AI Moderation Fails as Hate Speech Surges After Musk Takeover

Following Elon Musk's acquisition of Twitter, use of racist slurs on the platform surged 500% within 12 hours. The spike, tracked by the Network Contagion Research Institute, points to a failure or inadequacy of Twitter's AI content-moderation systems, which enabled the widespread dissemination of hate speech.[AI generated]

Why's our monitor labelling this an incident or hazard?

The article explicitly links the rise in hate speech to the period after Elon Musk's takeover of Twitter, implying changes in content-moderation AI systems or their enforcement. The AI systems that detect and moderate hate speech were either disabled or altered, or they failed to act, leading to a direct increase in harmful content. This constitutes harm to communities (harm category d) caused indirectly by the AI system's malfunction or change in use. Hence, it meets the criteria for an AI Incident rather than a hazard or complementary information.[AI generated]
AI principles
Accountability, Fairness, Human wellbeing, Respect of human rights, Robustness & digital security, Safety, Transparency & explainability, Democracy & human autonomy

Industries
Media, social platforms, and marketing

Affected stakeholders
Consumers, General public

Harm types
Psychological, Human or fundamental rights, Public interest, Reputational

Severity
AI incident

Business function:
Monitoring and quality control

AI system task:
Organisation/recommenders, Other


Articles about this incident or hazard

First consequences of Musk's purchase of Twitter: Racist outbursts rise 500 percent

2022-10-30
NOVA portal
After Musk's takeover of Twitter: Use of a racist word jumps 500 percent

2022-10-30
Radio 021
Why's our monitor labelling this an incident or hazard?
While Twitter employs AI systems for content moderation and recommendation, the article does not specify that AI systems directly caused or failed to prevent the increase in racist language. The harm is real (harm to communities through racist speech), but the AI's role is not clearly causal or pivotal. The event is more about social behavior changes following a corporate acquisition, with AI involvement inferred but not demonstrated as a direct factor in causing harm. Therefore, this is best classified as Complementary Information, providing context on social impacts related to AI-moderated platforms rather than an AI Incident or Hazard.
AFTER MUSK'S TAKEOVER: Here's how much use of a racist word on Twitter has risen

2022-10-29
slobodna-bosna.ba
Why's our monitor labelling this an incident or hazard?
Twitter employs AI systems for content moderation and the detection of harmful content. The reported surge in hate speech and offensive language after Musk's takeover suggests that the AI systems responsible for moderating content malfunctioned or were disabled, or that policies changed to allow more harmful content. This has directly harmed communities by enabling the spread of racist, antisemitic, and homophobic speech, in violation of human rights. Hence, this qualifies as an AI Incident due to the direct link between AI system use or malfunction and realized harm.
Use of a racist word on Twitter jumped 500% after Musk's takeover of the company!

2022-10-29
Rijeka Danas - Rijeka's online daily
Why's our monitor labelling this an incident or hazard?
Twitter uses AI systems for content moderation and the detection of harmful content. The surge in hate speech indicates a failure or inadequacy of these AI systems in preventing or mitigating the spread of harmful content. This failure has directly harmed communities by enabling the dissemination of racist and hateful speech, which can incite violence and social disruption. Therefore, this event qualifies as an AI Incident due to the direct link between AI system use or malfunction and realized harm to communities.
Twitter's hate speech spiked following Elon Musk's takeover, according to new study

2022-11-02
USA Today
Why's our monitor labelling this an incident or hazard?
Twitter uses AI systems for content moderation, as well as recommendation algorithms that determine what content users see. The spike in hate speech following policy changes under Musk suggests that the use or configuration of these AI systems indirectly harmed communities by enabling the spread of hate-driven content. The event describes realized harm (increased hate speech) linked to AI system use in content moderation and recommendation, qualifying it as an AI Incident under the framework.