Purdue Study: ChatGPT Delivers Incorrect Programming Answers 52% of the Time

The information displayed in the AIM should not be reported as representing the official views of the OECD or of its member countries.

A Purdue University study found that ChatGPT’s answers to 517 Stack Overflow programming questions were wrong 52% of the time, overly verbose 77% of the time, and misinterpreted the context 54% of the time. Despite this, about one-third of developers still trusted its confident responses, risking flawed code and wasted development effort. [AI generated]

Why's our monitor labelling this an incident or hazard?

ChatGPT is an AI system that provides programming answers. The study shows that over half of its answers contain inaccuracies and that users often overlook these errors, which can indirectly lead to harm in software development or learning. This constitutes an AI Incident because the use of the AI system has directly or indirectly led to harm through misinformation and its consequences. The harm is non-physical but significant, affecting users' work and potentially broader software quality. [AI generated]
AI principles
Robustness & digital security
Transparency & explainability
Safety
Human wellbeing
Accountability
Democracy & human autonomy

Industries
IT infrastructure and hosting

Affected stakeholders
Consumers

Harm types
Economic/Property

Severity
AI incident

Business function:
Research and development

AI system task:
Content generation
Interaction support/chatbots


Articles about this incident or hazard