ChatGPT often wrong about coding but sounds authoritative: Research – Business Insider


ChatGPT seems to have had a lot of success in convincing people that it's smart. But what if it's actually just duping them into thinking so?
The chatbot, built by OpenAI, has transformed society since its release in November, frequently cropping up in earnings calls with CEOs, and disrupting everything from education to the creative industries.
But a pre-print paper released this month suggests ChatGPT has a neat little trick to convince people it's smart: a style-over-substance approach.
Researchers from Purdue University analyzed ChatGPT’s replies to 517 questions posted to Stack Overflow, an essential Q&A site for software developers and engineers. 
After assessing the bot’s responses for “correctness, consistency, comprehensiveness, and conciseness,” the researchers found that 52% of the answers were flat-out incorrect, and 77% committed the writing sin of being verbose. 
Another part of the study found that ChatGPT users preferred its answers to the human-written responses on Stack Overflow a startling 40% of the time, despite all the errors it throws up.
“When asked why they preferred ChatGPT answers even when they were incorrect, participants suggested the comprehensiveness and articulated language structures of the answers to be some reason for their preference,” the research noted.
A caveat: this user analysis involved just 12 programmers, who were asked whether they preferred ChatGPT's responses or the human-written Stack Overflow answers to 2,000 randomly sampled questions. But OpenAI itself has warned that the bot can write "plausible-sounding but incorrect or nonsensical answers."
OpenAI didn’t respond to Insider’s request for comment on the research findings outside regular working hours.
As Insider’s Alistair Barr and Adam Rogers reported this month, Stack Overflow has become a case study in what Elon Musk has called “death by LLM,” with traffic to its site down 13% year-on-year in April, one month after OpenAI released its premium GPT-4 AI model.
The Purdue findings follow research from Stanford and UC Berkeley academics suggesting that the large language model underlying ChatGPT is getting dumber over time.
The speed with which ChatGPT seems to have embedded itself into the internet without much scrutiny has provoked alarm and irritation among AI ethicists and programmers.
In response to the Purdue research, computer scientist and AI expert Timnit Gebru tweeted: “Great that Stack Overflow is being destroyed by OpenAI +friends.”