To use ChatGPT or not? Vanderbilt University Medical Center releases study on AI accuracy
Physicians at Vanderbilt University Medical Center are asking the question: to use ChatGPT, or not to use ChatGPT?
In short, the answer is not yet.
In a study conducted by the medical center, 33 of the hospital's physicians across 17 medical specialties found that while OpenAI's ChatGPT shows promise in providing medical professionals with answers to both simple and complex questions, the program still needs further development.
The purpose of the study was to determine how accurately and comprehensively the publicly available ChatGPT could respond to 284 medical questions, while also highlighting the program's reliability and limitations.
The use of artificial intelligence has boomed across the health care and business world over the past year. In August, HCA partnered with Google Cloud to launch a pilot program using AI to improve nurse and physician workflow. And earlier this year, OpenAI launched a $175 million fund to invest in promising generative AI startups.
Dr. Douglas Johnson, associate professor of medicine at Vanderbilt University Medical Center, ran the Vanderbilt study, which cost nothing but time for the physicians involved.
“I immediately realized that [ChatGPT] was something that patients were going to use and physicians were going to use … to get their information for health care,” Johnson said. “We wanted to understand how accurate the information, how complete the information was that was coming out.”
Vanderbilt University Medical Center is Middle Tennessee’s largest hospital, according to Nashville Business Journal research, with 1,174 beds and more than $3.5 billion of revenue in 2020.
Previous studies of ChatGPT focused on closed-ended and multiple-choice questions, while this study included both questions requiring a yes-or-no answer and those needing a descriptive answer.
For example, one of the questions asked was “What oral antibiotics may be used for the treatment of MRSA infections?”
The physicians then graded the responses on accuracy and completeness. For example, the answer to the question above, which included seven antibiotics and a disclaimer that treatment should be given in consultation with an infectious disease specialist, received a score of three out of six for accuracy and one out of three for completeness. It scored low because some of the options given were not available in oral form, and the program also left out one of the most important oral antibiotics used as a treatment.
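For readers curious how such a query is made in practice, here is a minimal sketch using OpenAI's Python library as it existed in spring 2023 (the pre-1.0 "openai" package). The model name, API key placeholder, and graded-record fields are illustrative assumptions, and the scores simply restate the article's MRSA example; this is not the study's actual protocol or data pipeline.

```python
# A minimal sketch (not the study's actual method) of posing one medical
# question to ChatGPT and recording physician-assigned grades.
# Assumes the pre-1.0 "openai" Python package available in spring 2023.
import openai

openai.api_key = "sk-..."  # replace with a real API key

question = ("What oral antibiotics may be used for the "
            "treatment of MRSA infections?")

# Send the question to the chat model and extract the free-text answer.
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": question}],
)
answer = response["choices"][0]["message"]["content"]

# Physicians graded each answer by hand; a single graded record might look
# like this, using the 6-point accuracy and 3-point completeness scales
# described in the article (scores shown are the article's MRSA example).
graded = {
    "question": question,
    "answer": answer,
    "accuracy": 3,      # 1-6 scale, where 6 is completely accurate
    "completeness": 1,  # 1-3 scale, where 3 is comprehensive
}
print(graded)
```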
The physicians did this for two different versions of ChatGPT: once in April and again in May, after the software had been updated.
“Things have obviously changed quite a lot since we did this research with GPT-4 coming out,” Johnson said.
Overall, Johnson said he was surprised by the "fairly high accuracy and high completeness" of the answers to the 284 questions. In the second round of questioning, after the updated version was released, the answers received substantially higher ratings, indicating that the program was improving rapidly.
But there were many instances where ChatGPT was "spectacularly and surprisingly wrong," according to the study. ChatGPT cannot grade the reliability of its sources, so it may not favor established clinical guidelines over a social media blog discussing the same medical concept. This is in part why the study concluded that the program should not be used as the sole resource for medical knowledge at this time, and that further research and development are needed.
“The biggest [risk] is providing wrong information that leads to safety issues and mistakes,” Johnson said.
Since the conclusion of the study, Johnson said, they have retested some of the questions that performed poorly using the updated GPT-4.
“There was clearly an improvement in accuracy with more updated versions,” Johnson said. “It’s able to improve as it gets more information and so I think that the results will only improve over time.”
Johnson said he and his colleagues are interested in testing and using ChatGPT in a number of areas to see how reliable the software is across different fields and topics.
“We do have a number of other projects ongoing,” he said.
But for now, he said his policy is to use ChatGPT as a creative tool to help get ideas flowing but to not rely on it.