Evaluation and Comparison of Ophthalmic Scientific Abstracts and References by Current Artificial Intelligence Chatbots

Question  What is the quality of ophthalmic scientific abstracts and legitimacy of references generated by 2 versions of a popular artificial intelligence chatbot?
Findings  In this cross-sectional study, the quality of abstracts generated by the 2 versions of the chatbot was comparable. The mean hallucination rate of citations was approximately 30% and was similar for the 2 versions.
Meaning  Both versions of the chatbot generated average-quality abstracts and hallucinated citations that appeared realistic; users should be wary of factual errors or hallucinations.
Importance  Large language model–based artificial intelligence (AI) chatbots are growing in popularity and have significant implications for both patient education and academia. Drawbacks of using AI chatbots to generate scientific abstracts and reference lists, including inaccurate content arising from hallucinations (ie, AI-generated output that deviates from its training data), have not been fully explored.
Objective  To evaluate and compare the quality of ophthalmic scientific abstracts and references generated by earlier and updated versions of a popular AI chatbot.
Design, Setting, and Participants  This cross-sectional comparative study used 2 versions of an AI chatbot to generate scientific abstracts and 10 references for clinical research questions across 7 ophthalmology subspecialties. The abstracts were graded by 2 authors using modified DISCERN criteria and performance evaluation scores.
Main Outcomes and Measures  Scores for the chatbot-generated abstracts were compared using the t test. Abstracts were also evaluated by 2 AI output detectors. Hallucination rates for unverifiable references generated by the earlier and updated versions of the chatbot were calculated and compared.
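For readers unfamiliar with these two measures, the following is a minimal sketch in Python of how such a comparison and rate could be computed. It is not the authors' analysis code; the score lists and reference-verification flags are hypothetical placeholders.

```python
# Minimal sketch (not the study's actual code) of the two analyses described
# above: a t test comparing abstract quality scores between chatbot versions,
# and a hallucination rate for unverifiable references.
from scipy import stats

# Hypothetical modified DISCERN scores (maximum 50) for abstracts
# generated by the earlier and updated chatbot versions
scores_earlier = [34, 37, 35, 36, 38, 33, 38]
scores_updated = [39, 37, 40, 36, 38, 39, 38]

# Two-sample t test comparing mean scores between the versions
t_stat, p_value = stats.ttest_ind(scores_earlier, scores_updated)
print(f"t = {t_stat:.2f}, P = {p_value:.2f}")

# Hallucination rate: fraction of generated references that could not be
# verified against bibliographic databases (True = unverifiable)
unverifiable = [True, False, False, True, False,
                False, True, False, False, False]
hallucination_rate = sum(unverifiable) / len(unverifiable)
print(f"Hallucination rate: {hallucination_rate:.0%}")  # 30% in this example
```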
Results  The mean modified AI-DISCERN scores for the chatbot-generated abstracts were 35.9 and 38.1 (maximum of 50) for the earlier and updated versions, respectively (P = .30). Using the 2 AI output detectors, the mean fake scores (with a score of 100% meaning generated by AI) for the earlier and updated chatbot-generated abstracts were 65.4% and 10.8%, respectively (P = .01), for one detector and were 69.5% and 42.7% (P = .17) for the second detector. The mean hallucination rates for nonverifiable references generated by the earlier and updated versions were 33% and 29% (P = .74).
Conclusions and Relevance  Both versions of the chatbot generated average-quality abstracts but hallucinated references at a high rate; caution is warranted when using these AI resources for health education or academic purposes.
Hua H, Kaakour A, Rachitskaya A, Srivastava S, Sharma S, Mammo DA. Evaluation and Comparison of Ophthalmic Scientific Abstracts and References by Current Artificial Intelligence Chatbots. JAMA Ophthalmol. Published online July 27, 2023. doi:10.1001/jamaophthalmol.2023.3119