Evaluation of the accuracy of ChatGPT-4 and Gemini’s responses to the World Dental Federation’s frequently asked questions on oral health
BMC Oral Health volume 25, Article number: 1293 (2025)
The field of artificial intelligence (AI) has experienced considerable growth in recent years, with the advent of technologies that are transforming a range of industries, including healthcare and dentistry. Large language models (LLMs) and natural language processing (NLP) are pivotal to this transformation. This study aimed to assess the efficacy of AI-supported chatbots in responding to questions frequently asked by patients to their doctors regarding oral health.
Frequently asked questions from the oral health section of the World Dental Federation (FDI) website were posed to the Google Gemini and ChatGPT-4 chatbots on July 9, 2024. Responses from ChatGPT-4 and Gemini, as well as those from the FDI webpage, were recorded. The researchers analyzed and reported the accuracy of the responses given by ChatGPT-4 and Gemini to the four specified questions, the similarities and differences between them, and the overall capabilities of the two chatbots. Furthermore, the content of the texts was evaluated for similarity with respect to the following criteria: “Main Idea,” “Quality Analysis,” “Common Ideas,” and “Inconsistent Ideas.”
It was observed that both ChatGPT-4 and Gemini exhibited performance comparable to that of the FDI responses in terms of completeness and clarity. Compared with Gemini, ChatGPT-4 provided responses that were more similar to the FDI responses in terms of relevance. Furthermore, ChatGPT-4 provided responses that were more accurate than those of Gemini in terms of the “Accuracy” criterion.
This study demonstrated that, when assessed against the FDI responses, the ChatGPT-4 and Gemini applications provide current and comprehensible information in response to general inquiries concerning oral health. These applications can be regarded as a widespread and dependable source of information for individuals seeking such data.
The exponential growth of scientific and technological knowledge has brought about changes that profoundly affect our daily lives. One of the most significant drivers of this transformation is the advent of artificial intelligence (AI), whose applications continue to evolve at a remarkable pace across many industries [1, 2], including healthcare and dentistry. The advent of large language models (LLMs) and natural language processing (NLP) has transformed the field of AI: these technologies enable machines to understand and generate human language with a level of sophistication previously unimaginable [3]. LLMs, such as those powering ChatGPT-4 and Google Gemini, are trained on extensive datasets that enable them to comprehend context, generate coherent text, and perform complex language-related tasks. NLP is a crucial element of these models, enabling computers to process, interpret, and respond to human language. Together, LLMs and NLP are revolutionizing how we interact with technology in healthcare and dentistry [4, 5].
Diagnosing, treating, and rehabilitating diseases, as well as improving the community’s health level and managing health services, require various technological methods and tools [4, 5]. Since the 2000s, AI methods and tools have been actively applied in medicine and healthcare services. Among the most beneficial applications in healthcare are robotic applications for diagnostic and therapeutic procedures. AI-supported chatbots capable of natural language conversations offer accessible medical advice and information to patients through smartphones and computers. This accessibility enables individuals to manage their health more effectively and ensures that critical health information is always at their fingertips.
Artificial intelligence applications in dentistry are advancing day by day [6]. AI technologies help diagnose dental conditions with high accuracy by analyzing radiographs and other medical images, supporting precise diagnoses and personalized treatment plans. AI can increase dentists’ diagnostic accuracy, mainly by increasing their sensitivity for detecting enamel lesions, but may also increase invasive therapy decisions [7]. These capabilities improve patient outcomes and streamline workflows in dental practices [8]. Oral hygiene refers to all care and cleaning practices performed to keep intraoral tissues healthy and protected from disease; these practices encompass the teeth, gums, and tongue. Oral hygiene is an important aspect of both individual and community health management: it prevents dental caries, bad breath, and gum problems. The primary cause of periodontal disease is dental biofilm, and the gold standard for its removal is mechanical cleaning with auxiliary oral hygiene tools. Periodontal diseases are quite common in communities and interact with certain systemic conditions [9]. No studies in the literature have examined how successfully AI-supported chatbots answer patients’ questions about gum health when patients cannot reach their doctors. The aim of the present study is to pose the questions in the oral health section defined by the World Dental Federation (FDI) to the ChatGPT-4 (Chat Generative Pretrained Transformer version 4) and Gemini applications and to compare the content of the responses provided by these applications with that of the FDI.
Although ethical approval was not required to conduct this study, the research was carried out within local ethical frameworks and regulations as stated in the Declaration of Helsinki. All data generated or analysed during this study are included in its supplementary information files.
A new email account was created to avoid bias introduced by search-algorithm personalization. Before accessing the Gemini and ChatGPT-4 applications, all search history and cookies were deleted. The ChatGPT, Gemini, and FDI websites were accessed via the Google (Google Inc., California, USA) search engine. In July 2024, an academic researcher used the Google search engine to search for written texts pertaining to oral hygiene, employing the keyword “oral hygiene” as a search term.
Texts with fewer than 20 sentences, articles written for academic purposes, forum sites, websites created for health professionals, and commercial websites were excluded from the subsequent analysis. The study included websites that provided information and education on oral hygiene for patients. The texts were categorized according to their sources as private health institutions, university hospitals, periodontology specialists, dentists, oral and dental health centers, and professional organizations. The obtained data were transferred to a Microsoft Excel (Microsoft Corporation, Redmond, Washington, USA) file. During the search, the “Clear Browsing Data” option was used under the “Privacy and Security” section of the “Settings” menu.
Six questions pertaining to the domain of oral health were selected from the FDI website, with a focus on public information. Responses to all of the questions were obtained via the AI applications. However, the number of questions reviewed was reduced from six to four, as two were descriptive in nature and therefore not the type that members of the public would pose to search engines or AI-supported chat applications to obtain information about oral health.
The responses provided by ChatGPT-4 may change over time owing to technological advancements, scientific discoveries, or other factors. Therefore, after a 15-day interval, the questions were asked again in their original English, and the responses were recorded. The current versions of the applications (GPT-4 and Gemini) were accessed via an internet connection from Minnesota, USA, on July 9, 2024, and July 24, 2024, at 12:00 PM. The questions were asked in the same order, and the responses were recorded [10, 11]. The responses to the questions on the FDI site were also recorded. The question-and-answer responses from the two separate days were compared with the responses on the FDI website, as well as between the two LLMs. The ChatGPT-4, Gemini, and FDI response texts were separately subjected to similarity checks via the ‘Document-to-Document Comparison’ feature of a similarity detection program (iThenticate®), and their similarities were reported.
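The study queried both models through their public chat interfaces. As a minimal, hypothetical sketch of how the two-time-point data collection could be reproduced programmatically, the Python code below submits each FDI question to both providers’ APIs; the client setup, model identifier, and placeholder API key are assumptions, not details reported in the study.

```python
# Hypothetical reproduction sketch; the study itself used the web interfaces.
from openai import OpenAI
import google.generativeai as genai

QUESTIONS = [
    "How can you keep your mouth healthy throughout life?",
    "Why does oral health matter?",
    "What are the main risk factors for oral diseases?",
    "How many people are affected by oral diseases?",
]

openai_client = OpenAI()  # reads OPENAI_API_KEY from the environment
genai.configure(api_key="YOUR_GOOGLE_API_KEY")  # placeholder key
gemini = genai.GenerativeModel("gemini-pro")  # assumed model identifier

def collect_responses() -> dict:
    """Ask each FDI question to both chatbots and record the answers."""
    results = {}
    for question in QUESTIONS:
        gpt = openai_client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": question}],
        )
        gem = gemini.generate_content(question)
        results[question] = {
            "chatgpt4": gpt.choices[0].message.content,
            "gemini": gem.text,
        }
    return results

# Run once per study date (here, July 9 and July 24, 2024) and save the output.
responses = collect_responses()
```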
Within the scope of the study, the first four of the six questions in the oral health section of the FDI website were selected and analyzed [12].
The four questions analyzed in the study are as follows:
How can you keep your mouth healthy throughout life?
Why does oral health matter?
What are the main risk factors for oral diseases?
How many people are affected by oral diseases?
In addition to determining the accuracy, similarities, and differences of the responses provided by ChatGPT-4 and Gemini to the four specified questions, a comprehensive analysis based on the following four criteria was conducted by the two researchers using a three-point Likert scale and reported [13]:
Main Idea: Analysis to determine the main idea or message of each response. Since the person asking the question primarily tries to understand the main idea of the given response, this was taken into account [14].
Quality Analysis: The accuracy and reliability of the information obtained by the questionnaire fundamentally depend on the quality of the information. Especially in science disciplines, correct and factual information can be obtained only through quality responses [15]. Each response was evaluated according to the criteria provided below.
Summary of Common Ideas: The common ideas between the responses prepared by experts and scientific authorities and those provided by AI provide insight into the validity and reliability of AI-based applications [16]. The goal is to identify overlapping ideas, concepts, or information in both responses.
Summary of Inconsistent Ideas: As with common ideas, inconsistent ideas also provide insights into the validity and reliability of AI-based applications [17]. Areas where the perspectives, information, or conclusions of the responses differ were investigated.
Two researchers, unaware of each other’s responses, evaluated the answers to the questions according to the evaluation criteria as “Yes”, “Neutral”, and “No”. These results were subsequently amalgamated into a single Excel file and recorded as researcher 1 and researcher 2. The agreement between the two researchers in their evaluation of the questions was statistically analysed on the basis of criteria and practice.
The IBM® SPSS Statistics 22 program was used to conduct the statistical analyses and evaluate the findings obtained in the course of the study. The McNemar test was employed to compare the qualitative data. Significance was evaluated at the p < 0.05 level.
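To illustrate the inter-rater comparison, the sketch below applies McNemar’s test to a pair of raters’ dichotomized ratings (e.g., “Yes” versus “not Yes”) using statsmodels; the counts are placeholders for illustration, not the study’s data.

```python
# Minimal McNemar sketch on illustrative (not actual) paired rating counts.
import numpy as np
from statsmodels.stats.contingency_tables import mcnemar

# Rows: researcher 1 (Yes / not Yes); columns: researcher 2 (Yes / not Yes).
table = np.array([
    [6, 1],  # both rated Yes / R1 Yes, R2 not Yes
    [0, 1],  # R1 not Yes, R2 Yes / both rated not Yes
])

result = mcnemar(table, exact=True)  # exact binomial version for small samples
print(f"McNemar p-value: {result.pvalue:.3f}")
# p > 0.05 would indicate no significant disagreement between the researchers.
```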
The responses to the questions posed to ChatGPT-4 and Gemini after a 15-day interval were analyzed with the document comparison feature of a word-processing program. The analysis demonstrated that the similarity percentages differed between the two dates: the responses received on July 9, 2024, showed similarity percentages of 2% for Gemini and 3% for ChatGPT-4, whereas the responses received on July 24, 2024, showed 5% for Gemini and 2% for ChatGPT-4 (Tables 1 and 2).
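iThenticate and word-processor comparison reports are proprietary; as a rough open-source proxy for scoring how much a chatbot’s answer changed over the 15-day interval, TF-IDF cosine similarity gives a comparable lexical-overlap measure. This is an illustrative substitute, not the tool used in the study.

```python
# Illustrative lexical-similarity proxy using scikit-learn (not iThenticate).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def response_similarity(text_day1: str, text_day2: str) -> float:
    """Return TF-IDF cosine similarity (0-1) between two response texts."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(
        [text_day1, text_day2]
    )
    return float(cosine_similarity(tfidf[0], tfidf[1])[0, 0])

# Example with made-up answers to one question on the two study dates.
print(response_similarity(
    "Brush twice daily with fluoride toothpaste and floss every day.",
    "Brushing twice a day with a fluoride toothpaste and daily flossing help.",
))
```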
There were no statistically significant differences between the two researchers’ evaluations of the ChatGPT-4 responses for relevance, correctness, integrity, and clarity (p = 1.000, p = 0.250, p = 1.000, and p = 1.000, respectively; all p > 0.05), nor in the overall evaluation (p = 0.189; p > 0.05) (Table 3).
There were no statistically significant differences between the two researchers’ evaluations of the Gemini responses for relevance, correctness, integrity, and clarity (p = 0.276, p = 0.083, p = 0.317, and p = 1.000, respectively; all p > 0.05), nor in the overall evaluation (p = 0.189; p > 0.05) (Table 4).
There were no statistically significant differences between the two researchers’ evaluations of the FDI website for relevance, correctness, integrity, clarity, or in total (p = 1.000; p > 0.05) (Table 5).
The analysis revealed that both ChatGPT-4 and Gemini produced relatively consistent responses over time, with Gemini exhibiting slightly higher similarity scores than ChatGPT-4 (Fig. 1). ChatGPT-4 and Gemini performed similarly to the FDI in terms of the “Completeness” and “Clarity” criteria. For the “Relevance” criterion, ChatGPT-4 provided responses more similar to the FDI responses than Gemini did. For the “Accuracy” criterion, Gemini provided responses more similar to the FDI responses than ChatGPT-4 did.
Fig. 1 Radar chart comparing the responses of ChatGPT-4, Gemini, and the FDI across the four main criteria
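For readers reconstructing a figure of this kind, the snippet below draws a four-axis radar chart with matplotlib; the scores are placeholders for illustration and are not the values reported in the study.

```python
# Illustrative radar chart in the style of Fig. 1 (placeholder scores).
import numpy as np
import matplotlib.pyplot as plt

criteria = ["Relevance", "Accuracy", "Completeness", "Clarity"]
scores = {  # hypothetical ratings, for layout only
    "ChatGPT-4": [3, 2, 3, 3],
    "Gemini": [2, 3, 3, 3],
    "FDI": [3, 3, 3, 3],
}

angles = np.linspace(0, 2 * np.pi, len(criteria), endpoint=False).tolist()
angles += angles[:1]  # repeat the first angle to close each polygon

fig, ax = plt.subplots(subplot_kw={"projection": "polar"})
for label, values in scores.items():
    values = values + values[:1]
    ax.plot(angles, values, label=label)
    ax.fill(angles, values, alpha=0.1)
ax.set_xticks(angles[:-1])
ax.set_xticklabels(criteria)
ax.legend(loc="upper right", bbox_to_anchor=(1.3, 1.1))
plt.show()
```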
AI-based chat applications are software applications that offer a variety of services, frequently through text-based interfaces. They can be used for different purposes, such as providing information on specific topics, answering questions, delivering customer service, and even offering therapy by creating an artificial chat environment [18]. These applications are used in various domains within the healthcare system, including dentistry. Researchers pose a range of questions to the AI and assess its integration into their respective areas of expertise. For instance, ChatGPT-4 was queried on the subject of the dentistry licensing examination and was observed to provide accurate responses, indicating its potential for application in patient management and dental education [19]. A recent study showed that one of these applications, ChatGPT, can even be used to write scientific articles. Dentistry is progressing toward a digital workflow marked by significant changes, and the emergence of new technologies has allowed patients to be better informed about their treatment plans [20].
The capacity of AI applications such as ChatGPT to diagnose clinical conditions by generating diagnoses from existing symptoms has not yet been fully established. While Hirosawa et al. reported that ChatGPT-3.5 was able to generate differential diagnoses with a high degree of diagnostic accuracy for a group of patients with general complaints [21], a study by Strong et al. emphasised that the accuracy of ChatGPT’s responses to case questions varied across iterations and that their consistency was limited [22]. A literature review revealed no studies evaluating the quality of responses to questions about oral hygiene provided by AI applications such as ChatGPT-4 and Gemini. The aim of the current study is to evaluate the adequacy and usability of the responses given by the ChatGPT-4 and Gemini applications to oral health questions frequently asked by patients.
The utilization of AI-supported applications that are specifically designed for the management of oral health in patients can facilitate enhanced treatment adherence by enabling patients to access information without the need for direct interaction with a dental practitioner. This can lead to better clinical outcomes while reducing wasted time for both the patient and the dentist [23].
In a comparative study involving the ChatGPT-4 and BING AI chat applications in the domain of ophthalmological triage, ChatGPT-4 exhibited an elevated degree of diagnostic and triage accuracy, with minimal instances of erroneous responses in comparison to the BING application [24]. In this study, the ability of ChatGPT-4 to respond to common patient questions was evaluated by two researchers, and it was found to be more successful than the Gemini application.
A study investigating the potential benefits and limitations of utilizing the ChatGPT-4 in the field of oral and maxillofacial surgery reported that the ChatGPT-4 provides accurate and helpful responses to frequently asked questions posed by patients [24]. Among the applications evaluated in the present study, ChatGPT-4 was found to be more successful than Gemini in providing accurate and sufficient responses to frequently asked questions on the basis of FDI responses.
Muttanahally et al. [25] conducted a study evaluating the efficiency of four AI-supported voice virtual assistants (Google Assistant, Siri, Alexa, Cortana) in writing oral and maxillofacial radiology reports. They reported that Google Assistant was the most efficient, followed by Cortana, Siri, and Alexa. The study concluded that while virtual assistants were useful and practical for answering questions related to oral, dental, and maxillofacial radiology, more specialized topics and information were needed for writing reports [26]. Studies indicate that the use of AI in healthcare is promising. Nevertheless, it should be used responsibly alongside human health professionals, taking into account its limitations, and is expected to become more widespread in the field of dentistry in the future [24,25,26]. Another study reported that ChatGPT has potential applications in dental education and the creation of radiology reports, but it has limitations, such as the inability to respond to image-based questions and verify content [26]. A comparative study evaluating the responses of experts and ChatGPT to endodontic questions revealed that while ChatGPT is not yet sufficient for clinical decision-making, it could be used with further development [27].
In another study, the responses of ChatGPT to 30 questions regarding tooth-supported fixed prostheses and removable dental prostheses were evaluated by a panel of experts. The findings indicated that ChatGPT is not a suitable replacement for a dentist.
The training data of ChatGPT-4 may include questions and responses from the examined FDI website, which raises the concern that its responses might merely be rephrased versions of the website’s content. Importantly, ChatGPT lacks independent scientific reasoning; it can only generate responses on the basis of recognized patterns and structures within the texts it has been trained on [28, 29]. However, the authors used a plagiarism detection program to verify the originality of ChatGPT-4’s responses relative to the information on the website, alleviating this concern. Considering the importance of oral and dental health, the high prevalence of oral and dental diseases, differing opinions on dental materials and methods, and technological advancements, dentists’ efforts to maintain oral and dental health are very important [30]. In this context, patients need to be accurately informed. Owing to the impact of the pandemic, many patients have preferred to research various subtopics on web-based platforms instead of visiting clinics where the risk of contamination is high [31]. However, the validity and reliability of information on web-based sites and applications are debatable. A number of studies have assessed the veracity of information on diverse online platforms and have concluded that the internet, akin to an infinite ocean, cannot supplant the patient‒doctor relationship in healthcare [32]. Nevertheless, the considerable increase in the use of the internet and AI in the developing world evinces a growing tendency among individuals to seek online resources for health-related matters, as in numerous other domains [33]. Therefore, ensuring that those who require it have access to accurate and reliable information is highly important.
In Bloom’s taxonomy, questions are classified according to cognitive and hierarchical criteria that underlie medical education, on a scale ranging from lower to higher levels [34]. Lower-level cognitive skills include abilities such as recall and understanding, whereas higher-order thinking skills involve application, analysis, evaluation, and creation. LLM-based applications can respond adequately to lower-level questions but are not as successful with higher-level ones [35]. Therefore, the Bloom classification of the questions posed to AI applications in studies is important and should be considered a limitation [36]. In the present study, the fact that all four questions obtained from the FDI website required only lower-level skills may explain the accuracy of the responses given by ChatGPT-4 and Gemini regarding oral health. In future studies, it will be crucial to evaluate responses to more complex, higher-level questions posed to LLM-based applications. Indeed, the insufficiency of some responses in certain studies supports this view. In the context of dentistry, studies on topics that frequently cause concern among patients and lead to the dissemination of misinformation within society are believed to contribute to the scientific literature and enhance the health and well-being of individuals and communities.
Despite extensive deployment in medical applications, no AI technology has been explicitly developed for dentistry. Some AI systems have demonstrated potential benefits in dentistry. However, the inconsistencies in their ability to provide accurate and sufficient information require further investigation by researchers. A specific AI-supported oral health module or an AI-supported application prepared specifically for the field of periodontology is needed. Implementing this technology could increase the efficiency of oral health practices, just as in their medical counterparts.
The present study evaluated the efficacy of two artificial intelligence programs in informing the public about oral health in comparison with the FDI website. Although the answers on the FDI website were concise and adequate, the answers given by the AI programs were more detailed and offered clues on options such as lifestyle and alternative medicine. However, because these aspects could not be evaluated with the study criteria, the three sources were not compared in this respect.
Furthermore, an analysis of the plagiarism rates in the answers provided by the AI programs revealed that neither the FDI website nor other academic databases were among the sources used by these programs. While the responses provided by the AI programs were accurate and adequate, their reliance on non-scientific sources of information is a cause for concern.
In the future, similar studies should be conducted with larger samples, multiple users, and a broader range of questions. AI-supported applications are promising in medicine and dentistry for answering patients’ treatment-related questions accurately and sufficiently. However, the number of topics and information must be increased for these applications to be used in the field of oral health.
The findings of this study indicate that ChatGPT-4 and Gemini, which are seen as potential information providers and are increasingly used in today’s technology-driven world, provide responses about oral health that are consistent in content with FDI’s responses to the same questions. Despite the fact that the responses provided by ChatGPT and Gemini are consistent with the content of the FDI website, it is evident that these LLMs generate content in an original manner with minimal plagiarism. It is clear that AI models hold significant promise as accessible sources of health information. ChatGPT-4 and Gemini demonstrated a high degree of clarity and user-friendliness in their responses. For individuals seeking quick, on-demand advice on oral hygiene practices, these systems can provide valuable information on routine topics. Despite these advantages, ethical concerns surrounding the use of AI in healthcare continue to persist. As AI systems such as ChatGPT-4 and Gemini have evolved, ensuring that these models adhere to established medical guidelines and ethical standards will be crucial.
Data are provided within the supplementary information files.
Abdullah R, Fakieh B. Health care employees’ perceptions of the use of artificial intelligence applications: survey study. J Med Internet Res. 2020;22:e17620. https://doi.org/10.2196/17620
Geis JR, Brady AP, Wu CC, Spencer J, Ranschaert E, Jaremko JL, Langer SG, Borondy Kitts A, Birch J, Shields WF, van den Hoven R, Kotter E, Wawira Gichoya J, Cook TS, Morgan MB, Tang A, Safdar NM, Kohli M. Ethics of artificial intelligence in radiology: summary of the joint European and North American multisociety statement. Radiology. 2019;293:436–40. https://doi.org/10.1148/radiol.2019191586
Bhardwaz S, Kumar J. An extensive comparative analysis of chatbot technologies-ChatGPT, Google BARD and Microsoft Bing. In 2nd International Conference on Applied Artificial Intelligence and Computing. India, 2023;673–679. https://doi.org/10.1109/ICAAIC56838.2023.10140214
Brown TB, Mann B, Ryder N, Subbiah M, Kaplan J, Dhariwal P, Neelakantan A, Shyam P, Sastry G, Askell A, Agarwal S, Herbert-Voss A, Krueger G, Henighan T, Child R, Ramesh A, Ziegler DM, Wu J, Winter C, Hesse C, Chen M, Sigler E, Litwin M, Gray S, Chess B, Clark J, Berner C, McCandlish S, Radford A, Sutskeve I, Amodei D. Language models are few-shot learners. Adv Neural Inf Process. 2020;33:1877–901. https://doi.org/10.48550/arXiv.2005.14165
Veranyurt U, Deveci AF, Esen MF, Veranyurt O. Disease classification by machine learning techniques: random forest, K-nearest neighbor and AdaBoost algorithms applications. Usaysad Derg. 2020;6(2):275–86.
Chen YW, Stanley K, Att W. Artificial intelligence in dentistry: current applications and future perspectives. Quintessence Int. 2020;51(3):248–57.
Mertens S, Krois J, Cantu AG, Arsiwala LT, Schwendicke F. Artificial intelligence for caries detection: randomized trial. J Dent. 2021;115:103849. https://doi.org/10.1016/j.jdent.2021.103849. Epub 2021 Oct 14. PMID: 34656656.
Umer F. Could AI offer practical solutions for dentistry in the future? BDJ Team. 2022;9:26–8. https://doi.org/10.1038/s41407-022-0830-1
Herrera D, Sanz M, Kebschull M, Jepsen S, Sculean A, Berglundh T, Papapanou PN, Chapple I, Tonetti MS. EFP workshop participants and methodological consultant. Treatment of stage IV periodontitis: The EFP S3 level clinical practice guideline. J Clin Periodontol. 2022;49 Suppl 24:4–71. https://doi.org/10.1111/jcpe.13639. PMID: 35688447.
https://openai.com/gpt-4
https://gemini.google.com/app?utm_source=google%26utm_medium=cpc%26utm_campaign=2024enUS_gemfeb%26gad_source=1%26gclid=Cj0KCQjwv7O0BhDwARIsAC0sjWPrr2QyvZCN8-ejRdF_o0jI_UaHQKnC8ZnWgaGm2eeqiUDzluTOnZgaAmX_EALw_wcB
Food and Drug Administration (Philippines). FDA Advisory No. 2024-0409: FDA reminders for the National Dental Health Month in February. https://www.fda.gov.ph/fdaadvisory-no-2024-0409-fda-reminders-for-the-national-dental-health-month-in-february/
Buldur M, Sezer B. Evaluating the accuracy of chat generative pre-trained transformer version 4 (ChatGPT-4) responses to United States food and drug administration (FDA) frequently asked questions about dental amalgam. BMC Oral Health. 2024;24:605. https://doi.org/10.1186/s12903-024-0435
Kochanek M, Cichecki I, Kaszyca O, Szydło D, Madej M, Jędrzejewski D, Kazienko P, Kocoń J. Improving training dataset balance with ChatGPT prompt engineering. Electronics. 2024;13:2255. https://doi.org/10.3390/electronics13122255
Oh S, Yi YJ, Worrall A. Quality evaluation of health answers in social Q&A: socio-emotional support and evaluation criteria. Proc Am Soc Info Sci Tech. 2012;49:1–6. https://doi.org/10.1002/meet.14504901075
Hulman A, Dollerup OL, Mortensen JF, Fenech ME, Norman K, Støvring H, Hansen TK. ChatGPT- versus human-generated answers to frequently asked questions about diabetes: a Turing test-inspired survey among employees of a Danish diabetes center. PLoS ONE. 2023;18:e0290773. https://doi.org/10.1371/journal.pone.0290773
Gregorcic B, Pendrill AM. ChatGPT and the frustrated Socrates. Phys Educ. 2023;58:035021. https://doi.org/10.1088/1361-6552/acc299
Walker HL, et al. Reliability of medical information provided by ChatGPT: assessment against clinical guidelines and patient information quality instrument. J Med Internet Res. 2023;25:e47479.
Deiana G, Dettori M, Arghittu A, Azara A, Gabutti G, Castiglia P. Artificial intelligence and public health: evaluating ChatGPT responses to vaccination myths and misconceptions. Vaccines. 2023;11:1217. https://doi.org/10.3390/vaccines11071217
Chau RCW, Thu KM, Yu OY, Hsung RT, Lo ECM, Lam WYH. Performance of generative artificial intelligence in dental licensing examinations. Int Dent J. 2024;74(3):616–21. https://doi.org/10.1016/j.identj.2023.12.007. Epub 2024 Jan 19. PMID: 38242810; PMCID: PMC11123518.
Tustumi F, Andreollo N, Aguilar-Nascimento J. Future of the language models in healthcare: the role of ChatGPT. Arquivos Brasileiros De Cirurgia Digestiva. 2023;356:e1721. https://doi.org/10.1590/0102-672020230002e1727
Hirosawa T, Harada Y, Yokose M, Sakamoto T, Kawamura R, Shimizu T. Diagnostic accuracy of differential-diagnosis lists generated by generative pretrained transformer 3 chatbot for clinical vignettes with common chief complaints: a pilot study. Int J Environ Res Public Health. 2023;20:3378. https://doi.org/10.3390/ijerph20043378
Strong E, DiGiammarino A, Weng Y, Basaviah P, Hosamani P, Kumar A, Nevins A, Kugler J, Hom J, Chen JH. Performance of ChatGPT on free-response, clinical reasoning exams. medRxiv. 2023. https://doi.org/10.1101/2023.03.24.23287731
Yang R, Tan TF, Lu W, Thirunavukarasu AJ, Ting DSW, Liu N. Large language models in health care: development, applications, and challenges. Health Sci J. 2023;2:255–63. https://doi.org/10.1002/hcs2.61
Lee SJ, Lee C-J, Hwang H. The impact of COVID-19 misinformation and trust in institutions on preventive behaviors. Health Education Research. 2023;38(1):95–105. https://doi.org/10.1093/her/cyac038
World Health Organization. Oral health. World Health Organization. Regional Office for Africa; 2021. https://www.afro.who.int/health-topics/oral-health. [2021 Aug 8].
Marcenes W, Kassebaum NJ, Bernabé E, et al. Global burden of oral conditions in 1990–2010: a systematic analysis. J Dent Res. 2013;92(7):592–8.
Kisely S, Sawyer E, Siskind D, et al. The oral health of people with anxiety and depressive disorders – a systematic review and meta-analysis. J Affect Disord. 2016;200:119–32.
Jin LJ, Lamster IB, Greenspan JS, et al. Global burden of oral diseases: emerging concepts, management and interplay with systemic health. Oral Dis. 2016;22(7):609–19.
Bagde H, Dhopte A, Alam MK, Basri R. A systematic review and meta-analysis on ChatGPT and its utilization in medical and dental research. Heliyon. 2023;9:e23050. https://doi.org/10.1016/j.heliyon.2023.e23050
Suárez A, Díaz-Flores García V, Algar J, Gómez Sánchez M, Llorente de Pedro M, Freire Y. Unveiling the ChatGPT phenomenon: evaluating the consistency and accuracy of endodontic question answers. Int Endod J. 2024;57:108–13. https://doi.org/10.1111/iej.13985
Freire Y, Santamaría Laorden A, Orejas Pérez J, Gómez Sánchez M, Díaz-Flores García V, Suárez A. ChatGPT performance in prosthodontics: assessment of accuracy and repeatability in answer generation. J Prosthet Dent. 2024;131:659.e1–659.e6. https://doi.org/10.1016/j.prosdent.2024.01.018
Vaira LA, Lechien JR, Abbate V, Allevi F, Audino G, Beltramini GA, Bergonzani M, Bolzoni A, Committeri U, Crimi S, Gabriele G, Lonardi F, Maglitto F, Petrocelli M, Pucci R, Saponaro G, Tel A, Vellone V, Chiesa-Estomba CM, Boscolo-Rizzo P, Salzano G, De Riu G. Accuracy of ChatGPT-generated information on head and neck and oromaxillofacial surgery: a multicenter collaborative analysis. Otolaryngol Head Neck Surg. 2023 Aug 18. Epub ahead of print. https://doi.org/10.1002/ohn.489
Kılınç DD, Mansız D. Examination of the reliability and readability of Chatbot Generative Pretrained Transformer’s (ChatGPT) responses to questions about orthodontics and the evolution of these responses in an updated version. Am J Orthod Dentofacial Orthop. 2024:S0889-5406(24)00007-6. Epub ahead of print. https://doi.org/10.1016/j.ajodo.2023.11.012
Mago J, Sharma M. The potential usefulness of ChatGPT in oral and maxillofacial radiology. Cureus. 2023;15:e42133. https://doi.org/10.7759/cureus.42133
Evans T. AI and the future of patient care. BMJ Health & Care Informatics. 2021. https://doi.org/10.1136/bmjhci-2021-100240
Garcia F. Ethical concerns in AI healthcare applications. International Journal of Medical Ethics. 2022. https://doi.org/10.1136/jme.2021-011324
The author(s) received no financial support for the research.
Faculty of Dentistry, Department of Periodontology, Istanbul Aydin University, Istanbul, Turkey
Aysenur Arpaci
Faculty of Dentistry, Department of Maxillofacial Radiology, Istanbul Atlas University, Istanbul, Turkey
Asel Usdat Ozturk
Faculty of Engineering, Department of Computer Science, Boston University, Boston, USA
Ismail Okur
Faculty of Dentistry, Department of Orthodontics, Istanbul Atlas University, Istanbul, Turkey
Sanaz Sadry
Conception: I.O. and A.A.; Design of the work: A.A., A.O., and S.S.; Analysis and interpretation of data: A.A., I.O., A.O., and S.S.; Figure: I.O.; Table: A.A. and A.O.; Drafted the work or substantively revised it: A.A., I.O., A.O., and S.S.; Approved the submitted version: A.A., I.O., A.O., and S.S.
Correspondence to Asel Usdat Ozturk.
The authors declare no competing interests.