{"title":"现代人工智能聊天机器人回答牙髓问题的准确性评估。","authors":"Melis Çakar, Ayşe Tuğba Eminsoy Avcı, Salih Düzgün, Tuğrul Aslan, Kübra Nur Hekimoğlu","doi":"10.1111/aej.70012","DOIUrl":null,"url":null,"abstract":"<p><p>This study aims to compare the accuracy of modern AI chatbots, including Gemini 1.5 Flash, Gemini 1.5 Pro, ChatGPT-3.5 and ChatGPT-4, in responding to endodontic questions and supporting clinicians. Forty yes/no questions covering 12 endodontic topics were formulated by three experts. Each question was presented to the AI models on the same day, with a new chat session initiated for each. The agreement between chatbot responses and expert consensus was assessed using Cohen's kappa test (p < 0.05). ChatGPT-3.5 demonstrated the highest accuracy (80%), followed by ChatGPT-4 (77.5%), Gemini 1.5 Pro (72.5%) and Gemini 1.5 Flash (60%). The agreement levels ranged from weak (ChatGPT models) to minimal (Gemini Flash). The findings indicate variability in chatbot performance, with ChatGPT models outperforming Gemini. However, reliance on AI-generated responses for clinical decision-making remains questionable. Future studies should incorporate more complex clinical scenarios and broader analytical approaches to enhance the assessment of AI chatbots in endodontics.</p>","PeriodicalId":55581,"journal":{"name":"Australian Endodontic Journal","volume":" ","pages":""},"PeriodicalIF":1.5000,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Assessment of the Accuracy of Modern Artificial Intelligence Chatbots in Responding to Endodontic Queries.\",\"authors\":\"Melis Çakar, Ayşe Tuğba Eminsoy Avcı, Salih Düzgün, Tuğrul Aslan, Kübra Nur Hekimoğlu\",\"doi\":\"10.1111/aej.70012\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><p>This study aims to compare the accuracy of modern AI chatbots, including Gemini 1.5 Flash, Gemini 1.5 Pro, ChatGPT-3.5 and ChatGPT-4, in responding to endodontic questions and supporting clinicians. Forty yes/no questions covering 12 endodontic topics were formulated by three experts. Each question was presented to the AI models on the same day, with a new chat session initiated for each. The agreement between chatbot responses and expert consensus was assessed using Cohen's kappa test (p < 0.05). ChatGPT-3.5 demonstrated the highest accuracy (80%), followed by ChatGPT-4 (77.5%), Gemini 1.5 Pro (72.5%) and Gemini 1.5 Flash (60%). The agreement levels ranged from weak (ChatGPT models) to minimal (Gemini Flash). The findings indicate variability in chatbot performance, with ChatGPT models outperforming Gemini. However, reliance on AI-generated responses for clinical decision-making remains questionable. 
Future studies should incorporate more complex clinical scenarios and broader analytical approaches to enhance the assessment of AI chatbots in endodontics.</p>\",\"PeriodicalId\":55581,\"journal\":{\"name\":\"Australian Endodontic Journal\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":1.5000,\"publicationDate\":\"2025-08-20\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Australian Endodontic Journal\",\"FirstCategoryId\":\"3\",\"ListUrlMain\":\"https://doi.org/10.1111/aej.70012\",\"RegionNum\":4,\"RegionCategory\":\"医学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q3\",\"JCRName\":\"DENTISTRY, ORAL SURGERY & MEDICINE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Australian Endodontic Journal","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/aej.70012","RegionNum":4,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Assessment of the Accuracy of Modern Artificial Intelligence Chatbots in Responding to Endodontic Queries.
This study compared the accuracy of modern AI chatbots, including Gemini 1.5 Flash, Gemini 1.5 Pro, ChatGPT-3.5 and ChatGPT-4, in responding to endodontic questions, with the aim of supporting clinicians. Forty yes/no questions covering 12 endodontic topics were formulated by three experts. Each question was presented to each AI model on the same day, with a new chat session initiated for each query. Agreement between the chatbot responses and the expert consensus was assessed using Cohen's kappa test (p < 0.05). ChatGPT-3.5 demonstrated the highest accuracy (80%), followed by ChatGPT-4 (77.5%), Gemini 1.5 Pro (72.5%) and Gemini 1.5 Flash (60%). Agreement levels ranged from weak (the ChatGPT models) to minimal (Gemini 1.5 Flash). These findings indicate variability in chatbot performance, with the ChatGPT models outperforming Gemini; however, reliance on AI-generated responses for clinical decision-making remains questionable. Future studies should incorporate more complex clinical scenarios and broader analytical approaches to strengthen the assessment of AI chatbots in endodontics.
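For readers unfamiliar with the statistic, the sketch below shows how per-model accuracy and Cohen's kappa would be computed for this kind of yes/no comparison. The helper function and the answer lists are illustrative assumptions, not the study's actual code or data.

```python
# A minimal sketch of the study's scoring approach, assuming a simple
# yes/no answer format. The answer lists below are hypothetical
# placeholders, not the study's data.

def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equal-length lists of categorical labels."""
    n = len(rater_a)
    labels = set(rater_a) | set(rater_b)
    # Observed agreement: fraction of questions where the two raters match.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, estimated from each rater's label frequencies.
    p_e = sum((rater_a.count(l) / n) * (rater_b.count(l) / n) for l in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical 10-question excerpt (the study used 40 questions).
expert  = ["yes", "no", "yes", "yes", "no", "no", "yes", "no", "yes", "no"]
chatbot = ["yes", "no", "no",  "yes", "no", "yes", "yes", "no", "no",  "no"]

accuracy = sum(e == c for e, c in zip(expert, chatbot)) / len(expert)
print(f"accuracy = {accuracy:.2f}, kappa = {cohen_kappa(expert, chatbot):.2f}")
# -> accuracy = 0.70, kappa = 0.40
```

Note that kappa discounts agreement expected by chance, so it is lower than raw accuracy. On McHugh's widely used interpretation scale, values of 0.21-0.39 indicate minimal agreement and 0.40-0.59 weak agreement, which corresponds to the agreement bands the abstract reports.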
Journal introduction:
The Australian Endodontic Journal provides a forum for communication in the different fields that encompass endodontics for all specialists and dentists with an interest in the morphology, physiology, and pathology of the human tooth, in particular the dental pulp, root and peri-radicular tissues.
The Journal features regular clinical updates, research reports and case reports from authors worldwide, and also publishes meeting abstracts, society news and historical endodontic glimpses.
The Australian Endodontic Journal is a publication devoted solely to endodontics, serving dentists in both general and specialist practice, and aims to promote communication across these fields for dentists with a special interest in the discipline.