Performance of 4 Artificial Intelligence Chatbots in Answering Endodontic Questions.

IF 3.5 | Zone 2 (Medicine) | Q1 DENTISTRY, ORAL SURGERY & MEDICINE
Saleem Abdulrab, Hisham Abada, Mohammed Mashyakhy, Nawras Mostafa, Hatem Alhadainy, Esam Halboub
{"title":"Performance of 4 Artificial Intelligence Chatbots in Answering Endodontic Questions.","authors":"Saleem Abdulrab, Hisham Abada, Mohammed Mashyakhy, Nawras Mostafa, Hatem Alhadainy, Esam Halboub","doi":"10.1016/j.joen.2025.01.002","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Artificial intelligence models have shown potential as educational tools in healthcare, such as answering exam questions. This study aimed to assess the performance of 4 prominent chatbots: ChatGPT-4o, MedGebra GPT-4o, Meta LIama 3, and Gemini Advanced in answering multiple-choice questions (MCQs) in endodontics.</p><p><strong>Methods: </strong>The study utilized 100 MCQs, each with 4 potential answers. These MCQs were obtained from 2 well-known endodontic textbooks. The performance of the above chatbots regarding choosing the correct answers was assessed twice with a 1-week interval.</p><p><strong>Results: </strong>The stability of the performance in the 2 rounds was highest for ChatGPT-4o, followed by Gemini Advanced and Meta Llama 3. MedGebra GPT-4o provided the highest percentage of true answers in the first round (93%) followed by ChatGPT-4o in the second round (90%). Meta Llama 3 provided the lowest percentages in the first (73%) and second rounds (75%). Although the performance of MedGebra GPT-4o was the best in the first round, it was less stable upon the second round (McNemar P > .05; Kappa = 0.725, P < .001).</p><p><strong>Conclusions: </strong>ChatGPT-4o and MedGebra GPT-4o answered a high fraction of endodontic MCQs, while Meta LIama 3 and Gemini Advanced showed lower performance. Further training and development are required to improve their accuracy and reliability in endodontics.</p>","PeriodicalId":15703,"journal":{"name":"Journal of endodontics","volume":" ","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of endodontics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.joen.2025.01.002","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: Artificial intelligence models have shown potential as educational tools in healthcare, such as answering exam questions. This study aimed to assess the performance of 4 prominent chatbots (ChatGPT-4o, MedGebra GPT-4o, Meta Llama 3, and Gemini Advanced) in answering multiple-choice questions (MCQs) in endodontics.

Methods: The study used 100 MCQs, each with 4 answer options, drawn from 2 well-known endodontic textbooks. Each chatbot's ability to choose the correct answers was assessed twice, with a 1-week interval between rounds.
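For illustration only, the following is a minimal Python sketch of how one such assessment round might be automated against a chat-completion API. It is not the study's actual procedure (the chatbots may have been queried through their web interfaces); the model identifier, prompt wording, and the `mcqs` placeholder structure are all assumptions.

```python
# Hypothetical sketch of one assessment round: pose each MCQ to a chat model
# and record whether the chosen option matches the textbook answer key.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder MCQs; the study's 100 textbook questions are not reproduced here.
mcqs = [
    {
        "question": "Which instrument ...?",
        "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
        "key": "B",
    },
]

correct = []
for item in mcqs:
    options = "\n".join(f"{k}. {v}" for k, v in item["options"].items())
    prompt = (
        f"{item['question']}\n{options}\n"
        "Answer with the single letter of the best option."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model identifier
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content.strip()[:1].upper()
    correct.append(answer == item["key"])

print(f"Correct: {sum(correct)}/{len(correct)}")
```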

Results: Performance stability across the 2 rounds was highest for ChatGPT-4o, followed by Gemini Advanced and Meta Llama 3. MedGebra GPT-4o provided the highest percentage of correct answers in the first round (93%), followed by ChatGPT-4o in the second round (90%). Meta Llama 3 provided the lowest percentages in the first (73%) and second (75%) rounds. Although MedGebra GPT-4o performed best in the first round, it was less stable in the second round (McNemar P > .05; Kappa = 0.725, P < .001).
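As a rough illustration of the stability analysis named above (McNemar's test and Cohen's kappa across the two rounds), a minimal Python sketch follows. The per-question correctness vectors are random placeholders, not the study's data.

```python
# Sketch: quantify round-to-round stability of a chatbot's MCQ answers with
# McNemar's test (change in correctness) and Cohen's kappa (agreement).
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-question correctness (1 = correct, 0 = wrong) for 100 MCQs.
rng = np.random.default_rng(0)
round1 = rng.integers(0, 2, size=100)  # placeholder, not real study data
round2 = rng.integers(0, 2, size=100)  # placeholder, not real study data

# 2x2 contingency table of agreement/disagreement between the two rounds.
table = np.array([
    [np.sum((round1 == 1) & (round2 == 1)), np.sum((round1 == 1) & (round2 == 0))],
    [np.sum((round1 == 0) & (round2 == 1)), np.sum((round1 == 0) & (round2 == 0))],
])

# McNemar's test: did the proportion of correct answers change between rounds?
print("McNemar P =", mcnemar(table, exact=True).pvalue)

# Cohen's kappa: question-level agreement between the two rounds.
print("Kappa =", cohen_kappa_score(round1, round2))
```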

Conclusions: ChatGPT-4o and MedGebra GPT-4o correctly answered a high fraction of the endodontic MCQs, while Meta Llama 3 and Gemini Advanced showed lower performance. Further training and development are required to improve the accuracy and reliability of these models in endodontics.

Source journal: Journal of Endodontics (Medicine: Dentistry & Oral Surgery)
CiteScore: 8.80
Self-citation rate: 9.50%
Articles published: 224
Review time: 42 days
Journal description: The Journal of Endodontics, the official journal of the American Association of Endodontists, publishes scientific articles, case reports and comparison studies evaluating materials and methods of pulp conservation and endodontic treatment. Endodontists and general dentists can learn about new concepts in root canal treatment and the latest advances in techniques and instrumentation in the one journal that helps them keep pace with rapid changes in this field.