Performance of 4 Artificial Intelligence Chatbots in Answering Endodontic Questions.

IF 3.5 | Zone 2 (Medicine) | Q1 DENTISTRY, ORAL SURGERY & MEDICINE
Saleem Abdulrab, Hisham Abada, Mohammed Mashyakhy, Nawras Mostafa, Hatem Alhadainy, Esam Halboub
{"title":"Performance of 4 Artificial Intelligence Chatbots in Answering Endodontic Questions.","authors":"Saleem Abdulrab, Hisham Abada, Mohammed Mashyakhy, Nawras Mostafa, Hatem Alhadainy, Esam Halboub","doi":"10.1016/j.joen.2025.01.002","DOIUrl":null,"url":null,"abstract":"<p><strong>Introduction: </strong>Artificial intelligence models have shown potential as educational tools in healthcare, such as answering exam questions. This study aimed to assess the performance of 4 prominent chatbots: ChatGPT-4o, MedGebra GPT-4o, Meta LIama 3, and Gemini Advanced in answering multiple-choice questions (MCQs) in endodontics.</p><p><strong>Methods: </strong>The study utilized 100 MCQs, each with 4 potential answers. These MCQs were obtained from 2 well-known endodontic textbooks. The performance of the above chatbots regarding choosing the correct answers was assessed twice with a 1-week interval.</p><p><strong>Results: </strong>The stability of the performance in the 2 rounds was highest for ChatGPT-4o, followed by Gemini Advanced and Meta Llama 3. MedGebra GPT-4o provided the highest percentage of true answers in the first round (93%) followed by ChatGPT-4o in the second round (90%). Meta Llama 3 provided the lowest percentages in the first (73%) and second rounds (75%). Although the performance of MedGebra GPT-4o was the best in the first round, it was less stable upon the second round (McNemar P > .05; Kappa = 0.725, P < .001).</p><p><strong>Conclusions: </strong>ChatGPT-4o and MedGebra GPT-4o answered a high fraction of endodontic MCQs, while Meta LIama 3 and Gemini Advanced showed lower performance. Further training and development are required to improve their accuracy and reliability in endodontics.</p>","PeriodicalId":15703,"journal":{"name":"Journal of endodontics","volume":" ","pages":""},"PeriodicalIF":3.5000,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of endodontics","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1016/j.joen.2025.01.002","RegionNum":2,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0

Abstract

Introduction: Artificial intelligence models have shown potential as educational tools in healthcare, such as answering exam questions. This study aimed to assess the performance of 4 prominent chatbots (ChatGPT-4o, MedGebra GPT-4o, Meta Llama 3, and Gemini Advanced) in answering multiple-choice questions (MCQs) in endodontics.

Methods: The study used 100 MCQs, each with 4 answer options, drawn from 2 well-known endodontic textbooks. Each chatbot's ability to choose the correct answers was assessed twice, with a 1-week interval between rounds.
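For illustration only, the following is a minimal Python sketch of how one such assessment round might be automated against a chat-completion API. It is not the study's actual procedure (the chatbots may have been queried through their web interfaces); the model identifier, prompt wording, and the `mcqs` placeholder structure are all assumptions.

```python
# Hypothetical sketch of one assessment round: pose each MCQ to a chat model
# and record whether the chosen option matches the textbook answer key.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Placeholder MCQs; the study's 100 textbook questions are not reproduced here.
mcqs = [
    {
        "question": "Which instrument ...?",
        "options": {"A": "...", "B": "...", "C": "...", "D": "..."},
        "key": "B",
    },
]

correct = []
for item in mcqs:
    options = "\n".join(f"{k}. {v}" for k, v in item["options"].items())
    prompt = (
        f"{item['question']}\n{options}\n"
        "Answer with the single letter of the best option."
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # assumed model identifier
        messages=[{"role": "user", "content": prompt}],
    )
    answer = resp.choices[0].message.content.strip()[:1].upper()
    correct.append(answer == item["key"])

print(f"Correct: {sum(correct)}/{len(correct)}")
```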

Results: Performance stability across the 2 rounds was highest for ChatGPT-4o, followed by Gemini Advanced and Meta Llama 3. MedGebra GPT-4o provided the highest percentage of correct answers in the first round (93%), followed by ChatGPT-4o in the second round (90%). Meta Llama 3 provided the lowest percentages in the first (73%) and second (75%) rounds. Although MedGebra GPT-4o performed best in the first round, it was less stable in the second round (McNemar P > .05; Kappa = 0.725, P < .001).
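As a rough illustration of the stability analysis named above (McNemar's test and Cohen's kappa across the two rounds), a minimal Python sketch follows. The per-question correctness vectors are random placeholders, not the study's data.

```python
# Sketch: quantify round-to-round stability of a chatbot's MCQ answers with
# McNemar's test (change in correctness) and Cohen's kappa (agreement).
import numpy as np
from sklearn.metrics import cohen_kappa_score
from statsmodels.stats.contingency_tables import mcnemar

# Hypothetical per-question correctness (1 = correct, 0 = wrong) for 100 MCQs.
rng = np.random.default_rng(0)
round1 = rng.integers(0, 2, size=100)  # placeholder, not real study data
round2 = rng.integers(0, 2, size=100)  # placeholder, not real study data

# 2x2 contingency table of agreement/disagreement between the two rounds.
table = np.array([
    [np.sum((round1 == 1) & (round2 == 1)), np.sum((round1 == 1) & (round2 == 0))],
    [np.sum((round1 == 0) & (round2 == 1)), np.sum((round1 == 0) & (round2 == 0))],
])

# McNemar's test: did the proportion of correct answers change between rounds?
print("McNemar P =", mcnemar(table, exact=True).pvalue)

# Cohen's kappa: question-level agreement between the two rounds.
print("Kappa =", cohen_kappa_score(round1, round2))
```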

Conclusions: ChatGPT-4o and MedGebra GPT-4o correctly answered a high fraction of the endodontic MCQs, while Meta Llama 3 and Gemini Advanced showed lower performance. Further training and development are required to improve the accuracy and reliability of these models in endodontics.

Source journal: Journal of Endodontics (Medicine: Dentistry & Oral Surgery)
CiteScore: 8.80
Self-citation rate: 9.50%
Articles published: 224
Review time: 42 days
Journal description: The Journal of Endodontics, the official journal of the American Association of Endodontists, publishes scientific articles, case reports and comparison studies evaluating materials and methods of pulp conservation and endodontic treatment. Endodontists and general dentists can learn about new concepts in root canal treatment and the latest advances in techniques and instrumentation in the one journal that helps them keep pace with rapid changes in this field.