Can Artificial Intelligence Language Models Effectively Address Dental Trauma Questions?

IF 2.3 · CAS Region 3 (Medicine) · Q2 DENTISTRY, ORAL SURGERY & MEDICINE
Hasibe Elif Kuru, Aslı Aşık, Doğukan Mert Demir
Dental Traumatology · DOI: 10.1111/edt.13063 · Published 1 April 2025

Abstract

Background/aim: Artificial intelligence (AI) chatbots, also known as large language models (LLMs), have become increasingly common educational tools in healthcare. Although the use of LLMs for emergency dental trauma is gaining popularity, it is crucial to assess their reliability. This study aimed to compare the reliability of different LLMs in response to multiple questions related to dental trauma.

Materials and methods: In a cross-sectional observational study conducted in October 2024, 30 questions (10 multiple-choice, 10 fill-in-the-blank, and 10 dichotomous) based on the International Association of Dental Traumatology guidelines were posed to five LLMs: ChatGPT 4, ChatGPT 3.5, Copilot Free version (Copilot F), Copilot Pro (Copilot P), and Google Gemini over nine consecutive days. Responses of each model (1350 in total) were recorded in binary format and analyzed using Pearson's chi-square and Fisher's exact tests to assess correctness and consistency (p < 0.05).
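The recording scheme described above (5 models × 30 questions × 9 days = 1350 binary observations) can be sketched as follows. This is an illustrative sketch only: the data layout, the randomly generated scores, and the `repeatability` helper are assumptions for demonstration, not the authors' actual protocol or data.

```python
# Illustrative sketch of the scoring layout: responses[model][question] is a
# list of 9 daily binary scores (1 = correct, 0 = incorrect). Data here is
# randomly generated, purely hypothetical.
import random

MODELS = ["ChatGPT 4", "ChatGPT 3.5", "Copilot F", "Copilot P", "Gemini"]
N_QUESTIONS, N_DAYS = 30, 9

random.seed(0)
responses = {m: [[random.randint(0, 1) for _ in range(N_DAYS)]
                 for _ in range(N_QUESTIONS)] for m in MODELS}

# Total observations: 5 models * 30 questions * 9 days = 1350, matching the
# count reported in the methods.
total = sum(len(days) for qs in responses.values() for days in qs)
print(total)  # 1350

# One way to express day-to-day repeatability: the share of questions a model
# scored identically on all 9 days (hypothetical helper, not the paper's metric).
def repeatability(model):
    qs = responses[model]
    return sum(len(set(days)) == 1 for days in qs) / len(qs)

for m in MODELS:
    print(f"{m}: {repeatability(m):.0%} of questions scored consistently")
```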

Results: The answers provided by the LLMs to repeated questions on consecutive days showed a high degree of repeatability. Although there was no statistically significant difference in the success rate of providing correct answers among the LLMs (p > 0.05), the rankings based on the rate of successful answers were as follows: ChatGPT 3.5 (76.7%) > Copilot P (73.3%) > Copilot F (70%) > ChatGPT 4 (63.3%) > Gemini (46.7%). ChatGPT 3.5, ChatGPT 4, and Gemini showed a significantly higher correct response rate for multiple-choice and fill-in-the-blank questions compared to their performance on dichotomous (true/false) questions (p < 0.05). Conversely, the Copilot models did not exhibit significant differences across question types. Notably, the explanations provided by Copilot and Gemini were often inaccurate, and Copilot's cited references had low evidential value.
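The between-model comparison can be illustrated with a Pearson chi-square test of homogeneity on correct-answer counts reconstructed from the reported percentages (e.g. 76.7% of 30 questions = 23 correct). This is a sketch, not the authors' analysis script; it uses the manual chi-square formula rather than a statistics library so the arithmetic is visible.

```python
# Correct-answer counts reconstructed from the reported rates (30 questions
# per model): 76.7% -> 23, 73.3% -> 22, 70% -> 21, 63.3% -> 19, 46.7% -> 14.
correct = {"ChatGPT 3.5": 23, "Copilot P": 22, "Copilot F": 21,
           "ChatGPT 4": 19, "Gemini": 14}
n = 30  # questions per model

# Build the 5x2 observed table: [correct, incorrect] per model.
obs = [[c, n - c] for c in correct.values()]

# Expected counts under the null hypothesis that all models share one rate.
total_correct = sum(row[0] for row in obs)
exp_correct = total_correct / (n * len(obs)) * n
exp_wrong = n - exp_correct

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = sum((row[0] - exp_correct) ** 2 / exp_correct +
           (row[1] - exp_wrong) ** 2 / exp_wrong
           for row in obs)

# Critical value at alpha = 0.05 with df = (5-1)*(2-1) = 4 is 9.488; a
# statistic below it means p > 0.05, i.e. no significant difference.
print(f"chi2 = {chi2:.2f} (df = 4); significant: {chi2 > 9.488}")
```

With these reconstructed counts the statistic comes out below the 9.488 threshold, consistent with the paper's finding of no significant difference among models.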

Conclusions: While LLMs show potential as adjunct educational tools in dental traumatology, their variable accuracy and inclusion of unreliable references call for careful integration strategies.

Source journal

Dental Traumatology (Medicine – Dentistry & Oral Surgery)
CiteScore: 6.40 · Self-citation rate: 32.00% · Articles per year: 85 · Review time: 6-12 weeks

Journal description: Dental Traumatology is an international journal that aims to convey scientific and clinical progress in all areas related to adult and pediatric dental traumatology. This includes the following topics:

- Epidemiology, Social Aspects, Education, Diagnostics
- Esthetics / Prosthetics / Restorative
- Evidence-Based Traumatology & Study Design
- Oral & Maxillofacial Surgery / Transplant / Implant
- Pediatrics and Orthodontics
- Prevention and Sports Dentistry
- Endodontics and Periodontal Aspects

The journal's aim is to promote communication among clinicians, educators, researchers, and others interested in the field of dental traumatology.