{"title":"Can Artificial Intelligence Language Models Effectively Address Dental Trauma Questions?","authors":"Hasibe Elif Kuru, Aslı Aşık, Doğukan Mert Demir","doi":"10.1111/edt.13063","DOIUrl":null,"url":null,"abstract":"<p><strong>Background/aim: </strong>Artificial intelligence (AI) chatbots, also known as large language models (LLMs), have become increasingly common educational tools in healthcare. Although the use of LLMs for emergency dental trauma is gaining popularity, it is crucial to assess their reliability. This study aimed to compare the reliabilities of different LLMs in response to multiple questions related to dental trauma.</p><p><strong>Materials and methods: </strong>In a cross-sectional observational study conducted in October 2024, 30 questions (10 multiple-choice, 10 fill-in-the-blank, and 10 dichotomous) based on the International Association of Dental Traumatology guidelines were posed to five LLMs: ChatGPT 4, ChatGPT 3.5, Copilot Free version (Copilot F), Copilot Pro (Copilot P), and Google Gemini over nine consecutive days. Responses of each model (1350 in total) were recorded in binary format and analyzed using Pearson's chi-square and Fisher's exact tests to assess correctness and consistency (p < 0.05).</p><p><strong>Results: </strong>The answers provided by the LLMs to repeated questions on consecutive days showed a high degree of repeatability. Although there was no statistically significant difference in the success rate of providing correct answers among the LLMs (p > 0.05), the rankings based on the rate of successful answers were as follows: ChatGPT 3.5 (76.7%) > Copilot P (73.3%) > Copilot F (70%) > ChatGPT 4 (63.3%) > Gemini (46.7%). ChatGPT 3.5, ChatGPT 4, and Gemini showed a significantly higher correct response rate for multiple choice and fill in the blank questions compared to their performance on dichotomous (true/false) questions (p < 0.05). Conversely, The Copilot models did not exhibit significant differences across question types. Notably, the explanations provided by Copilot and Gemini were often inaccurate, and Copilot's cited references had low evidential value.</p><p><strong>Conclusions: </strong>While LLMs show potential as adjunct educational tools in dental traumatology, their variable accuracy and inclusion of unreliable references call for careful integration strategies.</p>","PeriodicalId":55180,"journal":{"name":"Dental Traumatology","volume":" ","pages":""},"PeriodicalIF":2.3000,"publicationDate":"2025-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Dental Traumatology","FirstCategoryId":"3","ListUrlMain":"https://doi.org/10.1111/edt.13063","RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"DENTISTRY, ORAL SURGERY & MEDICINE","Score":null,"Total":0}
Citations: 0
Abstract
Background/aim: Artificial intelligence (AI) chatbots, also known as large language models (LLMs), have become increasingly common educational tools in healthcare. Although the use of LLMs for emergency dental trauma guidance is gaining popularity, it is crucial to assess their reliability. This study aimed to compare the reliability of different LLMs in answering questions related to dental trauma.
Materials and methods: In a cross-sectional observational study conducted in October 2024, 30 questions (10 multiple-choice, 10 fill-in-the-blank, and 10 dichotomous) based on the International Association of Dental Traumatology guidelines were posed to five LLMs: ChatGPT 4, ChatGPT 3.5, Copilot Free version (Copilot F), Copilot Pro (Copilot P), and Google Gemini, over nine consecutive days. All responses (5 models × 30 questions × 9 days = 1,350 in total) were recorded in binary format (correct/incorrect) and analyzed using Pearson's chi-square and Fisher's exact tests to assess correctness and consistency (p < 0.05).
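The abstract does not include the authors' analysis code; the following is a minimal Python sketch of how binary correctness counts for two models can be compared with Pearson's chi-square test, falling back to Fisher's exact test when expected cell counts are small. The counts used here are derived from the reported success rates over 30 questions (76.7% for ChatGPT 3.5, 46.7% for Gemini); the small-count threshold and the fallback rule are assumptions for illustration, not details stated in the paper.

import numpy as np
from scipy.stats import chi2_contingency, fisher_exact

# 2x2 contingency table of correct vs. incorrect answers for two models.
# Rows: ChatGPT 3.5 (23/30 correct = 76.7%), Gemini (14/30 correct = 46.7%).
table = np.array([[23, 7],
                  [14, 16]])

# Pearson's chi-square test of independence (Yates-corrected for 2x2).
chi2, p, dof, expected = chi2_contingency(table)

# Assumed convention: if any expected cell count is below 5,
# use Fisher's exact test instead.
if (expected < 5).any():
    _, p = fisher_exact(table)

print(f"p = {p:.3f} -> {'significant' if p < 0.05 else 'not significant'} at alpha = 0.05")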
Results: The answers provided by the LLMs to repeated questions on consecutive days showed a high degree of repeatability. Although there was no statistically significant difference in correct-answer rates among the LLMs (p > 0.05), the models ranked as follows: ChatGPT 3.5 (76.7%) > Copilot P (73.3%) > Copilot F (70%) > ChatGPT 4 (63.3%) > Gemini (46.7%). ChatGPT 3.5, ChatGPT 4, and Gemini showed significantly higher correct response rates on multiple-choice and fill-in-the-blank questions than on dichotomous (true/false) questions (p < 0.05). Conversely, the Copilot models did not exhibit significant differences across question types. Notably, the explanations provided by Copilot and Gemini were often inaccurate, and Copilot's cited references had low evidential value.
Conclusions: While LLMs show potential as adjunct educational tools in dental traumatology, their variable accuracy and inclusion of unreliable references call for careful integration strategies.
Journal introduction:
Dental Traumatology is an international journal that aims to convey scientific and clinical progress in all areas related to adult and pediatric dental traumatology. This includes the following topics:
- Epidemiology, Social Aspects, Education, Diagnostics
- Esthetics / Prosthetics / Restorative
- Evidence Based Traumatology & Study Design
- Oral & Maxillofacial Surgery/Transplant/Implant
- Pediatrics and Orthodontics
- Prevention and Sports Dentistry
- Endodontics and Periodontal Aspects
The journal"s aim is to promote communication among clinicians, educators, researchers, and others interested in the field of dental traumatology.