A content-aware chatbot based on GPT 4 provides trustworthy recommendations for Cone-Beam CT guidelines in dental imaging.

IF 2.9, Zone 2 (Medicine), Q1 Dentistry, Oral Surgery & Medicine

Dentomaxillofacial Radiology. Pub Date: 2024-02-08. DOI: 10.1093/dmfr/twad015

Maximilian Frederik Russe, Alexander Rau, Michael Andreas Ermer, René Rothweiler, Sina Wenger, Klara Klöble, Ralf K W Schulze, Fabian Bamberg, Rainer Schmelzeisen, Marco Reisert, Wiebke Semper-Hogg

Journal Article, pages 109-114. Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11003655/pdf/

Citations: 0

Abstract

Objectives: To develop a content-aware chatbot based on GPT-3.5-Turbo and GPT-4 with specialized knowledge on the German S2 Cone-Beam CT (CBCT) dental imaging guideline and to compare the performance against humans.

Methods: The LlamaIndex software library was used to integrate the guideline context into the chatbots. Forty questions based on the CBCT S2 guideline were posed to the content-aware chatbots; early career and senior practitioners with different levels of experience served as the reference. The chatbots' performance was compared in terms of recommendation accuracy and explanation quality. A chi-square test and a one-tailed Wilcoxon signed-rank test were used to evaluate accuracy and explanation quality, respectively.

Results: The GPT-4-based chatbot provided 100% correct recommendations and superior explanation quality compared to the one based on GPT-3.5-Turbo (87.5% vs 57.5% for GPT-3.5-Turbo; P = .003). Moreover, it outperformed early career practitioners in correct answers (P = .002 and P = .032) and earned higher trust than the chatbot using GPT-3.5-Turbo (P = .006).

Conclusions: A content-aware chatbot using GPT-4 reliably provided recommendations according to current consensus guidelines. The responses were deemed trustworthy and transparent, and therefore facilitate the integration of artificial intelligence into clinical decision-making.
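
The chi-square comparison named in the Methods can be illustrated with a small worked example. The sketch below is not the authors' analysis: it assumes that the two percentages reported in the Results (87.5% and 57.5%) are both proportions out of the 40 questions, which the abstract does not state explicitly, and it uses an uncorrected Pearson chi-square test on the resulting 2x2 table. The function name `chi_square_2x2` and the variable names are hypothetical.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper-tail probability is erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


N_QUESTIONS = 40  # number of questions stated in the Methods

# Assumption: both percentages are proportions of the 40 questions.
hits_a = round(0.875 * N_QUESTIONS)  # 35 of 40
hits_b = round(0.575 * N_QUESTIONS)  # 23 of 40

chi2, p = chi_square_2x2(
    hits_a, N_QUESTIONS - hits_a,
    hits_b, N_QUESTIONS - hits_b,
)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```

Under these assumptions the statistic is about 9.03 and the p-value rounds to .003, which is consistent with the value reported in the Results. This only shows that the reported figures are compatible with such a test; it does not establish how the authors computed them.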


Source journal

Dentomaxillofacial Radiology (category: Medicine, Nuclear Medicine)

CiteScore: 5.60

Self-citation rate: 9.10%

Review time: 4-8 weeks

Journal description: Dentomaxillofacial Radiology (DMFR) is the journal of the International Association of Dentomaxillofacial Radiology (IADMFR) and covers the closely related fields of oral radiology and head and neck imaging. Established in 1972, DMFR is a key resource that keeps dentists, radiologists, clinicians, and scientists with an interest in head and neck imaging abreast of important research and developments in oral and maxillofacial radiology. The DMFR editorial board features a panel of international experts, including Editor-in-Chief Professor Ralf Schulze, who provide expertise and guidance in shaping the content and direction of the journal.

Quick Facts:
- 2015 Impact Factor: 1.919
- Receipt to first decision: average of 3 weeks
- Acceptance to online publication: average of 3 weeks
- Open access option
- ISSN: 0250-832X
- eISSN: 1476-542X