A retrieval-augmented chatbot based on GPT-4 provides appropriate differential diagnosis in gastrointestinal radiology: a proof of concept study.

IF 3.7 Q1 RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING

European Radiology Experimental Pub Date : 2024-05-17 DOI:10.1186/s41747-024-00457-x

Stephan Rau, Alexander Rau, Johanna Nattenmüller, Anna Fink, Fabian Bamberg, Marco Reisert, Maximilian F Russe

European Radiology Experimental, volume 8, issue 1, article 60. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11098977/pdf/


Abstract

Background: We investigated the potential of an imaging-aware GPT-4-based chatbot in providing diagnoses based on imaging descriptions of abdominal pathologies.

Methods: Using zero-shot learning via the LlamaIndex framework, GPT-4 was enhanced with the 96 documents from the Radiographics Top 10 Reading List on gastrointestinal imaging, creating a gastrointestinal imaging-aware chatbot (GIA-CB). To assess its diagnostic capability, 50 cases covering a variety of abdominal pathologies were created, comprising radiological findings in fluoroscopy, MRI, and CT. We compared the GIA-CB with the generic GPT-4 chatbot (g-CB) in providing the primary and two additional differential diagnoses, using interpretations from senior-level radiologists as ground truth. The trustworthiness of the GIA-CB was evaluated by examining the source documents provided by the knowledge-retrieval mechanism. The Mann-Whitney U test was used for statistical comparison.
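The retrieval step described in the Methods can be illustrated with a minimal, dependency-free sketch. It is not the authors' pipeline and does not use the LlamaIndex API: a bag-of-words cosine similarity stands in for embedding-based retrieval, and the corpus entries, document names, and the helpers `retrieve` and `build_prompt` are hypothetical, invented only for illustration. It shows how the best-matching documents can be attached to a case description, and returned alongside the prompt so the sources remain inspectable.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, corpus, k=2):
    """Return the names of up to k documents most similar to the query."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(text))), name)
              for name, text in corpus.items()]
    scored.sort(reverse=True)
    # Documents with no overlap at all are dropped rather than padded in.
    return [name for score, name in scored[:k] if score > 0]


def build_prompt(findings, corpus, k=2):
    """Build a context-augmented prompt and return it with its sources."""
    sources = retrieve(findings, corpus, k)
    context = "\n".join(f"[{name}] {corpus[name]}" for name in sources)
    prompt = (
        "Context:\n" + context + "\n\n"
        "Imaging findings: " + findings + "\n"
        "Give the primary diagnosis and 2 additional differential diagnoses."
    )
    return prompt, sources


# Hypothetical toy corpus; NOT taken from the Radiographics reading list.
corpus = {
    "doc_a": "target sign on ultrasound or CT suggests intussusception of the bowel",
    "doc_b": "bird beak narrowing of the distal esophagus on barium swallow suggests achalasia",
    "doc_c": "pneumatosis and portal venous gas may indicate bowel ischemia",
}
findings = "Barium swallow shows dilated esophagus with bird beak narrowing"
prompt, sources = build_prompt(findings, corpus)
print(sources)
print(prompt)
```

In a real system the prompt would be sent to the language model, and the returned `sources` list is what lets a reader check which documents informed the answer.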

Results: The GIA-CB identified the most appropriate differential diagnosis in 39/50 cases (78%), significantly more often than the g-CB, which did so in 27/50 cases (54%) (p = 0.006). Notably, the GIA-CB included the primary differential among its top 3 differential diagnoses in 45/50 cases (90%) versus 37/50 cases (74%) for the g-CB (p = 0.022), and always with appropriate explanations. The median response time was 29.8 s for GIA-CB and 15.7 s for g-CB, and the mean cost per case was $0.15 and $0.02, respectively.
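For readers unfamiliar with the Mann-Whitney U test, the following is a minimal sketch of the statistic using the normal approximation with tie correction. It rests on assumptions not stated in the abstract: each case is coded as 1 (most appropriate diagnosis identified) or 0, the two chatbots are treated as independent samples, and no continuity correction is applied. Because the abstract does not specify the outcome coding or whether the test was one- or two-sided, this sketch should not be expected to reproduce the reported p-values. The function name `mann_whitney_u` is illustrative.

```python
import math


def mann_whitney_u(x, y):
    """Return (U for sample x, z score, two-sided p-value).

    Uses the normal approximation with tie correction and no
    continuity correction.
    """
    n1, n2 = len(x), len(y)
    # U counts the pairs in which x beats y; each tie counts one half.
    u = sum((a > b) + 0.5 * (a == b) for a in x for b in y)
    n = n1 + n2
    # Tie correction: sum of t^3 - t over every group of equal values.
    counts = {}
    for value in list(x) + list(y):
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (u - n1 * n2 / 2.0) / math.sqrt(variance)
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u, z, p_two_sided


# Binary coding of the headline result (an assumption, see above).
gia_cb = [1] * 39 + [0] * 11   # 39/50 correct
g_cb = [1] * 27 + [0] * 23     # 27/50 correct
u, z, p = mann_whitney_u(gia_cb, g_cb)
print(f"U = {u:.1f}, z = {z:.3f}, two-sided p = {p:.4f}")
```

With binary outcomes nearly all pairs are ties, which is why the tie correction matters here; a richer outcome coding (for example, the rank at which the correct diagnosis appeared) would give a different result.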

Conclusions: The GIA-CB not only provided accurate diagnoses for gastrointestinal pathologies but also gave direct access to source documents, offering insight into the decision-making process, a step towards trustworthy and explainable AI. Integrating context-specific data into AI models can support evidence-based clinical decision-making.

Relevance statement: A context-aware GPT-4 chatbot demonstrated high accuracy in providing differential diagnoses based on imaging descriptions, surpassing the generic GPT-4. It provided a formulated rationale and source excerpts supporting the diagnoses, thus enhancing trustworthy decision support.

Key points:
• Knowledge retrieval enhances differential diagnoses in a gastrointestinal imaging-aware chatbot (GIA-CB).
• GIA-CB outperformed the generic counterpart, providing formulated rationale and source excerpts.
• GIA-CB has the potential to pave the way for AI-assisted decision support systems.


