Radiological Differential Diagnoses Based on Cardiovascular and Thoracic Imaging Patterns: Perspectives of Four Large Language Models

IF 0.9 · Q4 · RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING
Pradosh Kumar Sarangi, A. Irodi, Swaha Panda, Debasish Swapnesh Kumar Nayak, Himel Mondal
{"title":"基于心血管和胸部成像模式的放射学鉴别诊断:四种大型语言模型的观点","authors":"Pradosh Kumar Sarangi, A. Irodi, Swaha Panda, Debasish Swapnesh Kumar Nayak, Himel Mondal","doi":"10.1055/s-0043-1777289","DOIUrl":null,"url":null,"abstract":"Abstract Background  Differential diagnosis in radiology is a critical aspect of clinical decision-making. Radiologists in the early stages may find difficulties in listing the differential diagnosis from image patterns. In this context, the emergence of large language models (LLMs) has introduced new opportunities as these models have the capacity to access and contextualize extensive information from text-based input. Objective  The objective of this study was to explore the utility of four LLMs—ChatGPT3.5, Google Bard, Microsoft Bing, and Perplexity—in providing most important differential diagnoses of cardiovascular and thoracic imaging patterns. Methods  We selected 15 unique cardiovascular ( n  = 5) and thoracic ( n  = 10) imaging patterns. We asked each model to generate top 5 most important differential diagnoses for every pattern. Concurrently, a panel of two cardiothoracic radiologists independently identified top 5 differentials for each case and came to consensus when discrepancies occurred. We checked the concordance and acceptance of LLM-generated differentials with the consensus differential diagnosis. Categorical variables were compared by binomial, chi-squared, or Fisher's exact test. Results  A total of 15 cases with five differentials generated a total of 75 items to analyze. The highest level of concordance was observed for diagnoses provided by Perplexity (66.67%), followed by ChatGPT (65.33%) and Bing (62.67%). The lowest score was for Bard with 45.33% of concordance with expert consensus. The acceptance rate was highest for Perplexity (90.67%), followed by Bing (89.33%) and ChatGPT (85.33%). The lowest acceptance rate was for Bard (69.33%). Conclusion  Four LLMs—ChatGPT3.5, Google Bard, Microsoft Bing, and Perplexity—generated differential diagnoses had high level of acceptance but relatively lower concordance. There were significant differences in acceptance and concordance among the LLMs. Hence, it is important to carefully select the suitable model for usage in patient care or in medical education.","PeriodicalId":51597,"journal":{"name":"Indian Journal of Radiology and Imaging","volume":null,"pages":null},"PeriodicalIF":0.9000,"publicationDate":"2023-12-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Radiological Differential Diagnoses Based on Cardiovascular and Thoracic Imaging Patterns: Perspectives of Four Large Language Models\",\"authors\":\"Pradosh Kumar Sarangi, A. Irodi, Swaha Panda, Debasish Swapnesh Kumar Nayak, Himel Mondal\",\"doi\":\"10.1055/s-0043-1777289\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Abstract Background  Differential diagnosis in radiology is a critical aspect of clinical decision-making. Radiologists in the early stages may find difficulties in listing the differential diagnosis from image patterns. In this context, the emergence of large language models (LLMs) has introduced new opportunities as these models have the capacity to access and contextualize extensive information from text-based input. 
Objective  The objective of this study was to explore the utility of four LLMs—ChatGPT3.5, Google Bard, Microsoft Bing, and Perplexity—in providing most important differential diagnoses of cardiovascular and thoracic imaging patterns. Methods  We selected 15 unique cardiovascular ( n  = 5) and thoracic ( n  = 10) imaging patterns. We asked each model to generate top 5 most important differential diagnoses for every pattern. Concurrently, a panel of two cardiothoracic radiologists independently identified top 5 differentials for each case and came to consensus when discrepancies occurred. We checked the concordance and acceptance of LLM-generated differentials with the consensus differential diagnosis. Categorical variables were compared by binomial, chi-squared, or Fisher's exact test. Results  A total of 15 cases with five differentials generated a total of 75 items to analyze. The highest level of concordance was observed for diagnoses provided by Perplexity (66.67%), followed by ChatGPT (65.33%) and Bing (62.67%). The lowest score was for Bard with 45.33% of concordance with expert consensus. The acceptance rate was highest for Perplexity (90.67%), followed by Bing (89.33%) and ChatGPT (85.33%). The lowest acceptance rate was for Bard (69.33%). Conclusion  Four LLMs—ChatGPT3.5, Google Bard, Microsoft Bing, and Perplexity—generated differential diagnoses had high level of acceptance but relatively lower concordance. There were significant differences in acceptance and concordance among the LLMs. Hence, it is important to carefully select the suitable model for usage in patient care or in medical education.\",\"PeriodicalId\":51597,\"journal\":{\"name\":\"Indian Journal of Radiology and Imaging\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.9000,\"publicationDate\":\"2023-12-28\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Indian Journal of Radiology and Imaging\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1055/s-0043-1777289\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q4\",\"JCRName\":\"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Indian Journal of Radiology and Imaging","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1055/s-0043-1777289","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"RADIOLOGY, NUCLEAR MEDICINE & MEDICAL IMAGING","Score":null,"Total":0}
Citations: 0

Abstract

Background: Differential diagnosis in radiology is a critical aspect of clinical decision-making. Early-career radiologists may find it difficult to list differential diagnoses from imaging patterns. In this context, the emergence of large language models (LLMs) has introduced new opportunities, as these models can access and contextualize extensive information from text-based input.
Objective: To explore the utility of four LLMs (ChatGPT3.5, Google Bard, Microsoft Bing, and Perplexity) in providing the most important differential diagnoses for cardiovascular and thoracic imaging patterns.
Methods: We selected 15 unique cardiovascular (n = 5) and thoracic (n = 10) imaging patterns and asked each model to generate the top 5 most important differential diagnoses for every pattern. Concurrently, a panel of two cardiothoracic radiologists independently identified the top 5 differentials for each case and reached consensus when discrepancies occurred. We assessed the concordance and acceptance of the LLM-generated differentials against the consensus differential diagnoses. Categorical variables were compared with the binomial, chi-squared, or Fisher's exact test.
Results: Fifteen cases with five differentials each yielded 75 items for analysis. Concordance was highest for Perplexity (66.67%), followed by ChatGPT (65.33%) and Bing (62.67%); Bard had the lowest concordance with the expert consensus (45.33%). The acceptance rate was highest for Perplexity (90.67%), followed by Bing (89.33%) and ChatGPT (85.33%); Bard had the lowest acceptance rate (69.33%).
Conclusion: Differential diagnoses generated by the four LLMs (ChatGPT3.5, Google Bard, Microsoft Bing, and Perplexity) showed a high level of acceptance but relatively lower concordance, and there were significant differences in acceptance and concordance among the models. Hence, it is important to carefully select a suitable model for use in patient care or medical education.
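The percentages in the Results map back to counts out of 75 items (15 patterns with 5 differentials each); for example, 66.67% corresponds to 50 concordant items. The snippet below is a minimal illustrative sketch in Python, not the authors' analysis code: the per-model counts are inferred from the reported rates, and the scipy-based chi-squared comparison over a 4x2 contingency table (concordant vs. non-concordant items per model) is only an assumption about how such a comparison could be run.

# Minimal sketch (assumed, not the authors' script): reconstruct per-model counts
# from the reported percentages and compare the models with a chi-squared test.
from scipy.stats import chi2_contingency

TOTAL_ITEMS = 75  # 15 imaging patterns x 5 differentials each

# Counts inferred from the reported rates, e.g. 66.67% of 75 ≈ 50 items.
concordant = {"Perplexity": 50, "ChatGPT": 49, "Bing": 47, "Bard": 34}
accepted = {"Perplexity": 68, "Bing": 67, "ChatGPT": 64, "Bard": 52}

def rate(count, total=TOTAL_ITEMS):
    """Percentage of the 75 items meeting the criterion."""
    return 100.0 * count / total

for model, n in concordant.items():
    print(f"{model}: concordance {rate(n):.2f}%, acceptance {rate(accepted[model]):.2f}%")

# 4x2 contingency table: concordant vs. non-concordant items for each model.
table = [[n, TOTAL_ITEMS - n] for n in concordant.values()]
chi2, p, dof, _ = chi2_contingency(table)
print(f"chi-squared = {chi2:.2f}, dof = {dof}, p = {p:.4f}")

For pairwise 2x2 comparisons between models with small expected cell counts, Fisher's exact test (also named in the Methods) would typically be used instead of the chi-squared test.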
Source Journal
Indian Journal of Radiology and Imaging (Radiology, Nuclear Medicine & Medical Imaging)
CiteScore: 1.20 · Self-citation rate: 0.00% · Articles per year: 115 · Review time: 45 weeks