Multiple large language models versus experienced physicians in diagnosing challenging cases with gastrointestinal symptoms
Xintian Yang, Tongxin Li, Han Wang, Rongchun Zhang, Zhi Ni, Na Liu, Huihong Zhai, Jianghai Zhao, Fandong Meng, Zhongyin Zhou, Shanhong Tang, Limei Wang, Xiangping Wang, Hui Luo, Gui Ren, Linhui Zhang, Xiaoyu Kang, Jun Wang, Ning Bo, Xiaoning Yang, Weijie Xue, Xiaoyin Zhang, Ning Chen, Rui Guo, Baiwen Li, Yajun Li, Yaling Liu, Tiantian Zhang, Shuhui Liang, Yong Lv, Yongzhan Nie, Daiming Fan, Lina Zhao, Yanglin Pan
npj Digital Medicine, published 2025-02-05
DOI: 10.1038/s41746-025-01486-5 (https://doi.org/10.1038/s41746-025-01486-5)
Abstract
Faced with challenging cases, doctors are increasingly seeking diagnostic advice from large language models (LLMs). This study aims to compare the ability of LLMs and human physicians to diagnose challenging cases. An offline dataset of 67 challenging cases with primary gastrointestinal symptoms was used to solicit possible diagnoses from seven LLMs and 22 gastroenterologists. The diagnoses by Claude 3.5 Sonnet covered the highest proportion (95% confidence interval [CI]) of instructive diagnoses (76.1% [70.6%–80.9%]), significantly surpassing all the gastroenterologists (p < 0.05 for all). Claude 3.5 Sonnet achieved a significantly higher coverage rate (95% CI) than that of the gastroenterologists using search engines or other traditional resources (76.1% [70.6%–80.9%] vs. 45.5% [40.7%–50.4%], p < 0.001). The study highlights that advanced LLMs may assist gastroenterologists with instructive, time-saving, and cost-effective diagnostic scopes in challenging cases.
About the journal:
npj Digital Medicine is an online open-access journal that focuses on publishing peer-reviewed research in the field of digital medicine. The journal covers various aspects of digital medicine, including the application and implementation of digital and mobile technologies in clinical settings, virtual healthcare, and the use of artificial intelligence and informatics.
The primary goal of the journal is to support innovation and the advancement of healthcare through the integration of new digital and mobile technologies. When determining if a manuscript is suitable for publication, the journal considers four important criteria: novelty, clinical relevance, scientific rigor, and digital innovation.