Balu Bhasuran, Qiao Jin, Yuzhang Xie, Carl Yang, Karim Hanna, Jennifer Costa, Cindy Shavor, Wenshan Han, Zhiyong Lu, Zhe He
{"title":"Preliminary analysis of the impact of lab results on large language model generated differential diagnoses","authors":"Balu Bhasuran, Qiao Jin, Yuzhang Xie, Carl Yang, Karim Hanna, Jennifer Costa, Cindy Shavor, Wenshan Han, Zhiyong Lu, Zhe He","doi":"10.1038/s41746-025-01556-8","DOIUrl":"https://doi.org/10.1038/s41746-025-01556-8","url":null,"abstract":"<p>Differential diagnosis (DDx) is crucial for medicine as it helps healthcare providers systematically distinguish between conditions that share similar symptoms. This study evaluates the influence of lab test results on DDx accuracy generated by large language models (LLMs). Clinical vignettes from 50 randomly selected case reports from PMC-Patients were created, incorporating demographics, symptoms, and lab data. Five LLMs—GPT-4, GPT-3.5, Llama-2-70b, Claude-2, and Mixtral-8x7B—were tested to generate Top 10, Top 5, and Top 1 DDx with and without lab data. Results show that incorporating lab data enhances accuracy by up to 30% across models. GPT-4 achieved the highest performance, with Top 1 accuracy of 55% (0.41–0.69) and lenient accuracy reaching 79% (0.68–0.90). Statistically significant improvements (Holm-adjusted <i>p</i> values < 0.05) were observed, with GPT-4 and Mixtral excelling. Lab tests, including liver function, metabolic/toxicology panels, and serology, were generally interpreted correctly by LLMs for DDx.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"55 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143641064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Consternation as Congress proposal for autonomous prescribing AI coincides with the haphazard cuts at the FDA","authors":"Stephen Gilbert, Tinglong Dai, Rebecca Mathias","doi":"10.1038/s41746-025-01540-2","DOIUrl":"https://doi.org/10.1038/s41746-025-01540-2","url":null,"abstract":"We live in interesting regulatory times. In January, a bill was introduced to the US Congress proposing that AI “can qualify as a practitioner eligible to prescribe drugs” if overseen by the States and FDA. This is a bold and contentious move. Even proponents of AI’s swift integration into medicine must recognize the deep paradox: this proposal emerges even as the FDA’s world-leading infrastructure for AI oversight faces dismantling.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"20 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143641090","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alex J. Goodell, Simon N. Chu, Dara Rouholiman, Larry F. Chu
{"title":"Large language model agents can use tools to perform clinical calculations","authors":"Alex J. Goodell, Simon N. Chu, Dara Rouholiman, Larry F. Chu","doi":"10.1038/s41746-025-01475-8","DOIUrl":"https://doi.org/10.1038/s41746-025-01475-8","url":null,"abstract":"<p>Large language models (LLMs) can answer expert-level questions in medicine but are prone to hallucinations and arithmetic errors. Early evidence suggests LLMs cannot reliably perform clinical calculations, limiting their potential integration into clinical workflows. We evaluated ChatGPT’s performance across 48 medical calculation tasks, finding incorrect responses in one-third of trials (<i>n</i> = 212). We then assessed three forms of agentic augmentation: retrieval-augmented generation, a code interpreter tool, and a set of task-specific calculation tools (OpenMedCalc) across 10,000 trials. Models with access to task-specific tools showed the greatest improvement, with LLaMa and GPT-based models demonstrating a 5.5-fold (88% vs 16%) and 13-fold (64% vs 4.8%) reduction in incorrect responses, respectively, compared to the unimproved models. Our findings suggest that integration of machine-readable, task-specific tools may help overcome LLMs’ limitations in medical calculations.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"18 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143635678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Hangnyoung Choi, JaeSeong Hong, Hyun Goo Kang, Min-Hyeon Park, Sungji Ha, Junghan Lee, Sangchul Yoon, Daeseong Kim, Yu Rang Park, Keun-Ah Cheon
{"title":"Retinal fundus imaging as biomarker for ADHD using machine learning for screening and visual attention stratification","authors":"Hangnyoung Choi, JaeSeong Hong, Hyun Goo Kang, Min-Hyeon Park, Sungji Ha, Junghan Lee, Sangchul Yoon, Daeseong Kim, Yu Rang Park, Keun-Ah Cheon","doi":"10.1038/s41746-025-01547-9","DOIUrl":"https://doi.org/10.1038/s41746-025-01547-9","url":null,"abstract":"<p>Attention-deficit/hyperactivity disorder (ADHD), characterized by diagnostic complexity and symptom heterogeneity, is a prevalent neurodevelopmental disorder. Here, we explored the machine learning (ML) analysis of retinal fundus photographs as a noninvasive biomarker for ADHD screening and stratification of executive function (EF) deficits. From April to October 2022, 323 children and adolescents with ADHD were recruited from two tertiary South Korean hospitals, and age- and sex-matched individuals with typical development were retrospectively collected. We used the AutoMorph pipeline to extract retinal features and four types of ML models for ADHD screening and EF subdomain prediction, and we adopted the Shapley additive explanations method. ADHD screening models achieved 95.5%-96.9% AUROC. For EF stratification, the visual and auditory subdomains showed strong (AUROC > 85%) and poor performances, respectively. Our analysis of retinal fundus photographs demonstrated potential as a noninvasive biomarker for ADHD screening and EF deficit stratification in the visual attention domain.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"70 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143635675","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Effect of computer aided detection system on esophageal neoplasm diagnosis in varied levels of endoscopists","authors":"Bing Li, Yan-Yun Du, Wei-Min Tan, Dong-Li He, Zhi-Peng Qi, Hon-Ho Yu, Qiang Shi, Zhong Ren, Ming-Yan Cai, Bo Yan, Shi-Lun Cai, Yun-Shi Zhong","doi":"10.1038/s41746-025-01532-2","DOIUrl":"https://doi.org/10.1038/s41746-025-01532-2","url":null,"abstract":"<p>A computer-aided detection (CAD) system for early esophageal carcinoma identification during endoscopy with narrow-band imaging (NBI) was evaluated in a large-scale, prospective, tandem, randomized controlled trial to assess its effectiveness. The study was registered at the Chinese Clinical Trial Registry (ChiCTR2100050654, 2021/09/01). A total of 3400 patients were randomly assigned to either routine (routine-first) or CAD-assisted (CAD-first) NBI endoscopy, followed by the other procedure, with targeted biopsies taken at the end of the second examination. The primary outcome was the diagnosis of one or more neoplastic lesions of the esophagus during the first examination. The CAD-first group demonstrated a significantly higher neoplastic lesion detection rate (3.12%) compared to the routine-first group (1.59%) with a relative detection ratio of 1.96 (<i>P</i> = 0.0047). Subgroup analysis revealed a higher detection rate in junior endoscopists using CAD-first, while no significant difference was observed for senior endoscopists. The CAD system significantly improved esophageal neoplasm detection, particularly benefiting junior endoscopists.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"21 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Xi Chen, Huahui Yi, Mingke You, WeiZhi Liu, Li Wang, Hairui Li, Xue Zhang, Yingman Guo, Lei Fan, Gang Chen, Qicheng Lao, Weili Fu, Kang Li, Jian Li
{"title":"Enhancing diagnostic capability with multi-agents conversational large language models","authors":"Xi Chen, Huahui Yi, Mingke You, WeiZhi Liu, Li Wang, Hairui Li, Xue Zhang, Yingman Guo, Lei Fan, Gang Chen, Qicheng Lao, Weili Fu, Kang Li, Jian Li","doi":"10.1038/s41746-025-01550-0","DOIUrl":"https://doi.org/10.1038/s41746-025-01550-0","url":null,"abstract":"<p>Large Language Models (LLMs) show promise in healthcare tasks but face challenges in complex medical scenarios. We developed a Multi-Agent Conversation (MAC) framework for disease diagnosis, inspired by clinical Multi-Disciplinary Team discussions. Using 302 rare disease cases, we evaluated GPT-3.5, GPT-4, and MAC on medical knowledge and clinical reasoning. MAC outperformed single models in both primary and follow-up consultations, achieving higher accuracy in diagnoses and suggested tests. Optimal performance was achieved with four doctor agents and a supervisor agent, using GPT-4 as the base model. MAC demonstrated high consistency across repeated runs. Further comparative analysis showed MAC also outperformed other methods including Chain of Thoughts (CoT), Self-Refine, and Self-Consistency with higher performance and more output tokens. This framework significantly enhanced LLMs’ diagnostic capabilities, effectively bridging theoretical knowledge and practical clinical application. Our findings highlight the potential of multi-agent LLMs in healthcare and suggest further research into their clinical implementation.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"56 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607869","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Anna M. Marcinkiewicz, Wenhao Zhang, Aakash Shanbhag, Robert J. H. Miller, Mark Lemley, Giselle Ramirez, Mikolaj Buchwald, Aditya Killekar, Paul B. Kavanagh, Attila Feher, Edward J. Miller, Andrew J. Einstein, Terrence D. Ruddy, Joanna X. Liang, Valerie Builoff, David Ouyang, Daniel S. Berman, Damini Dey, Piotr J. Slomka
{"title":"Holistic AI analysis of hybrid cardiac perfusion images for mortality prediction","authors":"Anna M. Marcinkiewicz, Wenhao Zhang, Aakash Shanbhag, Robert J. H. Miller, Mark Lemley, Giselle Ramirez, Mikolaj Buchwald, Aditya Killekar, Paul B. Kavanagh, Attila Feher, Edward J. Miller, Andrew J. Einstein, Terrence D. Ruddy, Joanna X. Liang, Valerie Builoff, David Ouyang, Daniel S. Berman, Damini Dey, Piotr J. Slomka","doi":"10.1038/s41746-025-01526-0","DOIUrl":"https://doi.org/10.1038/s41746-025-01526-0","url":null,"abstract":"<p>Low-dose computed tomography attenuation correction (CTAC) scans are used in hybrid myocardial perfusion imaging (MPI) for attenuation correction and coronary calcium scoring, and contain additional anatomic and pathologic information not utilized in clinical assessment. We seek to uncover the full potential of these scans utilizing a holistic artificial intelligence (AI) approach. A multi-structure model segmented 33 structures and quantified 15 radiomics features in each organ in 10,480 patients from 4 sites. Coronary calcium and epicardial fat measures were obtained from separate AI models. The area under the receiver-operating characteristic curves (AUC) for all-cause mortality prediction of the model utilizing MPI, CT, stress test, and clinical features was 0.80 (95% confidence interval [0.74–0.87]), which was higher than for coronary calcium (0.64 [0.57–0.71]) or perfusion (0.62 [0.55–0.70]), with <i>p</i> < 0.001 for both. A comprehensive multimodality approach can significantly improve mortality prediction compared to MPI information alone in patients undergoing hybrid MPI.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"13 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607867","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Global 10 year ecological momentary assessment and mobile sensing study on tinnitus and environmental sounds","authors":"Robin Kraft, Berthold Langguth, Jorge Simoes, Manfred Reichert, Winfried Schlee, Rüdiger Pryss","doi":"10.1038/s41746-025-01551-z","DOIUrl":"https://doi.org/10.1038/s41746-025-01551-z","url":null,"abstract":"<p>In most tinnitus patients, tinnitus can be masked by external sounds. However, evidence for the efficacy of sound-based treatments is scarce. To elucidate the effect of sounds on tinnitus under real-world conditions, we collected data through the TrackYourTinnitus mobile platform over a ten-year period using Ecological Momentary Assessment and Mobile Crowdsensing. Using this dataset, we analyzed 67,442 samples from 572 users. Depending on the effect of environmental sounds on tinnitus, we identified three groups (T-, T+, T0) using Growth Mixture Modeling (GMM). Moreover, we compared these groups with respect to demographic, clinical, and user characteristics. We found that external sound reduces tinnitus (T-) in about 20% of users, increases tinnitus (T+) in about 5%, and leaves tinnitus unaffected (T0) in about 75%. The three groups differed significantly with respect to age and hearing problems, suggesting that the effect of sound on tinnitus is a relevant criterion for clinical subtyping.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"1 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143618605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yi-han Sheu, Jaak Simm, Bo Wang, Hyunjoon Lee, Jordan W. Smoller
{"title":"Continuous time and dynamic suicide attempt risk prediction with neural ordinary differential equations","authors":"Yi-han Sheu, Jaak Simm, Bo Wang, Hyunjoon Lee, Jordan W. Smoller","doi":"10.1038/s41746-025-01552-y","DOIUrl":"https://doi.org/10.1038/s41746-025-01552-y","url":null,"abstract":"<p>Current clinician-based and automated risk assessment methods treat the risk of suicide-related behaviors (SRBs) as static, while in actual clinical practice, SRB risk fluctuates over time. Here, we develop two closely related model classes, Event-GRU-ODE and Event-GRU-Discretized, that can predict the dynamic risk of events as a continuous trajectory across future time points, even without new observations, while updating these estimates as new data become available. Models were trained and validated for SRB prediction using a large electronic health record database. Both models demonstrated high discrimination (e.g., Event-GRU-ODE AUROC = 0.93, AUPRC = 0.10, relative risk = 13.4 at 95% specificity in a low-prevalence [0.15%] general cohort with a 1.5-year prediction window). This work provides an initial step toward developing novel suicide prevention strategies based on dynamic changes in risk.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"11 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607868","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Ji Woo Hwang, Ga Eun Lee, Jae Hyun Woo, Sung Min Kim, Ji Yean Kwon
{"title":"Systematic review and meta-analysis on fully automated digital cognitive behavioral therapy for insomnia","authors":"Ji Woo Hwang, Ga Eun Lee, Jae Hyun Woo, Sung Min Kim, Ji Yean Kwon","doi":"10.1038/s41746-025-01514-4","DOIUrl":"https://doi.org/10.1038/s41746-025-01514-4","url":null,"abstract":"<p>Insomnia impairs daily functioning and increases health risks. Cognitive behavioral therapy for insomnia (CBT-I) is effective but limited by cost and therapist availability. Fully automated digital CBT-I (FA dCBT-I) provides an accessible alternative without therapist involvement. This systematic review and meta-analysis evaluated the effectiveness of FA dCBT-I across 29 randomized controlled trials (RCTs) involving 9475 participants. Compared to control groups, FA dCBT-I demonstrated moderate to large effects on insomnia severity. Subgroup analyses indicated that FA dCBT-I had a significant impact when contrasted with most control groups but was less effective than therapist-assisted CBT-I. Meta-regression revealed that control group type moderated outcomes, whereas completion rate did not. This implies that treatment adherence, rather than merely completing the intervention, is crucial for its effectiveness. This study supports the potential of FA dCBT-I as a promising option for managing insomnia but underscores that a hybrid model combining therapist support is more beneficial.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"31 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143599485","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}