Large language model agents can use tools to perform clinical calculations
Alex J. Goodell, Simon N. Chu, Dara Rouholiman, Larry F. Chu
NPJ Digital Medicine, 2025-03-17. https://doi.org/10.1038/s41746-025-01475-8

Large language models (LLMs) can answer expert-level questions in medicine but are prone to hallucinations and arithmetic errors. Early evidence suggests LLMs cannot reliably perform clinical calculations, limiting their potential integration into clinical workflows. We evaluated ChatGPT’s performance across 48 medical calculation tasks, finding incorrect responses in one-third of trials (n = 212). We then assessed three forms of agentic augmentation: retrieval-augmented generation, a code interpreter tool, and a set of task-specific calculation tools (OpenMedCalc) across 10,000 trials. Models with access to task-specific tools showed the greatest improvement, with LLaMa and GPT-based models demonstrating a 5.5-fold (88% vs 16%) and 13-fold (64% vs 4.8%) reduction in incorrect responses, respectively, compared to the unaugmented models. Our findings suggest that integration of machine-readable, task-specific tools may help overcome LLMs’ limitations in medical calculations.
Retinal fundus imaging as biomarker for ADHD using machine learning for screening and visual attention stratification
Hangnyoung Choi, JaeSeong Hong, Hyun Goo Kang, Min-Hyeon Park, Sungji Ha, Junghan Lee, Sangchul Yoon, Daeseong Kim, Yu Rang Park, Keun-Ah Cheon
NPJ Digital Medicine, 2025-03-17. https://doi.org/10.1038/s41746-025-01547-9

Attention-deficit/hyperactivity disorder (ADHD), characterized by diagnostic complexity and symptom heterogeneity, is a prevalent neurodevelopmental disorder. Here, we explored machine learning (ML) analysis of retinal fundus photographs as a noninvasive biomarker for ADHD screening and stratification of executive function (EF) deficits. From April to October 2022, 323 children and adolescents with ADHD were recruited from two tertiary South Korean hospitals, and age- and sex-matched individuals with typical development were retrospectively collected. We used the AutoMorph pipeline to extract retinal features, applied four types of ML models for ADHD screening and EF subdomain prediction, and adopted the Shapley additive explanations (SHAP) method. ADHD screening models achieved AUROCs of 95.5%–96.9%. For EF stratification, the visual and auditory subdomains showed strong (AUROC > 85%) and poor performance, respectively. Our analysis of retinal fundus photographs demonstrated potential as a noninvasive biomarker for ADHD screening and EF deficit stratification in the visual attention domain.
{"title":"Effect of computer aided detection system on esophageal neoplasm diagnosis in varied levels of endoscopists","authors":"Bing Li, Yan-Yun Du, Wei-Min Tan, Dong-Li He, Zhi-Peng Qi, Hon-Ho Yu, Qiang Shi, Zhong Ren, Ming-Yan Cai, Bo Yan, Shi-Lun Cai, Yun-Shi Zhong","doi":"10.1038/s41746-025-01532-2","DOIUrl":"https://doi.org/10.1038/s41746-025-01532-2","url":null,"abstract":"<p>A computer-aided detection (CAD) system for early esophagus carcinoma identification during endoscopy with narrow-band imaging (NBI) was evaluated in a large-scale, prospective, tandem, randomized controlled trial to assess its effectiveness. The study was registered at the Chinese Clinical Trial Registry (ChiCTR2100050654, 2021/09/01). Involving 3400 patients were randomly assigned to either routine (routine-first) or CAD-assisted (CAD-first) NBI endoscopy, followed by the other procedure, with targeted biopsies taken at the end of the second examination. The primary outcome was the diagnosis of 1 or more neoplastic lesion of esophagus during the first examination. The CAD-first group demonstrated a significantly higher neoplastic lesion detection rate (3.12%) compared to the routine-first group (1.59%) with a relative detection ratio of 1.96 (<i>P</i> = 0.0047). Subgroup analysis revealed a higher detection rate in junior endoscopists using CAD-first, while no significant difference was observed for senior endoscopists. The CAD system significantly improved esophageal neoplasm detection, particularly benefiting junior endoscopists.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"21 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143607866","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Enhancing diagnostic capability with multi-agents conversational large language models
Xi Chen, Huahui Yi, Mingke You, WeiZhi Liu, Li Wang, Hairui Li, Xue Zhang, Yingman Guo, Lei Fan, Gang Chen, Qicheng Lao, Weili Fu, Kang Li, Jian Li
NPJ Digital Medicine, 2025-03-13. https://doi.org/10.1038/s41746-025-01550-0

Large Language Models (LLMs) show promise in healthcare tasks but face challenges in complex medical scenarios. We developed a Multi-Agent Conversation (MAC) framework for disease diagnosis, inspired by clinical Multi-Disciplinary Team discussions. Using 302 rare disease cases, we evaluated GPT-3.5, GPT-4, and MAC on medical knowledge and clinical reasoning. MAC outperformed single models in both primary and follow-up consultations, achieving higher accuracy in diagnoses and suggested tests. Optimal performance was achieved with four doctor agents and a supervisor agent, using GPT-4 as the base model. MAC demonstrated high consistency across repeated runs. Further comparative analysis showed MAC also outperformed other methods, including Chain of Thought (CoT), Self-Refine, and Self-Consistency, with higher performance and more output tokens. This framework significantly enhanced LLMs’ diagnostic capabilities, effectively bridging theoretical knowledge and practical clinical application. Our findings highlight the potential of multi-agent LLMs in healthcare and suggest further research into their clinical implementation.
Holistic AI analysis of hybrid cardiac perfusion images for mortality prediction
Anna M. Marcinkiewicz, Wenhao Zhang, Aakash Shanbhag, Robert J. H. Miller, Mark Lemley, Giselle Ramirez, Mikolaj Buchwald, Aditya Killekar, Paul B. Kavanagh, Attila Feher, Edward J. Miller, Andrew J. Einstein, Terrence D. Ruddy, Joanna X. Liang, Valerie Builoff, David Ouyang, Daniel S. Berman, Damini Dey, Piotr J. Slomka
NPJ Digital Medicine, 2025-03-13. https://doi.org/10.1038/s41746-025-01526-0

Low-dose computed tomography attenuation correction (CTAC) scans are used in hybrid myocardial perfusion imaging (MPI) for attenuation correction and coronary calcium scoring, and contain additional anatomic and pathologic information not utilized in clinical assessment. We seek to uncover the full potential of these scans utilizing a holistic artificial intelligence (AI) approach. A multi-structure model segmented 33 structures and quantified 15 radiomics features in each organ in 10,480 patients from 4 sites. Coronary calcium and epicardial fat measures were obtained from separate AI models. The area under the receiver-operating characteristic curve (AUC) for all-cause mortality prediction of the model utilizing MPI, CT, stress test, and clinical features was 0.80 (95% confidence interval [0.74–0.87]), which was higher than for coronary calcium (0.64 [0.57–0.71]) or perfusion (0.62 [0.55–0.70]), with p < 0.001 for both. A comprehensive multimodality approach can significantly improve mortality prediction compared to MPI information alone in patients undergoing hybrid MPI.
{"title":"Global 10 year ecological momentary assessment and mobile sensing study on tinnitus and environmental sounds","authors":"Robin Kraft, Berthold Langguth, Jorge Simoes, Manfred Reichert, Winfried Schlee, Rüdiger Pryss","doi":"10.1038/s41746-025-01551-z","DOIUrl":"https://doi.org/10.1038/s41746-025-01551-z","url":null,"abstract":"<p>In most tinnitus patients, tinnitus can be masked by external sounds. However, evidence for the efficacy of sound-based treatments is scarce. To elucidate the effect of sounds on tinnitus under real-world conditions, we collected data through the TrackYourTinnitus mobile platform over a ten-year period using Ecological Momentary Assessment and Mobile Crowdsensing. Using this dataset, we analyzed 67,442 samples from 572 users. Depending on the effect of environmental sounds on tinnitus, we identified three groups (T-, T+, T0) using Growth Mixture Modeling (GMM). Moreover, we compared these groups with respect to demographic, clinical, and user characteristics. We found that external sound reduces tinnitus (T-) in about 20% of users, increases tinnitus (T+) in about 5%, and leaves tinnitus unaffected (T0) in about 75%. The three groups differed significantly with respect to age and hearing problems, suggesting that the effect of sound on tinnitus is a relevant criterion for clinical subtyping.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"1 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143618605","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Continuous time and dynamic suicide attempt risk prediction with neural ordinary differential equations
Yi-han Sheu, Jaak Simm, Bo Wang, Hyunjoon Lee, Jordan W. Smoller
NPJ Digital Medicine, 2025-03-13. https://doi.org/10.1038/s41746-025-01552-y

Current clinician-based and automated risk assessment methods treat the risk of suicide-related behaviors (SRBs) as static, while in actual clinical practice, SRB risk fluctuates over time. Here, we develop two closely related model classes, Event-GRU-ODE and Event-GRU-Discretized, that can predict the dynamic risk of events as a continuous trajectory across future time points, even without new observations, while updating these estimates as new data become available. Models were trained and validated for SRB prediction using a large electronic health record database. Both models demonstrated high discrimination (e.g., Event-GRU-ODE AUROC = 0.93, AUPRC = 0.10, relative risk = 13.4 at 95% specificity in a low-prevalence [0.15%] general cohort with a 1.5-year prediction window). This work provides an initial step toward developing novel suicide prevention strategies based on dynamic changes in risk.
Systematic review and meta-analysis on fully automated digital cognitive behavioral therapy for insomnia
Ji Woo Hwang, Ga Eun Lee, Jae Hyun Woo, Sung Min Kim, Ji Yean Kwon
NPJ Digital Medicine, 2025-03-12. https://doi.org/10.1038/s41746-025-01514-4

Insomnia impairs daily functioning and increases health risks. Cognitive behavioral therapy for insomnia (CBT-I) is effective but limited by cost and therapist availability. Fully automated digital CBT-I (FA dCBT-I) provides an accessible alternative without therapist involvement. This systematic review and meta-analysis evaluated the effectiveness of FA dCBT-I across 29 randomized controlled trials (RCTs) involving 9475 participants. Compared to control groups, FA dCBT-I demonstrated moderate to large effects on insomnia severity. Subgroup analyses indicated that FA dCBT-I had a significant impact when contrasted with most control groups but was less effective than therapist-assisted CBT-I. Meta-regression revealed that control group type moderated outcomes, whereas completion rate did not. This implies that treatment adherence, rather than merely completing the intervention, is crucial for effectiveness. This study supports the potential of FA dCBT-I as a promising option for managing insomnia but indicates that a hybrid model incorporating therapist support is more beneficial.
{"title":"AI in Histopathology Explorer for comprehensive analysis of the evolving AI landscape in histopathology","authors":"Yingrui Ma, Shivprasad Jamdade, Lakshmi Konduri, Heba Sailem","doi":"10.1038/s41746-025-01524-2","DOIUrl":"https://doi.org/10.1038/s41746-025-01524-2","url":null,"abstract":"<p>Digital pathology and artificial intelligence (AI) hold immense transformative potential to revolutionize cancer diagnostics, treatment outcomes, and biomarker discovery. Gaining a deeper understanding of deep learning algorithm methods applied to histopathological data and evaluating their performance on different tasks is crucial for developing the next generation of AI technologies. To this end, we developed AI in Histopathology Explorer (HistoPathExplorer); an interactive dashboard with intelligent tools available at www.histopathexpo.ai. This real-time online resource enables users, including researchers, decision-makers, and various stakeholders, to assess the current landscape of AI applications for specific clinical tasks, analyze their performance, and explore the factors influencing their translation into practice. Moreover, a quality index was defined for evaluating the comprehensiveness of methodological details in published AI methods. HistoPathExplorer highlights opportunities and challenges for AI in histopathology, and offers a valuable resource for creating more effective methods and shaping strategies and guidelines for translating digital pathology applications into clinical practice.</p>","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"67 4 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-03-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143599487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bias recognition and mitigation strategies in artificial intelligence healthcare applications
Fereshteh Hasanzadeh, Colin B. Josephson, Gabriella Waters, Demilade Adedinsewo, Zahra Azizi, James A. White
NPJ Digital Medicine, 2025-03-11. https://doi.org/10.1038/s41746-025-01503-7

Artificial intelligence (AI) is delivering value across all aspects of clinical practice. However, bias may exacerbate healthcare disparities. This review examines the origins of bias in healthcare AI, strategies for mitigation, and responsibilities of relevant stakeholders towards achieving fair and equitable use. We highlight the importance of systematically identifying bias and engaging relevant mitigation activities throughout the AI model lifecycle, from model conception through to deployment and longitudinal surveillance.