{"title":"Rethinking clinical trials for medical AI with dynamic deployments of adaptive systems.","authors":"Jacob T Rosenthal,Ashley Beecy,Mert R Sabuncu","doi":"10.1038/s41746-025-01674-3","DOIUrl":"https://doi.org/10.1038/s41746-025-01674-3","url":null,"abstract":"There is a growing recognition of the need for clinical trials to safely and effectively deploy artificial intelligence (AI) in clinical settings. We introduce dynamic deployment as a framework for AI clinical trials tailored for the dynamic nature of large language models, making possible complex medical AI systems which continuously learn and adapt in situ from new data and interactions with users while enabling continuous real-time monitoring and clinical validation.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"145 1","pages":"252"},"PeriodicalIF":15.2,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143915040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing biomarker based oncology trial matching using large language models","authors":"Nour Alkhoury, Maqsood Shaik, Ricardo Wurmus, Altuna Akalin","doi":"10.1038/s41746-025-01673-4","DOIUrl":"https://doi.org/10.1038/s41746-025-01673-4","url":null,"abstract":"Clinical trials are an essential component of drug development for new cancer treatments, yet the information required to determine a patient’s eligibility for enrollment is scattered in large amounts of unstructured text. Genomic biomarkers are especially important in precision medicine and targeted therapies, making them essential for matching patients to appropriate trials. Large language models (LLMs) offer a promising solution for extracting this information from clinical trial study descriptions (e.g., brief summary, eligibility criteria), aiding in identifying suitable patient matches in downstream applications. In this study, we explore various strategies for extracting genetic biomarkers from oncology trials. Therefore, our focus is on structuring unstructured clinical trial data, not processing individual patient records. Our results show that open-source language models, when applied out-of-the-box, effectively capture complex logical expressions and structure genomic biomarkers, outperforming closed-source models such as GPT-4. Furthermore, fine-tuning these open-source models with additional data significantly enhances their performance.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"23 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143909873","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Human-machine co-adaptation to automated insulin delivery: a randomised clinical trial using digital twin technology.","authors":"Boris P Kovatchev,Patricio Colmegna,Jacopo Pavan,Jenny L Diaz Castañeda,Maria F Villa-Tamayo,Chaitanya L K Koravi,Giulio Santini,Carlene Alix,Meaghan Stumpf,Sue A Brown","doi":"10.1038/s41746-025-01679-y","DOIUrl":"https://doi.org/10.1038/s41746-025-01679-y","url":null,"abstract":"Most automated insulin delivery (AID) algorithms do not adapt to the changing physiology of their users, and none provide interactive means for user adaptation to the actions of AID. This randomised clinical trial tested human-machine co-adaptation to AID using new 'digital twin' replay simulation technology. Seventy-two individuals with T1D completed the 6-month study. The two study arms differed by the order of administration of information feedback (widely used metrics and graphs) and an in silico co-adaptation routine, which: (i) transmitted AID data to a cloud application; (ii) mapped each person to their digital twin; (iii) optimized AID control parameters bi-weekly; and (iv) enabled users to experiment with what-if scenarios replayed via their own digital twins. In silico co-adaptation improved the primary outcome, time-in-range (3.9-10 mmol/L), from 72 to 77 percent (p < 0.01) and reduced glycated haemoglobin from 6.8 to 6.6 percent. Information feedback had no additional effect beyond AID alone. (Clinical Trials Registration: NCT05610111 (November 10, 2022)).","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"35 1","pages":"253"},"PeriodicalIF":15.2,"publicationDate":"2025-05-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143915039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Artificial intelligence guided imaging as a tool to fill gaps in health care delivery","authors":"Ben Li, Elizabeth J. Enichen, Kimia Heydari, Joseph C. Kvedar","doi":"10.1038/s41746-025-01613-2","DOIUrl":"https://doi.org/10.1038/s41746-025-01613-2","url":null,"abstract":"Deep vein thrombosis (DVT) causes significant morbidity and mortality, and timely diagnosis, often via ultrasound, is critical. However, the shortage of trained ultrasound providers has been an ongoing challenge. Recently, Speranza and colleagues (2025) demonstrated that an artificial intelligence (AI) guided ultrasound system used by non-ultrasound-trained nurses with remote clinician review can achieve sensitivities of 90–98% and specificities of 74–100% for diagnosing DVT. This study highlights the potential for AI guided imaging to address important gaps in health care delivery.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"24 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143910297","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advancing cardiovascular care through actionable AI innovation","authors":"Giuseppe Biondi-Zoccai, Arjun Mahajan, Dylan Powell, Mariangela Peruzzi, Roberto Carnevale, Giacomo Frati","doi":"10.1038/s41746-025-01621-2","DOIUrl":"https://doi.org/10.1038/s41746-025-01621-2","url":null,"abstract":"Despite significant advances, the prevention and management of cardiovascular disease remain challenging, especially for ischemic heart disease (IHD). Current clinical decision-making relies heavily on physician expertise, guideline-directed therapies, and static risk scores, which often inadequately accommodate individual patient complexity. Machine learning (ML) and artificial intelligence (AI), particularly reinforcement learning (RL), may augment current physician-driven approaches and provide enhanced cardiovascular disease prevention and management. Indeed, offline RL refers to a class of ML algorithms that learn optimal decision-making policies from a fixed dataset of previously collected experiences—such as electronic health records or registries—without the need for active, real-time interaction with the clinical environment. This approach enables the safe development of treatment strategies in high-stakes domains where experimentation on live patients could be unethical or impractical. Notably, offline RL models hold the promise of optimizing decision-making in complex clinical settings, such as revascularization strategies for coronary artery disease. However, challenges remain in integrating AI into practice, ensuring interpretability, maintaining performance, and proving cost-effectiveness. Ultimately, validation, integration, and collaboration among clinicians, researchers, and policymakers are crucial for transforming AI-driven solutions into practical, patient-centered cardiovascular care improvements, pending prospective (and hopefully randomized) validation.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"25 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-05-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143909879","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mitigating the risk of health inequity exacerbated by large language models","authors":"Yuelyu Ji, Wenhe Ma, Sonish Sivarajkumar, Hang Zhang, Eugene M Sadhu, Zhuochun Li, Xizhi Wu, Shyam Visweswaran, Yanshan Wang","doi":"10.1038/s41746-025-01576-4","DOIUrl":"https://doi.org/10.1038/s41746-025-01576-4","url":null,"abstract":"Recent advancements in large language models (LLMs) have demonstrated their potential in numerous medical applications, particularly in automating clinical trial matching for translational research and enhancing medical question-answering for clinical decision support. However, our study shows that incorporating non-decisive socio-demographic factors, such as race, sex, income level, LGBT+ status, homelessness, illiteracy, disability, and unemployment, into the input of LLMs can lead to incorrect and harmful outputs. These discrepancies could worsen existing health disparities if LLMs are broadly implemented in healthcare. To address this issue, we introduce EquityGuard, a novel framework designed to detect and mitigate the risk of health inequities in LLM-based medical applications. Our evaluation demonstrates its effectiveness in promoting equitable outcomes across diverse populations.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"36 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143903128","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Synthetic medical education in dermatology leveraging generative artificial intelligence","authors":"Arya S. Rao, John Kim, Andrew Mu, Cameron C. Young, Ezra Kalmowitz, Michael Senter-Zapata, David C. Whitehead, Lilit Garibyan, Adam B. Landman, Marc D. Succi","doi":"10.1038/s41746-025-01650-x","DOIUrl":"https://doi.org/10.1038/s41746-025-01650-x","url":null,"abstract":"The advent of large language models (LLMs) represents an enormous opportunity to revolutionize medical education. Via “synthetic education,” LLMs can be harnessed to generate novel content for medical education purposes, offering potentially unlimited resources for physicians in training. Utilizing OpenAI’s GPT-4, we generated clinical vignettes and accompanying explanations for 20 skin and soft tissue diseases tested on the United States Medical Licensing Examination. Physician experts gave the vignettes high average scores on a Likert scale in scientific accuracy (4.45/5), comprehensiveness (4.3/5), and overall quality (4.28/5) and low scores for potential clinical harm (1.6/5) and demographic bias (1.52/5). A strong correlation (<i>r</i> = 0.83) was observed between comprehensiveness and overall quality. Vignettes did not incorporate significant demographic diversity. This study underscores the potential of LLMs in enhancing the scalability, accessibility, and customizability of dermatology education materials. Efforts to increase vignettes’ demographic diversity should be incorporated to increase applicability to diverse populations.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"92 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-05-04","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143903130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multimodal GPT model for assisting thyroid nodule diagnosis and management","authors":"Jincao Yao, Yunpeng Wang, Zhikai Lei, Kai Wang, Na Feng, Fajin Dong, Jianhua Zhou, Xiaoxian Li, Xiang Hao, Jiafei Shen, Shanshan Zhao, Yuan Gao, Vicky Wang, Di Ou, Wei Li, Yidan Lu, Liyu Chen, Chen Yang, Liping Wang, Bojian Feng, Yahan Zhou, Chen Chen, Yuqi Yan, Zhengping Wang, Rongrong Ru, Yaqing Chen, Yanming Zhang, Ping Liang, Dong Xu","doi":"10.1038/s41746-025-01652-9","DOIUrl":"https://doi.org/10.1038/s41746-025-01652-9","url":null,"abstract":"Although using artificial intelligence (AI) to analyze ultrasound images is a promising approach to assessing thyroid nodule risks, traditional AI models lack transparency and interpretability. We developed a multimodal generative pre-trained transformer for thyroid nodules (ThyGPT), aiming to provide a transparent and interpretable AI copilot model for thyroid nodule risk assessment and management. Ultrasound data from 59,406 patients across nine hospitals were retrospectively collected to train and test the model. After training, ThyGPT was found to assist in reducing biopsy rates by more than 40% without increasing missed diagnoses. In addition, it detects errors in ultrasound reports 1,610 times faster than humans. With the assistance of ThyGPT, the area under the curve for radiologists in assessing thyroid nodule risks improved from 0.805 to 0.908 (<i>p</i> < 0.001). As an AI-generated content-enhanced computer-aided diagnosis (AIGC-CAD) model, ThyGPT has the potential to revolutionize how radiologists use such tools.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"2 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143903129","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Expert of Experts Verification and Alignment (EVAL) Framework for Large Language Models Safety in Gastroenterology","authors":"Mauro Giuffrè, Kisung You, Ziteng Pang, Simone Kresevic, Sunny Chung, Ryan Chen, Youngmin Ko, Colleen Chan, Theo Saarinen, Milos Ajcevic, Lory S. Crocè, Guadalupe Garcia-Tsao, Ian Gralnek, Joseph J. Y. Sung, Alan Barkun, Loren Laine, Jasjeet Sekhon, Bradly Stadie, Dennis L. Shung","doi":"10.1038/s41746-025-01589-z","DOIUrl":"https://doi.org/10.1038/s41746-025-01589-z","url":null,"abstract":"Large language models generate plausible text responses to medical questions, but inaccurate responses pose significant risks in medical decision-making. Grading LLM outputs to determine the best model or answer is time-consuming and impractical in clinical settings; therefore, we introduce EVAL (Expert-of-Experts Verification and Alignment) to streamline this process and enhance LLM safety for upper gastrointestinal bleeding (UGIB). We evaluated OpenAI’s GPT-3.5/4/4o/o1-preview, Anthropic’s Claude-3-Opus, Meta’s LLaMA-2 (7B/13B/70B), and Mistral AI’s Mixtral (7B) across 27 configurations, including zero-shot baseline, retrieval-augmented generation, and supervised fine-tuning. EVAL uses similarity-based ranking and a reward model trained on human-graded responses for rejection sampling. Among the employed similarity metrics, Fine-Tuned ColBERT achieved the highest alignment with human performance across three separate datasets (<i>ρ</i> = 0.81–0.91). The reward model replicated human grading in 87.9% of cases across temperature settings and significantly improved accuracy through rejection sampling by 8.36% overall. EVAL offers scalable potential to assess accuracy for high-stakes medical decision-making.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"97 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143901351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A choice based conjoint analysis of mobile healthcare application preferences among physicians, patients, and individuals","authors":"Junbok Lee, Jung Hyun Kim, Mingee Choi, Jaeyong Shin","doi":"10.1038/s41746-025-01610-5","DOIUrl":"https://doi.org/10.1038/s41746-025-01610-5","url":null,"abstract":"The rapid proliferation of healthcare service applications (apps) makes it challenging for consumers to determine the best one for their needs, prompting the Korean government to introduce an accreditation program to verify app safety. This study aims to identify the factors influencing the choice of healthcare service apps among physicians, patients with chronic diseases, and healthy individuals. We conducted a choice-based conjoint analysis with six factors (number of studies on effectiveness, frequency of information delivery, cybersecurity and data safety, user satisfaction, accreditation, and costs). A total of 1,093 participants (407 healthy individuals, 589 patients, and 97 physicians) took part in the online survey. Across all groups, cybersecurity and data safety were the most important preference factors (healthy individuals: <i>β</i> = 2.127, 95% CI 2.096–2.338, patients: <i>β</i> = 1.569, 95% CI 1.481–1.658, physicians: <i>β</i> = 1.111, 95% CI 0.908–1.314). All groups were willing to pay approximately $12 more for high cybersecurity and data safety than for low.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":"15 1","pages":""},"PeriodicalIF":15.2,"publicationDate":"2025-05-03","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143901352","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}