Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V. Stolyar, Katelyn Polanska, Karleigh R. McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, Piyush Mathur, Giovanni E. Cacciamani, Cong Sun, Yifan Peng, Yanshan Wang
{"title":"A framework for human evaluation of large language models in healthcare derived from literature review","authors":"Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V. Stolyar, Katelyn Polanska, Karleigh R. McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, Piyush Mathur, Giovanni E. Cacciamani, Cong Sun, Yifan Peng, Yanshan Wang","doi":"10.1038/s41746-024-01258-7","DOIUrl":"10.1038/s41746-024-01258-7","url":null,"abstract":"With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection, and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. 
QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01258-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142328660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Isabella Catharina Wiest, Dyke Ferber, Jiefu Zhu, Marko van Treeck, Sonja K. Meyer, Radhika Juglan, Zunamys I. Carrero, Daniel Paech, Jens Kleesiek, Matthias P. Ebert, Daniel Truhn, Jakob Nikolas Kather
{"title":"Privacy-preserving large language models for structured medical information retrieval","authors":"Isabella Catharina Wiest, Dyke Ferber, Jiefu Zhu, Marko van Treeck, Sonja K. Meyer, Radhika Juglan, Zunamys I. Carrero, Daniel Paech, Jens Kleesiek, Matthias P. Ebert, Daniel Truhn, Jakob Nikolas Kather","doi":"10.1038/s41746-024-01233-2","DOIUrl":"10.1038/s41746-024-01233-2","url":null,"abstract":"Most clinical information is encoded as free text, not accessible for quantitative analysis. This study presents an open-source pipeline using the local large language model (LLM) “Llama 2” to extract quantitative information from clinical text and evaluates its performance in identifying features of decompensated liver cirrhosis. The LLM identified five key clinical features in a zero- and one-shot manner from 500 patient medical histories in the MIMIC IV dataset. We compared LLMs of three sizes and various prompt engineering approaches, with predictions compared against ground truth from three blinded medical experts. Our pipeline achieved high accuracy, detecting liver cirrhosis with 100% sensitivity and 96% specificity. High sensitivities and specificities were also yielded for detecting ascites (95%, 95%), confusion (76%, 94%), abdominal pain (84%, 97%), and shortness of breath (87%, 97%) using the 70 billion parameter model, which outperformed smaller versions. 
Our study successfully demonstrates the capability of locally deployed LLMs to extract clinical information from free text with low hardware requirements.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01233-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142275135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pawel Renc, Yugang Jia, Anthony E. Samir, Jaroslaw Was, Quanzheng Li, David W. Bates, Arkadiusz Sitek
{"title":"Zero shot health trajectory prediction using transformer","authors":"Pawel Renc, Yugang Jia, Anthony E. Samir, Jaroslaw Was, Quanzheng Li, David W. Bates, Arkadiusz Sitek","doi":"10.1038/s41746-024-01235-0","DOIUrl":"10.1038/s41746-024-01235-0","url":null,"abstract":"Integrating modern machine learning and clinical decision-making has great promise for mitigating healthcare’s increasing cost and complexity. We introduce the Enhanced Transformer for Health Outcome Simulation (ETHOS), a novel application of the transformer deep-learning architecture for analyzing high-dimensional, heterogeneous, and episodic health data. ETHOS is trained using Patient Health Timelines (PHTs)—detailed, tokenized records of health events—to predict future health trajectories, leveraging a zero-shot learning approach. ETHOS represents a significant advancement in foundation model development for healthcare analytics, eliminating the need for labeled data and model fine-tuning. Its ability to simulate various treatment pathways and consider patient-specific factors positions ETHOS as a tool for care optimization and addressing biases in healthcare delivery. Future developments will expand ETHOS’ capabilities to incorporate a wider range of data types and data sources. 
Our work demonstrates a pathway toward accelerated AI development and deployment in healthcare.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01235-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142245563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Yuehua Liu, Wenjin Yu, Tharam Dillon
{"title":"Regulatory responses and approval status of artificial intelligence medical devices with a focus on China","authors":"Yuehua Liu, Wenjin Yu, Tharam Dillon","doi":"10.1038/s41746-024-01254-x","DOIUrl":"10.1038/s41746-024-01254-x","url":null,"abstract":"This paper focuses on how regulatory bodies respond to artificial intelligence (AI)-enabled medical devices. To achieve this, we present a comparative overview of the United States (USA), European Union (EU), and China. Our search in the governmental database identified 59 AI medical devices approved in China as of July 2023. In comparison to the rules-based regulatory approach in China, the approaches in the USA and EU are more standards-oriented.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01254-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142245564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexander Diel, Isabel Carolin Schröter, Anna-Lena Frewer, Christoph Jansen, Anita Robitzsch, Gertraud Gradl-Dietsch, Martin Teufel, Alexander Bäuerle
{"title":"A systematic review and meta analysis on digital mental health interventions in inpatient settings","authors":"Alexander Diel, Isabel Carolin Schröter, Anna-Lena Frewer, Christoph Jansen, Anita Robitzsch, Gertraud Gradl-Dietsch, Martin Teufel, Alexander Bäuerle","doi":"10.1038/s41746-024-01252-z","DOIUrl":"10.1038/s41746-024-01252-z","url":null,"abstract":"E-mental health (EMH) interventions gain increasing importance in the treatment of mental health disorders. Their outpatient efficacy is well-established. However, research on EMH in inpatient settings remains sparse and lacks a meta-analytic synthesis. This paper presents a meta-analysis on the efficacy of EMH in inpatient settings. Searching multiple databases (PubMed, ScienceGov, PsycInfo, CENTRAL, references), 26 randomized controlled trial (RCT) EMH inpatient studies (n = 6112) with low or medium assessed risk of bias were included. A small significant total effect of EMH treatment was found (g = 0.3). The effect was significant both for blended interventions (g = 0.42) and post-treatment EMH-based aftercare (g = 0.29). EMH treatment yielded significant effects across different patient groups and types of therapy, and the effects remained stable post-treatment. The results show the efficacy of EMH treatment in inpatient settings. 
The meta-analysis is limited by the small number of included studies.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01252-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142235063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mila Nambiar, Yong Mong Bee, Yu En Chan, Ivan Ho Mien, Feri Guretno, David Carmody, Phong Ching Lee, Sing Yi Chia, Nur Nasyitah Mohamed Salim, Pavitra Krishnaswamy
{"title":"A drug mix and dose decision algorithm for individualized type 2 diabetes management","authors":"Mila Nambiar, Yong Mong Bee, Yu En Chan, Ivan Ho Mien, Feri Guretno, David Carmody, Phong Ching Lee, Sing Yi Chia, Nur Nasyitah Mohamed Salim, Pavitra Krishnaswamy","doi":"10.1038/s41746-024-01230-5","DOIUrl":"10.1038/s41746-024-01230-5","url":null,"abstract":"Pharmacotherapy guidelines for type 2 diabetes (T2D) emphasize patient-centered care, but applying this approach effectively in outpatient practice remains challenging. Data-driven treatment optimization approaches could enhance individualized T2D management, but current approaches cannot account for drug-specific and dose-dependent variations in safety and efficacy. We developed and evaluated an AI Drug mix and dose Advisor (AIDA) for glycemic management, using electronic medical records from 107,854 T2D patients in the SingHealth Diabetes Registry. Given a patient’s medical profile, AIDA leverages a predict-then-optimize approach to identify the minimal drug mix and dose changes required to optimize glycemic control, subject to clinical knowledge-based guidelines. On unseen data from large internal, external, and temporal validation sets, AIDA recommendations were estimated to improve post-visit glycated hemoglobin (HbA1c) by an average of 0.40–0.68% over standard of care (P < 0.0001). In qualitative evaluations on 60 diverse cases by a panel of three endocrinologists, AIDA recommendations were mostly rated as reasonable and precise. Finally, AIDA’s ability to account for drug-dose specifics offered several advantages over competing methods, including greater consistency with practice preferences and clinical guidelines for practical but effective options, indication-based treatments, and renal dosing. 
As AIDA provides drug-dose recommendations to improve outcomes for individual T2D patients, it could be used for clinical decision support at point-of-care, especially in resource-limited settings.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01230-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142235102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maarten Z. H. Kolk, Diana My Frodi, Joss Langford, Tariq O. Andersen, Peter Karl Jacobsen, Niels Risum, Hanno L. Tan, Jesper Hastrup Svendsen, Reinoud E. Knops, Søren Zöga Diederichsen, Fleur V. Y. Tjong
{"title":"Deep behavioural representation learning reveals risk profiles for malignant ventricular arrhythmias","authors":"Maarten Z. H. Kolk, Diana My Frodi, Joss Langford, Tariq O. Andersen, Peter Karl Jacobsen, Niels Risum, Hanno L. Tan, Jesper Hastrup Svendsen, Reinoud E. Knops, Søren Zöga Diederichsen, Fleur V. Y. Tjong","doi":"10.1038/s41746-024-01247-w","DOIUrl":"10.1038/s41746-024-01247-w","url":null,"abstract":"We aimed to identify and characterise behavioural profiles in patients at high risk of sudden cardiac death (SCD) by using deep representation learning of day-to-day behavioural recordings. We present a pipeline that employed unsupervised clustering on low-dimensional representations of behavioural time-series data learned by a convolutional residual variational neural network (ResNet-VAE). Data from the prospective, observational SafeHeart study conducted at two large tertiary university centers in the Netherlands and Denmark were used. Patients received an implantable cardioverter-defibrillator (ICD) between May 2021 and September 2022 and wore wearable devices using accelerometer technology for 180 consecutive days. A total of 272 patients (mean age of 63.1 ± 10.2 years, 81% male) were eligible with a total sampling of 37,478 days of behavioural data (138 ± 47 days per patient). Deep representation learning identified five distinct behavioural profiles: Cluster A (n = 46) had very low physical activity levels and a disturbed sleep pattern. Cluster B (n = 70) had high activity levels, mainly at light-to-moderate intensity. Cluster C (n = 63) exhibited a high-intensity activity profile. Cluster D (n = 51) showed above-average sleep efficiency. Cluster E (n = 42) had frequent waking episodes and poor sleep. Annual risks of malignant ventricular arrhythmias ranged from 30.4% in Cluster A to 9.8% and 9.5% for Clusters D and E, respectively. 
Compared to low-risk profiles (D-E), Cluster A demonstrated a three-to-four fold increased risk of malignant ventricular arrhythmias adjusted for clinical covariates (adjusted HR 3.63, 95% CI 1.54–8.53, p < 0.001). These behavioural profiles may guide more personalised approaches to ventricular arrhythmia and SCD prevention.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01247-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142234429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Grace C. Nickel, Serena Wang, Jethro C. C. Kwong, Joseph C. Kvedar
{"title":"The case for inclusive co-creation in digital health innovation","authors":"Grace C. Nickel, Serena Wang, Jethro C. C. Kwong, Joseph C. Kvedar","doi":"10.1038/s41746-024-01256-9","DOIUrl":"10.1038/s41746-024-01256-9","url":null,"abstract":"This piece critiques the exclusion of healthcare practitioners (HCPs) from the digital health innovation process. Drawing on “Sync fast and solve things—best practices for responsible digital health” by Landers et al., the editorial argues for the importance of inclusive co-creation, in which clinicians play an active role in developing digital health solutions. It emphasizes that without the meaningful involvement of HCPs, digital health tools risk being clinically irrelevant.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01256-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142234474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Dylan Powell, Fanny Burrows, Geraint Lewis, Stephen Gilbert
{"title":"How might Hospital at Home enable a greener and healthier future?","authors":"Dylan Powell, Fanny Burrows, Geraint Lewis, Stephen Gilbert","doi":"10.1038/s41746-024-01249-8","DOIUrl":"10.1038/s41746-024-01249-8","url":null,"abstract":"Traditional healthcare delivery models face mounting pressure from rising costs, increasing demand, and a growing environmental footprint. Hospital at Home (HaH) has been proposed as a potential solution, offering care at home through in-person, virtual, or hybrid approaches. Despite the focus on expanding HaH provision and capacity, research has primarily explored patient care outcomes, patient satisfaction, and economic costs, leaving a key gap regarding its environmental impact. By reducing this evidence gap, HaH may be better placed as a positive enabler in delivering a healthier planet and population. This article explores the environmental opportunities and challenges associated with HaH compared to traditional hospital care and reinforces the case for further research to comprehensively quantify the environmental impact including any co-benefits. Our aim for this article is to spark conversation and begin to help prioritise future research and analysis.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01249-8.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142235064","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Andrew Quanbeck, Ming-Yuan Chih, Linda Park, Xiang Li, Qiang Xie, Alice Pulvermacher, Samantha Voelker, Rachel Lundwall, Katherine Eby, Bruce Barrett, Randall Brown
{"title":"A randomized trial testing digital medicine support models for mild-to-moderate alcohol use disorder","authors":"Andrew Quanbeck, Ming-Yuan Chih, Linda Park, Xiang Li, Qiang Xie, Alice Pulvermacher, Samantha Voelker, Rachel Lundwall, Katherine Eby, Bruce Barrett, Randall Brown","doi":"10.1038/s41746-024-01241-2","DOIUrl":"10.1038/s41746-024-01241-2","url":null,"abstract":"This paper reports the results of a hybrid effectiveness-implementation randomized trial that systematically varied levels of human oversight required to support the implementation of a digital medicine intervention for persons with mild-to-moderate alcohol use disorder (AUD). Participants were randomly assigned to three groups representing possible digital health support models within a health system: self-monitored use (SM; n = 185), peer-supported use (PS; n = 186), or a clinically integrated model (CI; n = 187). Across all three groups, the percentage of self-reported heavy drinking days dropped from 38.4% at baseline (95% CI [35.8%, 41%]) to 22.5% (19.5%, 25.5%) at 12 months. The clinically integrated group showed significant improvements in mental health and quality of life compared to the self-monitoring group (p = 0.011). However, higher attrition rates in the clinically integrated group warrant consideration in interpreting this result. Results suggest that making a self-guided digital intervention available to patients may be a viable option for health systems looking to promote alcohol risk reduction. 
This study was prospectively registered at clinicaltrials.gov on 7/03/2019 (NCT04011644).","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":null,"pages":null},"PeriodicalIF":12.4,"publicationDate":"2024-09-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01241-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142231290","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}