Soo Bin Yoon, Jipyeong Lee, Hyung-Chul Lee, Chul-Woo Jung, Hyeonhoon Lee
{"title":"Comparison of NLP machine learning models with human physicians for ASA Physical Status classification","authors":"Soo Bin Yoon, Jipyeong Lee, Hyung-Chul Lee, Chul-Woo Jung, Hyeonhoon Lee","doi":"10.1038/s41746-024-01259-6","DOIUrl":"10.1038/s41746-024-01259-6","url":null,"abstract":"The American Society of Anesthesiologists Physical Status (ASA-PS) classification system assesses comorbidities before sedation and analgesia, but inconsistencies among raters have hindered its objective use. This study aimed to develop natural language processing (NLP) models to classify ASA-PS using pre-anesthesia evaluation summaries, comparing their performance to human physicians. Data from 717,389 surgical cases in a tertiary hospital (October 2004–May 2023) were split into training, tuning, and test datasets. Board-certified anesthesiologists created reference labels for tuning and test datasets. The NLP models, including ClinicalBigBird, BioClinicalBERT, and Generative Pretrained Transformer 4, were validated against anesthesiologists. The ClinicalBigBird model achieved an area under the receiver operating characteristic curve of 0.915. It outperformed board-certified anesthesiologists with a specificity of 0.901 vs. 0.897, precision of 0.732 vs. 0.715, and F1-score of 0.716 vs. 0.713 (all p < 0.01). 
This approach will facilitate automatic and objective ASA-PS classification, thereby streamlining the clinical workflow.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-11"},"PeriodicalIF":12.4,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01259-6.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142328616","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Prakash Adekkanattu, Al’ona Furmanchuk, Yonghui Wu, Aman Pathak, Braja Gopal Patra, Sarah Bost, Destinee Morrow, Grace Hsin-Min Wang, Yuyang Yang, Noah James Forrest, Yuan Luo, Theresa L. Walunas, Weihsuan Lo-Ciganic, Walid Gelad, Jiang Bian, Yuhua Bao, Mark Weiner, David Oslin, Jyotishman Pathak
{"title":"Deep learning for identifying personal and family history of suicidal thoughts and behaviors from EHRs","authors":"Prakash Adekkanattu, Al’ona Furmanchuk, Yonghui Wu, Aman Pathak, Braja Gopal Patra, Sarah Bost, Destinee Morrow, Grace Hsin-Min Wang, Yuyang Yang, Noah James Forrest, Yuan Luo, Theresa L. Walunas, Weihsuan Lo-Ciganic, Walid Gelad, Jiang Bian, Yuhua Bao, Mark Weiner, David Oslin, Jyotishman Pathak","doi":"10.1038/s41746-024-01266-7","DOIUrl":"10.1038/s41746-024-01266-7","url":null,"abstract":"Personal and family history of suicidal thoughts and behaviors (PSH and FSH, respectively) are significant risk factors associated with suicides. Research is limited in automatic identification of such data from clinical notes in Electronic Health Records. This study developed deep learning (DL) tools utilizing transformer models (Bio_ClinicalBERT and GatorTron) to detect PSH and FSH in clinical notes derived from three academic medical centers, and compared their performance with a rule-based natural language processing tool. For detecting PSH, the rule-based approach obtained an F1-score of 0.75 ± 0.07, while the Bio_ClinicalBERT and GatorTron DL tools scored 0.83 ± 0.09 and 0.84 ± 0.07, respectively. For detecting FSH, the rule-based approach achieved an F1-score of 0.69 ± 0.11, compared to 0.89 ± 0.10 for Bio_ClinicalBERT and 0.92 ± 0.07 for GatorTron. 
Across sites, the DL tools identified more than 80% of patients at elevated risk for suicide who remain undiagnosed and untreated.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-9"},"PeriodicalIF":12.4,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01266-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142329190","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V. Stolyar, Katelyn Polanska, Karleigh R. McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, Piyush Mathur, Giovanni E. Cacciamani, Cong Sun, Yifan Peng, Yanshan Wang
{"title":"A framework for human evaluation of large language models in healthcare derived from literature review","authors":"Thomas Yu Chow Tam, Sonish Sivarajkumar, Sumit Kapoor, Alisa V. Stolyar, Katelyn Polanska, Karleigh R. McCarthy, Hunter Osterhoudt, Xizhi Wu, Shyam Visweswaran, Sunyang Fu, Piyush Mathur, Giovanni E. Cacciamani, Cong Sun, Yifan Peng, Yanshan Wang","doi":"10.1038/s41746-024-01258-7","DOIUrl":"10.1038/s41746-024-01258-7","url":null,"abstract":"With generative artificial intelligence (GenAI), particularly large language models (LLMs), continuing to make inroads in healthcare, assessing LLMs with human evaluations is essential to assuring safety and effectiveness. This study reviews existing literature on human evaluation methodologies for LLMs in healthcare across various medical specialties and addresses factors such as evaluation dimensions, sample types and sizes, selection, and recruitment of evaluators, frameworks and metrics, evaluation process, and statistical analysis type. Our literature review of 142 studies shows gaps in reliability, generalizability, and applicability of current human evaluation practices. To overcome such significant obstacles to healthcare LLM developments and deployments, we propose QUEST, a comprehensive and practical framework for human evaluation of LLMs covering three phases of workflow: Planning, Implementation and Adjudication, and Scoring and Review. 
QUEST is designed with five proposed evaluation principles: Quality of Information, Understanding and Reasoning, Expression Style and Persona, Safety and Harm, and Trust and Confidence.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-20"},"PeriodicalIF":12.4,"publicationDate":"2024-09-28","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01258-7.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142328660","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Isabella Catharina Wiest, Dyke Ferber, Jiefu Zhu, Marko van Treeck, Sonja K. Meyer, Radhika Juglan, Zunamys I. Carrero, Daniel Paech, Jens Kleesiek, Matthias P. Ebert, Daniel Truhn, Jakob Nikolas Kather
{"title":"Privacy-preserving large language models for structured medical information retrieval","authors":"Isabella Catharina Wiest, Dyke Ferber, Jiefu Zhu, Marko van Treeck, Sonja K. Meyer, Radhika Juglan, Zunamys I. Carrero, Daniel Paech, Jens Kleesiek, Matthias P. Ebert, Daniel Truhn, Jakob Nikolas Kather","doi":"10.1038/s41746-024-01233-2","DOIUrl":"10.1038/s41746-024-01233-2","url":null,"abstract":"Most clinical information is encoded as free text, not accessible for quantitative analysis. This study presents an open-source pipeline using the local large language model (LLM) “Llama 2” to extract quantitative information from clinical text and evaluates its performance in identifying features of decompensated liver cirrhosis. The LLM identified five key clinical features in a zero- and one-shot manner from 500 patient medical histories in the MIMIC IV dataset. We compared LLMs of three sizes and various prompt engineering approaches, with predictions compared against ground truth from three blinded medical experts. Our pipeline achieved high accuracy, detecting liver cirrhosis with 100% sensitivity and 96% specificity. High sensitivities and specificities were also yielded for detecting ascites (95%, 95%), confusion (76%, 94%), abdominal pain (84%, 97%), and shortness of breath (87%, 97%) using the 70 billion parameter model, which outperformed smaller versions. 
Our study successfully demonstrates the capability of locally deployed LLMs to extract clinical information from free text with low hardware requirements.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-9"},"PeriodicalIF":12.4,"publicationDate":"2024-09-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01233-2.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142275135","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pawel Renc, Yugang Jia, Anthony E. Samir, Jaroslaw Was, Quanzheng Li, David W. Bates, Arkadiusz Sitek
{"title":"Zero shot health trajectory prediction using transformer","authors":"Pawel Renc, Yugang Jia, Anthony E. Samir, Jaroslaw Was, Quanzheng Li, David W. Bates, Arkadiusz Sitek","doi":"10.1038/s41746-024-01235-0","DOIUrl":"10.1038/s41746-024-01235-0","url":null,"abstract":"Integrating modern machine learning and clinical decision-making has great promise for mitigating healthcare’s increasing cost and complexity. We introduce the Enhanced Transformer for Health Outcome Simulation (ETHOS), a novel application of the transformer deep-learning architecture for analyzing high-dimensional, heterogeneous, and episodic health data. ETHOS is trained using Patient Health Timelines (PHTs)—detailed, tokenized records of health events—to predict future health trajectories, leveraging a zero-shot learning approach. ETHOS represents a significant advancement in foundation model development for healthcare analytics, eliminating the need for labeled data and model fine-tuning. Its ability to simulate various treatment pathways and consider patient-specific factors positions ETHOS as a tool for care optimization and addressing biases in healthcare delivery. Future developments will expand ETHOS’ capabilities to incorporate a wider range of data types and data sources. 
Our work demonstrates a pathway toward accelerated AI development and deployment in healthcare.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-10"},"PeriodicalIF":12.4,"publicationDate":"2024-09-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01235-0.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142245563","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Regulatory responses and approval status of artificial intelligence medical devices with a focus on China","authors":"Yuehua Liu, Wenjin Yu, Tharam Dillon","doi":"10.1038/s41746-024-01254-x","DOIUrl":"10.1038/s41746-024-01254-x","url":null,"abstract":"This paper focuses on how regulatory bodies respond to artificial intelligence (AI)-enabled medical devices. To achieve this, we present a comparative overview of the United States (USA), European Union (EU), and China. Our search in the governmental database identified 59 AI medical devices approved in China as of July 2023. In comparison to the rules-based regulatory approach in China, the approaches in the USA and EU are more standards-oriented.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-11"},"PeriodicalIF":12.4,"publicationDate":"2024-09-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01254-x.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142245564","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Alexander Diel, Isabel Carolin Schröter, Anna-Lena Frewer, Christoph Jansen, Anita Robitzsch, Gertraud Gradl-Dietsch, Martin Teufel, Alexander Bäuerle
{"title":"A systematic review and meta analysis on digital mental health interventions in inpatient settings","authors":"Alexander Diel, Isabel Carolin Schröter, Anna-Lena Frewer, Christoph Jansen, Anita Robitzsch, Gertraud Gradl-Dietsch, Martin Teufel, Alexander Bäuerle","doi":"10.1038/s41746-024-01252-z","DOIUrl":"10.1038/s41746-024-01252-z","url":null,"abstract":"E-mental health (EMH) interventions gain increasing importance in the treatment of mental health disorders. Their outpatient efficacy is well-established. However, research on EMH in inpatient settings remains sparse and lacks a meta-analytic synthesis. This paper presents a meta-analysis on the efficacy of EMH in inpatient settings. Searching multiple databases (PubMed, ScienceGov, PsycInfo, CENTRAL, references), 26 randomized controlled trial (RCT) EMH inpatient studies (n = 6112) with low or medium assessed risk of bias were included. A small significant total effect of EMH treatment was found (g = 0.3). The effect was significant both for blended interventions (g = 0.42) and post-treatment EMH-based aftercare (g = 0.29). EMH treatment yielded significant effects across different patient groups and types of therapy, and the effects remained stable post-treatment. The results show the efficacy of EMH treatment in inpatient settings. 
The meta-analysis is limited by the small number of included studies.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-9"},"PeriodicalIF":12.4,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01252-z.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142235063","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mila Nambiar, Yong Mong Bee, Yu En Chan, Ivan Ho Mien, Feri Guretno, David Carmody, Phong Ching Lee, Sing Yi Chia, Nur Nasyitah Mohamed Salim, Pavitra Krishnaswamy
{"title":"A drug mix and dose decision algorithm for individualized type 2 diabetes management","authors":"Mila Nambiar, Yong Mong Bee, Yu En Chan, Ivan Ho Mien, Feri Guretno, David Carmody, Phong Ching Lee, Sing Yi Chia, Nur Nasyitah Mohamed Salim, Pavitra Krishnaswamy","doi":"10.1038/s41746-024-01230-5","DOIUrl":"10.1038/s41746-024-01230-5","url":null,"abstract":"Pharmacotherapy guidelines for type 2 diabetes (T2D) emphasize patient-centered care, but applying this approach effectively in outpatient practice remains challenging. Data-driven treatment optimization approaches could enhance individualized T2D management, but current approaches cannot account for drug-specific and dose-dependent variations in safety and efficacy. We developed and evaluated an AI Drug mix and dose Advisor (AIDA) for glycemic management, using electronic medical records from 107,854 T2D patients in the SingHealth Diabetes Registry. Given a patient’s medical profile, AIDA leverages a predict-then-optimize approach to identify the minimal drug mix and dose changes required to optimize glycemic control, subject to clinical knowledge-based guidelines. On unseen data from large internal, external, and temporal validation sets, AIDA recommendations were estimated to improve post-visit glycated hemoglobin (HbA1c) by an average of 0.40–0.68% over standard of care (P < 0.0001). In qualitative evaluations on 60 diverse cases by a panel of three endocrinologists, AIDA recommendations were mostly rated as reasonable and precise. Finally, AIDA’s ability to account for drug-dose specifics offered several advantages over competing methods, including greater consistency with practice preferences and clinical guidelines for practical but effective options, indication-based treatments, and renal dosing. 
As AIDA provides drug-dose recommendations to improve outcomes for individual T2D patients, it could be used for clinical decision support at point-of-care, especially in resource-limited settings.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-12"},"PeriodicalIF":12.4,"publicationDate":"2024-09-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01230-5.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142235102","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maarten Z. H. Kolk, Diana My Frodi, Joss Langford, Tariq O. Andersen, Peter Karl Jacobsen, Niels Risum, Hanno L. Tan, Jesper Hastrup Svendsen, Reinoud E. Knops, Søren Zöga Diederichsen, Fleur V. Y. Tjong
{"title":"Deep behavioural representation learning reveals risk profiles for malignant ventricular arrhythmias","authors":"Maarten Z. H. Kolk, Diana My Frodi, Joss Langford, Tariq O. Andersen, Peter Karl Jacobsen, Niels Risum, Hanno L. Tan, Jesper Hastrup Svendsen, Reinoud E. Knops, Søren Zöga Diederichsen, Fleur V. Y. Tjong","doi":"10.1038/s41746-024-01247-w","DOIUrl":"10.1038/s41746-024-01247-w","url":null,"abstract":"We aimed to identify and characterise behavioural profiles in patients at high risk of sudden cardiac death (SCD), by using deep representation learning of day-to-day behavioural recordings. We present a pipeline that employed unsupervised clustering on low-dimensional representations of behavioural time-series data learned by a convolutional residual variational neural network (ResNet-VAE). Data from the prospective, observational SafeHeart study conducted at two large tertiary university centers in the Netherlands and Denmark were used. Patients received an implantable cardioverter-defibrillator (ICD) between May 2021 and September 2022 and wore wearable devices using accelerometer technology for 180 consecutive days. A total of 272 patients (mean age of 63.1 ± 10.2 years, 81% male) were eligible with a total sampling of 37,478 days of behavioural data (138 ± 47 days per patient). Deep representation learning identified five distinct behavioural profiles: Cluster A (n = 46) had very low physical activity levels and a disturbed sleep pattern. Cluster B (n = 70) had high activity levels, mainly at light-to-moderate intensity. Cluster C (n = 63) exhibited a high-intensity activity profile. Cluster D (n = 51) showed above-average sleep efficiency. Cluster E (n = 42) had frequent waking episodes and poor sleep. Annual risks of malignant ventricular arrhythmias ranged from 30.4% in Cluster A to 9.8% and 9.5% for Clusters D-E, respectively. 
Compared to low-risk profiles (D-E), Cluster A demonstrated a three- to fourfold increased risk of malignant ventricular arrhythmias adjusted for clinical covariates (adjusted HR 3.63, 95% CI 1.54–8.53, p < 0.001). These behavioural profiles may guide more personalised approaches to ventricular arrhythmia and SCD prevention.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-10"},"PeriodicalIF":12.4,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01247-w.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142234429","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Grace C. Nickel, Serena Wang, Jethro C. C. Kwong, Joseph C. Kvedar
{"title":"The case for inclusive co-creation in digital health innovation","authors":"Grace C. Nickel, Serena Wang, Jethro C. C. Kwong, Joseph C. Kvedar","doi":"10.1038/s41746-024-01256-9","DOIUrl":"10.1038/s41746-024-01256-9","url":null,"abstract":"This piece critiques the exclusion of healthcare practitioners (HCPs) from the digital health innovation process. Drawing on “Sync fast and solve things—best practices for responsible digital health” by Landers et al., the editorial argues for the importance of inclusive co-creation, in which clinicians play an active role in developing digital health solutions. It emphasizes that without the meaningful involvement of HCPs, digital health tools risk being clinically irrelevant.","PeriodicalId":19349,"journal":{"name":"NPJ Digital Medicine","volume":" ","pages":"1-2"},"PeriodicalIF":12.4,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.nature.com/articles/s41746-024-01256-9.pdf","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142234474","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":1,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}