Traditional Machine Learning, Deep Learning, and BERT (Large Language Model) Approaches for Predicting Hospitalizations From Nurse Triage Notes: Comparative Evaluation of Resource Management.
Dhavalkumar Patel, Prem Timsina, Larisa Gorenstein, Benjamin S Glicksberg, Ganesh Raut, Satya Narayan Cheetirala, Fabio Santana, Jules Tamegue, Arash Kia, Eyal Zimlichman, Matthew A Levin, Robert Freeman, Eyal Klang
JMIR AI. 2024;3:e52190. Published 2024-08-27. DOI: 10.2196/52190 (https://doi.org/10.2196/52190). Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11387908/pdf/
Abstract
Background: Predicting hospitalization from nurse triage notes has the potential to augment care. However, the choice of model for this task requires careful consideration, because health systems differ in their available computational infrastructure and budgets.
Objective: To this end, we compared the performance of Bio-Clinical-BERT, a deep learning model based on Bidirectional Encoder Representations from Transformers (BERT), with that of a bag-of-words (BOW) logistic regression (LR) model using term frequency-inverse document frequency (TF-IDF) features. These two choices represent different levels of computational requirements.
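To make the lower-cost baseline concrete, the following is a minimal, dependency-free sketch of a bag-of-words TF-IDF representation feeding a logistic regression classifier. It is not the authors' implementation: the whitespace tokenizer, the smoothed IDF formula, the plain stochastic gradient descent training loop, the hyperparameters, and the four toy notes are all assumptions made for illustration.

```python
import math
from collections import Counter

def tokenize(text):
    # Assumed tokenizer: lowercase and split on whitespace.
    return text.lower().split()

def fit_idf(docs):
    """Smoothed inverse document frequency for every token seen in docs."""
    n = len(docs)
    doc_freq = Counter(tok for doc in docs for tok in set(tokenize(doc)))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in doc_freq.items()}

def tfidf(doc, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unseen tokens are dropped."""
    counts = Counter(t for t in tokenize(doc) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
    return {t: v / norm for t, v in vec.items()}

def predict_proba(vec, weights, bias):
    """Logistic regression probability of the positive class (admission)."""
    z = bias + sum(weights.get(t, 0.0) * v for t, v in vec.items())
    return 1.0 / (1.0 + math.exp(-z))

def fit_logreg(vectors, labels, epochs=200, lr=0.5):
    """Unregularized logistic regression trained with plain SGD."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            err = predict_proba(vec, weights, bias) - y  # gradient of log loss w.r.t. z
            bias -= lr * err
            for t, v in vec.items():
                weights[t] = weights.get(t, 0.0) - lr * err * v
    return weights, bias

# Invented toy triage notes (1 = admitted, 0 = discharged); not study data.
notes = [
    "chest pain radiating to left arm short of breath",
    "ankle sprain after fall ambulatory",
    "fever confusion low blood pressure",
    "medication refill request no complaints",
]
admitted = [1, 0, 1, 0]

idf = fit_idf(notes)
vectors = [tfidf(note, idf) for note in notes]
weights, bias = fit_logreg(vectors, admitted)

p_high = predict_proba(tfidf("chest pain and short of breath", idf), weights, bias)
p_low = predict_proba(tfidf("medication refill request", idf), weights, bias)
print(f"admission-like note: {p_high:.2f}, discharge-like note: {p_low:.2f}")
```

A model of this kind trains on a CPU and stores only one weight per vocabulary term, which is what makes it a plausible candidate for settings with limited computational resources.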
Methods: A retrospective analysis was conducted using data from 1,391,988 patients who visited emergency departments in the Mount Sinai Health System between 2017 and 2022. The models were trained on data from 4 hospitals and externally validated on data from a fifth hospital.
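The site-level split described in the Methods can be sketched as follows. The record layout (a `hospital` key per visit), the hospital labels, and the helper name `split_by_hospital` are hypothetical; the abstract does not say which hospital was held out or how the records were stored.

```python
def split_by_hospital(visits, held_out):
    """Hold out every visit from one hospital for external validation."""
    train = [v for v in visits if v["hospital"] != held_out]
    external = [v for v in visits if v["hospital"] == held_out]
    return train, external

# Invented placeholder records; hospital labels A-E are not from the study.
visits = [
    {"hospital": "A", "note": "chest pain", "admitted": 1},
    {"hospital": "B", "note": "ankle sprain", "admitted": 0},
    {"hospital": "C", "note": "fever and confusion", "admitted": 1},
    {"hospital": "D", "note": "medication refill", "admitted": 0},
    {"hospital": "E", "note": "shortness of breath", "admitted": 1},
]

train, external = split_by_hospital(visits, held_out="E")
print(len(train), "training visits;", len(external), "external validation visits")
```

Splitting by site rather than at random keeps the validation hospital entirely unseen during training, so the reported performance reflects transfer to a new site rather than memorized site-specific documentation habits.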
Results: The Bio-Clinical-BERT model achieved higher areas under the receiver operating characteristic curve (0.82, 0.84, and 0.85) than the BOW-LR-TF-IDF model (0.81, 0.83, and 0.84) across training sets of 10,000; 100,000; and ~1,000,000 patients, respectively. Notably, both models proved effective at using triage notes for prediction, and the performance gap between them was modest.
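For readers less familiar with the metric, the area under the receiver operating characteristic curve equals the probability that a randomly chosen admitted patient receives a higher score than a randomly chosen discharged patient, with ties counted as one half. The sketch below computes it directly from that definition; the function name `roc_auc` and the example labels and scores are invented for illustration and are unrelated to the study's results.

```python
def roc_auc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative label")
    # A tie between a positive and a negative score counts as half a win.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented example: 3 admitted and 2 discharged patients.
labels = [1, 0, 1, 0, 1]
scores = [0.90, 0.40, 0.35, 0.20, 0.80]
auc = roc_auc(labels, scores)
print(f"AUROC = {auc:.3f}")  # 5 of 6 pairs are ranked correctly
```

This pairwise form is quadratic in the number of patients and is meant only to show what the number measures; at the scale of the study, a sort-based implementation would be the practical choice.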
Conclusions: Our findings suggest that simpler machine learning models such as BOW-LR-TF-IDF could serve adequately in resource-limited settings. Given the potential implications for patient care and hospital resource management, further exploration of alternative models and techniques is warranted to enhance predictive performance in this critical domain.
International Registered Report Identifier (IRRID): RR2-10.1101/2023.08.07.23293699.