Incorporating Machine Learning Driven Factors in the Design of Electronic-triggers to Detect Diagnostic Errors in the Emergency Department
Moein Enayati, Mahsa Khalili, Shrinath Patel, Todd R Huschka, Daniel Cabrera, Sarah J Parker, Kalyan S Pasupathy, Prashant Mahajan, Fernanda Bellolio
Journal of Patient Safety. Journal Article. Published 2025-09-24. DOI: 10.1097/PTS.0000000000001409 (https://doi.org/10.1097/PTS.0000000000001409)
Abstract
Objectives: Electronic health record (EHR)-based triggers (eTriggers) have been used to study diagnostic errors in the emergency department (ED), often with suboptimal performance. Our objective was to investigate the incremental value of multi-factor machine learning (ML) approaches for improving eTrigger performance.
Methods: Patients presenting to an academic ED were categorized into trigger-positive and trigger-negative using standard trigger (T) definitions: (T1) ED return visits resulting in admission within 10 days; (T2) care escalation from the inpatient unit to the ICU within 24 hours; and (T3) deaths within 24 hours of admission. We trained and evaluated 6 supervised ML models.
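The three trigger definitions above can be sketched as simple rules over per-encounter timing fields. This is a minimal illustration, not the study's implementation: the `Encounter` class and its field names are hypothetical, and treating each time window as inclusive of its boundary is an assumption. The abstract also does not say how encounters meeting more than one trigger were assigned, so the sketch simply returns every trigger that fires.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Encounter:
    """One ED encounter. All field names are hypothetical, not from the study."""
    # Days from the index ED visit to a return visit that ended in admission.
    days_to_return_admission: Optional[float] = None
    # Hours from inpatient admission to escalation to the ICU.
    hours_to_icu_escalation: Optional[float] = None
    # Hours from admission to death.
    hours_to_death: Optional[float] = None


def fired_triggers(enc: Encounter) -> List[str]:
    """Return the standard triggers (T1-T3) that fire for an encounter.

    Assumption: each window (10 days, 24 hours) includes its boundary.
    """
    fired = []
    if enc.days_to_return_admission is not None and enc.days_to_return_admission <= 10:
        fired.append("T1")  # ED return visit resulting in admission within 10 days
    if enc.hours_to_icu_escalation is not None and enc.hours_to_icu_escalation <= 24:
        fired.append("T2")  # escalation from inpatient unit to ICU within 24 hours
    if enc.hours_to_death is not None and enc.hours_to_death <= 24:
        fired.append("T3")  # death within 24 hours of admission
    return fired


def is_trigger_positive(enc: Encounter) -> bool:
    """An encounter is trigger-positive if at least one trigger fires."""
    return bool(fired_triggers(enc))
```

For example, `fired_triggers(Encounter(days_to_return_admission=3))` yields only T1, while an encounter with no qualifying event is trigger-negative.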
Results: A total of 124,053 consecutive encounters (5791 T-positive and 118,262 T-negative) were included. Among the T-positive encounters, 4159 (72%) were associated with T1, 1415 (24%) with T2, and 217 (4%) with T3. The T-based positive predictive values (PPV) were 5.2% for T1, 8.2% for T2, and 6.5% for T3. ML models trained on a balanced training dataset and evaluated on an imbalanced test set had low classification performance (accuracy: 0.72-0.95; PPV: 0.00-0.16; F1-score: 0.00-0.23). Higher performance was observed on balanced test sets (accuracy: 0.80-0.97; PPV: 0.82-1.00; F1-score: 0.79-0.97). Comparing models trained on clinically annotated data with models trained on T-based labels identified additional important factors.
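The gap between PPV on balanced and imbalanced test sets is at least partly what one would expect from class prevalence alone, since PPV depends on how common the positive class is. The sketch below first re-derives the reported cohort shares from the counts in the Results, then applies the standard Bayes relation for PPV. The sensitivity, specificity, and prevalence values are illustrative assumptions, not figures from the study.

```python
def ppv(sensitivity: float, specificity: float, prevalence: float) -> float:
    """Positive predictive value from sensitivity, specificity, and prevalence."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    return true_pos / (true_pos + false_pos)


# Cohort counts as reported in the Results.
t_positive, t_negative = 5791, 118_262
by_trigger = {"T1": 4159, "T2": 1415, "T3": 217}

total = t_positive + t_negative
# Share of T-positive encounters per trigger, rounded to whole percent.
shares = {name: round(100 * n / t_positive) for name, n in by_trigger.items()}

# Illustrative classifier (assumed values, not from the study):
# identical sensitivity and specificity, evaluated at two prevalences.
balanced = ppv(0.90, 0.90, 0.50)    # balanced test set, 50% positives
imbalanced = ppv(0.90, 0.90, 0.05)  # imbalanced test set, 5% positives

print(total, shares)
print(f"PPV balanced: {balanced:.2f}, PPV imbalanced: {imbalanced:.2f}")
```

With these assumed values, the same classifier yields a much lower PPV on the imbalanced set than on the balanced one, which is why PPV measured on a balanced test set should not be read as the PPV to expect in deployment.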
Conclusions: Using machine learning to refine eTriggers slightly improves the identification of diagnostic errors, as evidenced by an increase in PPV. We identified new potential factors contributing to ED diagnostic errors. These findings open new avenues for constructing more accurate eTriggers, or modifying existing ones, for diagnostic error identification.
About the journal:
Journal of Patient Safety (ISSN 1549-8417; online ISSN 1549-8425) is dedicated to presenting research advances and field applications in every area of patient safety. While Journal of Patient Safety has a research emphasis, it also publishes articles describing near-miss opportunities, system modifications that are barriers to error, and the impact of regulatory changes on healthcare delivery. This mix of research and real-world findings makes Journal of Patient Safety a valuable resource across the breadth of health professions and from bench to bedside.