Development and Evaluation of Machine Learning Models for the Identification of Surgical Site Infection in Electronic Health Records
Arjun Chakraborty, Kevin Lybarger, Jorge A Olivas Estebane, Judy Y Chen, Mahul Patel, Vikas O'Reilly-Shah, Peter Tarczy-Hornoch, Meliha Yetisgen, Dustin R Long
Surgical Infections, published 2025-03-31. DOI: 10.1089/sur.2024.266 (https://doi.org/10.1089/sur.2024.266)
Abstract
Background: Surgical site infection (SSI) affects 160,000-300,000 patients per year in the United States, adversely impacting a wide range of patient- and health-system outcomes. Surveillance programs for SSI are essential to quality improvement and public health systems. However, the scope of SSI surveillance is currently limited by the resource-intensive nature of these activities, which are largely based on manual chart review. Recent advances in natural language processing and machine learning could potentially augment the scope and quality of routine SSI surveillance.
Patients and Methods: Electronic health records (EHRs) for 28,864 surgical procedures (representing 25% of all surgical cases) linked to either National Healthcare Safety Network (NHSN) data from Harborview Medical Center or National Surgical Quality Improvement Program (NSQIP) data from the University of Washington Montlake Medical Center were included. Cases comprised five different surgical procedure types performed between 2010 and 2020 (general surgery, gynecological surgery, spine surgery, non-spine orthopedic surgery, and non-spine neurological surgery). Using all clinical notes and structured data elements, we trained random forest and neural network models to identify SSI cases. We conducted experiments to evaluate the impact of clinical notes on the task of retrospective SSI identification and to study domain adaptation across different procedure types and registries.
Results: The best performing model utilized a neural network with input derived from both structured data and unstructured text notes, trained on all surgery types (F1 score: NHSN 0.77, NSQIP 0.58; area under the receiver operating characteristic curve: NHSN 0.98, NSQIP 0.92; recall: NHSN 0.85, NSQIP 0.61). Jointly training one model on all domains (both registries, all surgery types) yielded better performance than training procedure- or registry-specific models.
Conclusion: Automated systems for retrospective identification of SSI in EHRs have the potential to improve the efficiency and reliability of chart reviews for national surveillance and quality improvement programs.
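The abstract reports F1 and recall but not precision. As a rough reading aid, precision can be back-calculated from the identity F1 = 2PR / (P + R), which rearranges to P = F1 * R / (2R - F1). The sketch below is not from the paper: it assumes the reported F1 and recall were measured at the same decision threshold and treats the rounded two-decimal figures as exact, so the results are only approximate. The function name `implied_precision` is a hypothetical helper introduced here for illustration.

```python
def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for precision P, given F1 and recall R."""
    denom = 2 * recall - f1
    if denom <= 0:
        # A valid (F1, recall) pair always satisfies F1 < 2 * recall.
        raise ValueError("F1 and recall are inconsistent (need F1 < 2 * recall)")
    return f1 * recall / denom


# (F1, recall) pairs as reported in the abstract for the best model.
reported = {"NHSN": (0.77, 0.85), "NSQIP": (0.58, 0.61)}

for registry, (f1, recall) in reported.items():
    p = implied_precision(f1, recall)
    print(f"{registry}: implied precision ~ {p:.2f}")
```

Under these assumptions the implied precision is roughly 0.70 for NHSN and 0.55 for NSQIP. If those estimates hold, they suggest that a meaningful share of flagged cases would still need confirmation by a human reviewer, which fits the abstract's framing of these models as an aid to chart review.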
About the journal:
Surgical Infections provides comprehensive and authoritative information on the biology, prevention, and management of post-operative infections. Original articles cover the latest advancements, new therapeutic management strategies, and translational research that is being applied to improve clinical outcomes and successfully treat post-operative infections.
Surgical Infections coverage includes:
- Peritonitis and intra-abdominal infections
- Surgical site infections
- Pneumonia and other nosocomial infections
- Cellular and humoral immunity
- Biology of the host response
- Organ dysfunction syndromes
- Antibiotic use
- Resistant and opportunistic pathogens
- Epidemiology and prevention
- The operating room environment
- Diagnostic studies