Development and Evaluation of Machine Learning Models for the Identification of Surgical Site Infection in Electronic Health Records.

IF 1.4 | Zone 4 (Medicine) | JCR Q4 | INFECTIOUS DISEASES

Surgical infections Pub Date : 2025-09-01 Epub Date: 2025-03-31 DOI:10.1089/sur.2024.266

Arjun Chakraborty, Kevin Lybarger, Jorge A Olivas Estebane, Judy Y Chen, Mahul Patel, Vikas O'Reilly-Shah, Peter Tarczy-Hornoch, Meliha Yetisgen, Dustin R Long

Pages : 474-481 Publication Type : Journal Article URL : https://doi.org/10.1089/sur.2024.266

Citations: 0

Abstract

Background: Surgical site infection (SSI) affects 160,000-300,000 patients per year in the United States, adversely impacting a wide range of patient- and health-system outcomes. Surveillance programs for SSI are essential to quality improvement and public health systems. However, the scope of SSI surveillance is currently limited by the resource-intensive nature of these activities, which are largely based on manual chart review. Recent advances in natural language processing and machine learning could potentially augment the scope and quality of routine SSI surveillance.

Patients and Methods: Electronic health records (EHRs) for 28,864 surgical procedures (representing 25% of all surgical cases) linked to either National Healthcare Safety Network (NHSN) data from Harborview Medical Center or National Surgical Quality Improvement Program (NSQIP) data from the University of Washington Montlake Medical Center were included. Cases comprised five different surgical procedure types performed between 2010 and 2020 (general surgery, gynecological surgery, spine surgery, non-spine orthopedic surgery, and non-spine neurological surgery). Using all clinical notes and structured data elements, we trained random forest and neural network models to identify SSI cases. We conducted experiments to evaluate the impact of clinical notes on the task of retrospective SSI identification and to study domain adaptation across different procedure types and registries.

Results: The best performing model utilized a neural network with input derived from both structured data and unstructured text notes, trained on all surgery types (F1 score: NHSN 0.77, NSQIP 0.58; area under the receiver operating characteristic curve: NHSN 0.98, NSQIP 0.92; recall: NHSN 0.85, NSQIP 0.61). Jointly training one model on all domains (both registries, all surgery types) yielded better performance than training procedure- or registry-specific models.

Conclusion: Automated systems for retrospective identification of SSI in EHRs have the potential to improve the efficiency and reliability of chart reviews for national surveillance and quality improvement programs.
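
The F1 and recall figures in the Results relate to precision through the standard identity F1 = 2PR / (P + R). The sketch below is not the authors' code. It shows how the three metrics are computed from confusion counts and how an approximate precision can be backed out of a reported F1 and recall. The function names and the example counts are hypothetical, and the back-calculation assumes that F1 and recall were reported at the same decision threshold and uses the rounded values from the abstract, so the result is only indicative.

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Compute precision, recall, and F1 from confusion-matrix counts.

    tp: SSI cases correctly flagged, fp: non-SSI cases flagged,
    fn: SSI cases missed. Returns 0.0 where a ratio is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


def implied_precision(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, giving P = F1 * R / (2R - F1)."""
    denom = 2 * recall - f1
    if denom <= 0:
        raise ValueError("F1 and recall are inconsistent (need F1 < 2 * recall)")
    return f1 * recall / denom


if __name__ == "__main__":
    # Hypothetical counts, for illustration only (not from the study).
    print(precision_recall_f1(tp=8, fp=2, fn=2))

    # Rounded values as reported in the abstract.
    print("NHSN implied precision:", round(implied_precision(0.77, 0.85), 2))
    print("NSQIP implied precision:", round(implied_precision(0.58, 0.61), 2))
```

Under those assumptions, the reported figures imply a precision of about 0.70 for NHSN and about 0.55 for NSQIP, meaning that roughly 3 in 10 (NHSN) and 4 to 5 in 10 (NSQIP) flagged cases would be false positives at that operating point.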

Source journal: Surgical Infections (INFECTIOUS DISEASES-SURGERY)

CiteScore: 3.80

Self-citation rate: 5.00%

Articles published: 127

Review time: 6-12 weeks

Journal description: Surgical Infections provides comprehensive and authoritative information on the biology, prevention, and management of post-operative infections. Original articles cover the latest advancements, new therapeutic management strategies, and translational research that is being applied to improve clinical outcomes and successfully treat post-operative infections.

Surgical Infections coverage includes:

- Peritonitis and intra-abdominal infections
- Surgical site infections
- Pneumonia and other nosocomial infections
- Cellular and humoral immunity
- Biology of the host response
- Organ dysfunction syndromes
- Antibiotic use
- Resistant and opportunistic pathogens
- Epidemiology and prevention
- The operating room environment
- Diagnostic studies