Early detection of vascular catheter-associated infections employing supervised machine learning - a case study in Lleida region.

IF 3.8 · Zone 3 (Medicine) · Q2 MEDICAL INFORMATICS

BMC Medical Informatics and Decision Making Pub Date : 2025-08-11 DOI:10.1186/s12911-025-03113-5

Radu Spaimoc, Jordi Mateo, Francesc Solsona, Alfredo Jover-Sáenz, Fernando Barcenilla, María Ramírez-Hidalgo, Marcos Serrano, Miquel Mesas, Dídac Florensa

BMC Medical Informatics and Decision Making, volume 25, issue 1, article 299. Publication type: Journal Article. Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12337550/pdf/ · https://doi.org/10.1186/s12911-025-03113-5


Abstract

Healthcare-associated infections (HAIs), particularly Vascular Catheter-Associated Infections (VCAIs), are a significant concern: they account for over 7% of all infections and are often linked to medical devices. Early detection of VCAIs before invasive infection is crucial for improving hospital care and reducing antibiotic use. This study retrospectively developed and evaluated machine learning models to classify VCAIs from patient medical records, excluding fever and antibiotic prescription indicators. The dataset, collected from the group of public hospitals of the Lleida health region in Catalonia (Spain) between 2011 and 2019, consisted of 24,239 episodes with 150 features related to vascular catheter use. After validation, processing and feature engineering, the dataset was imbalanced, with 94.46% (10,090) non-catheter episodes and 5.53% (591) catheter infection cases. The classifiers struggled with this imbalance, particularly in the context of VCAIs. While most models achieved high accuracy and specificity (approximately 97%), they frequently exhibited limited sensitivity, reaching only around 60% in the best-performing cases. Among the evaluated classifiers, the Gradient Boosting (GB) model outperformed the others, attaining the highest balanced accuracy (82.5%) and sensitivity (67%), which underscores its potential utility for early VCAI detection. The analysis also examined the impact of oversampling techniques on model performance. Although these methods improved metrics for some classifiers, they did not consistently outperform models trained on the original dataset. Therefore, if the improvement is not significant, it is preferable to use the original dataset. This study highlights that strategic feature engineering with the GB classifier is sufficient to obtain robust VCAI detection before probable sepsis appears.
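To make the reported metrics concrete, the following is a minimal sketch of how accuracy, sensitivity, specificity and balanced accuracy relate on a dataset with the class counts quoted in the abstract. It is not the authors' code: the `ConfusionCounts` type, the helper names and the "always negative" baseline are illustrative assumptions, and the implied specificity is simple arithmetic on the two reported GB figures, assuming balanced accuracy is defined as the mean of sensitivity and specificity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type)."""
    tp: int  # infection episodes predicted as infection
    fn: int  # infection episodes predicted as non-infection
    tn: int  # non-infection episodes predicted as non-infection
    fp: int  # non-infection episodes predicted as infection


def accuracy(c: ConfusionCounts) -> float:
    return (c.tp + c.tn) / (c.tp + c.fn + c.tn + c.fp)


def sensitivity(c: ConfusionCounts) -> float:
    # Fraction of true infection episodes that are detected (recall).
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    # Fraction of non-infection episodes that are correctly cleared.
    return c.tn / (c.tn + c.fp)


def balanced_accuracy(c: ConfusionCounts) -> float:
    # Mean of the two per-class recalls; insensitive to class proportions.
    return (sensitivity(c) + specificity(c)) / 2


def implied_specificity(bal_acc: float, sens: float) -> float:
    # Rearranges bal_acc = (sens + spec) / 2 to solve for spec.
    return 2 * bal_acc - sens


# Hypothetical baseline: label every episode "no infection", using the
# class counts from the abstract (10,090 negatives, 591 positives).
always_negative = ConfusionCounts(tp=0, fn=591, tn=10090, fp=0)

if __name__ == "__main__":
    print(f"baseline accuracy:          {accuracy(always_negative):.4f}")
    print(f"baseline sensitivity:       {sensitivity(always_negative):.4f}")
    print(f"baseline balanced accuracy: {balanced_accuracy(always_negative):.4f}")
    # Reported GB figures: balanced accuracy 82.5%, sensitivity 67%.
    print(f"implied GB specificity:     {implied_specificity(0.825, 0.67):.2f}")
```

The baseline shows why plain accuracy is uninformative here: a model that never flags an infection is right on about 94% of episodes yet has zero sensitivity and a balanced accuracy of 0.5, which is why the abstract ranks models by balanced accuracy and sensitivity.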


Source journal

BMC Medical Informatics and Decision Making (Medicine: Medical Informatics)

CiteScore: 7.20

Self-citation rate: 5.70%

Articles published: 297

Review time: 1 month

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.