Radu Spaimoc, Jordi Mateo, Francesc Solsona, Alfredo Jover-Sáenz, Fernando Barcenilla, María Ramírez-Hidalgo, Marcos Serrano, Miquel Mesas, Dídac Florensa
Early detection of vascular catheter-associated infections employing supervised machine learning - a case study in Lleida region.
BMC Medical Informatics and Decision Making, 25(1):299. Published 2025-08-11. DOI: https://doi.org/10.1186/s12911-025-03113-5
Full text (PDF): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12337550/pdf/
Abstract
Healthcare-associated infections (HAIs), particularly Vascular Catheter-Associated Infections (VCAIs), are a significant concern, accounting for over 7% of all infections, and are often linked to medical devices. Early detection of VCAIs before invasive infection is crucial for improving hospital care and reducing antibiotic use. This study retrospectively developed and evaluated machine learning models to classify VCAIs from patient medical records, excluding fever and antibiotic prescription indicators. The dataset, collected from the group of public hospitals of the Lleida health region in Catalonia (Spain) between 2011 and 2019, consisted of 24,239 episodes with 150 features related to vascular catheter use. After validation, processing and feature engineering, the dataset was imbalanced, with 94.46% (10,090) non-catheter episodes and 5.53% (591) catheter infection cases. The machine learning classifiers struggled with this imbalance. While most models achieved high accuracy and specificity (approximately 97%), they frequently exhibited limited sensitivity, reaching only around 60% in the best-performing cases. Among the evaluated classifiers, the Gradient Boosting (GB) model outperformed the others, attaining the highest balanced accuracy (82.5%) and sensitivity (67%), underscoring its potential utility for early VCAI detection. The analysis also examined the impact of oversampling techniques on model performance. Although these methods improved metrics for some classifiers, they did not consistently outperform models trained on the original dataset. Therefore, if the improvement is not significant, it is preferable to use the original dataset. This study highlights that strategic feature engineering with the GB classifier is sufficient to obtain robust VCAI detection before the onset of probable sepsis.
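The relationship between the metrics quoted in the abstract can be illustrated with a short sketch. It assumes the standard definitions (sensitivity = TP / (TP + FN), specificity = TN / (TN + FP), balanced accuracy = their mean). The confusion-matrix counts below are hypothetical: they were chosen only so that the class sizes match the 591 / 10,090 split and the rates land near the reported 67% sensitivity and 82.5% balanced accuracy. They are not the study's actual results, and the function name `confusion_metrics` is an illustrative helper, not something from the paper.

```python
def confusion_metrics(tp, fn, tn, fp):
    """Compute common binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)   # share of true infections that are detected
    specificity = tn / (tn + fp)   # share of non-infections correctly ruled out
    accuracy = (tp + tn) / (tp + fn + tn + fp)
    balanced_accuracy = (sensitivity + specificity) / 2
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "balanced_accuracy": balanced_accuracy,
    }


# Hypothetical counts (NOT from the paper): 591 positives, 10,090 negatives.
model = confusion_metrics(tp=396, fn=195, tn=9888, fp=202)

# Trivial baseline that labels every episode as "no infection".
baseline = confusion_metrics(tp=0, fn=591, tn=10090, fp=0)

for name, metrics in (("model", model), ("baseline", baseline)):
    summary = ", ".join(f"{key}={value:.3f}" for key, value in metrics.items())
    print(f"{name}: {summary}")
```

With a class split this skewed, the always-negative baseline already scores a high plain accuracy while its balanced accuracy stays at 0.5, which is one reason balanced accuracy and sensitivity are the more informative figures for this kind of dataset.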
About the journal:
BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles in relation to the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.