Arash Arjmand, Majid Bani-Yaghoub, Gary Sutkin, Kiel Corkran, Susanna Paschal
Journal of Hospital Infection. Published 2025-05-06. DOI: 10.1016/j.jhin.2025.04.024 (https://doi.org/10.1016/j.jhin.2025.04.024)
Comparative Analysis of Machine Learning Models for Predicting Hospital- and Community-Associated Urinary Tract Infections Using Demographic, Hospital, and Socioeconomic Predictors.
Background: Urinary tract infections (UTI) are among the most common infections encountered in both community and healthcare settings. Differentiating between community-associated UTI (CA-UTI) and healthcare-associated UTI (HA-UTI) is crucial for understanding their epidemiology, identifying risk factors, and developing appropriate treatment strategies. Machine learning (ML) techniques have shown significant potential in improving the accuracy of predicting these infections, enabling more effective interventions and better patient outcomes. While previous studies have demonstrated the utility of ML models in various healthcare settings, there is still a need for a comparative analysis of different ML approaches, particularly in distinguishing between CA-UTI and HA-UTI and assessing the risk of UTI among hospitalized patients.
Objective: Using 2019-2023 patient demographic, hospital, and socioeconomic data, this study aims to build, validate, and compare five machine learning models, namely Decision Tree (DT), Neural Network (NN), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to differentiate between the incidences of HA-UTI and CA-UTI. Additionally, it seeks to identify key predictors of UTI using demographic, hospital, and socioeconomic variables.
Results: The DT model demonstrated the highest sensitivity, particularly in handling the highly imbalanced healthcare-associated infection (HAI) data, with a sensitivity of 87%. LR achieved the best overall accuracy, at 95.9% for HA-UTI and 93.2% for HA-UTI vs. CA-UTI. RF performed best in cross-validation, reaching 99.1% for HA-UTI and 96.2% for HA-UTI vs. CA-UTI. NN showed the highest specificity, at 93.4%, for HA-UTI vs. CA-UTI. The AUC values further supported these findings, ranging from 71.9% for NN to 96% for RF, reflecting the robustness of these models across different annual datasets. Among patient demographic, hospital, and socioeconomic variables, all models consistently identified the nurse units (e.g., inpatient units and mental health units) as the most significant predictors of UTI. In addition to nurse units, LR and DT identified location (e.g., various clinics and medical centres) as a key predictor. For HA-UTI versus CA-UTI, variations were observed across the years, with patient age, median household income, and gender intermittently emerging as key predictors.
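The results above report sensitivity, specificity, accuracy, and AUC side by side. The sketch below is not the authors' code; it is a minimal illustration, using only the standard definitions of these metrics, of why they can diverge on imbalanced data. The function names and the toy labels are assumptions made for this example.

```python
from typing import Dict, Sequence


def classification_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, and accuracy for binary labels (1 = infection)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives
    return {
        # Share of actual positives that were detected.
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        # Share of actual negatives that were correctly ruled out.
        "specificity": tn / (tn + fp) if (tn + fp) else float("nan"),
        # Share of all cases classified correctly.
        "accuracy": (tp + tn) / len(pairs),
    }


def auc_score(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    Ties count as half a win (the Mann-Whitney formulation).
    Assumes at least one positive and one negative label.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced sample (hypothetical): 2 infections among 10 patients.
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
always_negative = [0] * 10  # a classifier that never flags an infection
print(classification_metrics(y_true, always_negative))
```

On this toy sample the always-negative classifier is correct for 8 of 10 patients yet detects none of the infections, which is why sensitivity is worth reading alongside accuracy when the positive class is rare.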
Conclusion: The predictive accuracy of the machine learning models is broadly similar, with some differences in sensitivity and specificity, both for identifying HA-UTI and for distinguishing HA-UTI from CA-UTI. Nurse units consistently emerge as the most significant predictors across all years. The importance of predictors such as socioeconomic factors and location varies from year to year, highlighting the need to incorporate those variables in surveillance systems to optimize the accuracy of predictions.
About the journal:
The Journal of Hospital Infection is the editorially independent scientific publication of the Healthcare Infection Society. The aim of the Journal is to publish high quality research and information relating to infection prevention and control that is relevant to an international audience.
The Journal welcomes submissions that relate to all aspects of infection prevention and control in healthcare settings. This includes submissions that:
provide new insight into the epidemiology, surveillance, or prevention and control of healthcare-associated infections and antimicrobial resistance in healthcare settings;
provide new insight into cleaning, disinfection and decontamination;
provide new insight into the design of healthcare premises;
describe novel aspects of outbreaks of infection;
throw light on techniques for effective antimicrobial stewardship;
describe novel techniques (laboratory-based or point of care) for the detection of infection or antimicrobial resistance in the healthcare setting, particularly if these can be used to facilitate infection prevention and control;
improve understanding of the motivations of safe healthcare behaviour, or describe techniques for achieving behavioural and cultural change;
improve understanding of the use of IT systems in infection surveillance and prevention and control.