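Comparative Analysis of Machine Learning Models for Predicting Hospital- and Community-Associated Urinary Tract Infections Using Demographic, Hospital, and Socioeconomic Predictors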

Impact factor 3.9 · CAS journal ranking: Zone 3 (Medicine) · JCR Q1 (Infectious Diseases)

Journal of Hospital Infection · Published: 2025-05-06 · DOI: 10.1016/j.jhin.2025.04.024

Arash Arjmand, Majid Bani-Yaghoub, Gary Sutkin, Kiel Corkran, Susanna Paschal


Citations: 0



Background: Urinary tract infections (UTI) are among the most common infections encountered in both community and healthcare settings. Differentiating between community-associated UTI (CA-UTI) and healthcare-associated UTI (HA-UTI) is crucial for understanding their epidemiology, identifying risk factors, and developing appropriate treatment strategies. Machine learning (ML) techniques have shown significant potential in improving the accuracy of predicting these infections, enabling more effective interventions and better patient outcomes. While previous studies have demonstrated the utility of ML models in various healthcare settings, there is still a need for a comparative analysis of different ML approaches, particularly in distinguishing between CA-UTI and HA-UTI and assessing the risk of UTI among hospitalized patients.

Objective: Using 2019-2023 patient demographic, hospital, and socioeconomic data, this study aims to build, validate, and compare five machine learning models, namely Decision Tree (DT), Neural Network (NN), Logistic Regression (LR), Random Forest (RF), and Extreme Gradient Boosting (XGBoost), to distinguish HA-UTI from CA-UTI. It also seeks to identify key predictors of UTI among demographic, hospital, and socioeconomic variables.
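
The abstract reports cross-validation results but does not describe the validation procedure. Because the HA-UTI data are highly imbalanced, one common choice is stratified k-fold cross-validation, in which every fold keeps roughly the same class ratio as the full dataset. The sketch below is a minimal, dependency-free illustration under that assumption; `stratified_k_fold` is a hypothetical helper, not the authors' code, and in practice a library implementation such as scikit-learn's `StratifiedKFold` would more likely be used.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k=5, seed=0):
    """Split sample indices into k folds that preserve the class ratio.

    Hypothetical illustration only; the paper's validation scheme is not
    specified in the abstract. Returns a list of (train_idx, test_idx) pairs.
    """
    if k < 2:
        raise ValueError("k must be at least 2")
    rng = random.Random(seed)

    # Group sample indices by class label (e.g. 1 = HA-UTI, 0 = other).
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    offset = 0
    for label in sorted(by_class):
        indices = by_class[label][:]
        rng.shuffle(indices)
        # Deal this class's indices round-robin across folds, continuing
        # where the previous class stopped so fold sizes stay balanced.
        for i, idx in enumerate(indices):
            folds[(offset + i) % k].append(idx)
        offset += len(indices)

    # Each fold serves once as the test set; the rest form the training set.
    splits = []
    for f in range(k):
        test = sorted(folds[f])
        train = sorted(i for g in range(k) if g != f for i in folds[g])
        splits.append((train, test))
    return splits
```

For example, with a hypothetical set of 10 positive and 90 negative labels and `k=5`, every test fold contains 2 positives and 18 negatives, so the rare class appears in every evaluation fold rather than being concentrated in a few.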

Results: The DT model achieved the highest sensitivity (87%), performing particularly well on the highly imbalanced healthcare-associated infection (HAI) data. LR achieved the best overall accuracy: 95.9% for HA-UTI and 93.2% for HA-UTI vs. CA-UTI. RF achieved the best cross-validation results, reaching 99.1% for HA-UTI and 96.2% for HA-UTI vs. CA-UTI. NN showed the highest specificity for HA-UTI vs. CA-UTI, at 93.4%. The AUC values further supported these findings, ranging from 71.9% for NN to 96% for RF, reflecting the robustness of these models across the different annual datasets. Among the demographic, hospital, and socioeconomic variables, all models consistently identified nurse units (e.g., inpatient units and mental health units) as the most significant predictors of UTI. In addition to nurse units, LR and DT identified location (e.g., various clinics and medical centres) as a key predictor. For HA-UTI vs. CA-UTI, the key predictors varied from year to year, with patient age, median household income, and gender intermittently emerging as important.
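
The metrics reported above can be made concrete with a short sketch. Assuming binary labels where 1 marks the positive class (for example, HA-UTI), sensitivity is TP / (TP + FN), specificity is TN / (TN + FP), and accuracy is (TP + TN) / N; AUC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The function names are hypothetical, the pairwise AUC loop is quadratic and meant only for illustration, and none of this reproduces the paper's actual pipeline.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, TN, FN) for 0/1 labels, where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 0 and pred == 0:
            tn += 1
        else:  # truth == 1 and pred == 0: a missed positive
            fn += 1
    return tp, fp, tn, fn


def summary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy from hard 0/1 predictions."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    total = tp + fp + tn + fn
    return {
        # Share of actual positives that were detected (true-positive rate).
        "sensitivity": tp / (tp + fn) if tp + fn else math.nan,
        # Share of actual negatives correctly left unflagged (true-negative rate).
        "specificity": tn / (tn + fp) if tn + fp else math.nan,
        "accuracy": (tp + tn) / total if total else math.nan,
    }


def roc_auc(y_true, scores):
    """AUC as P(score of random positive > score of random negative)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))
```

On a hypothetical set with 5 positives and 95 negatives, a model that always predicts "negative" reaches 95% accuracy but 0% sensitivity, which is why sensitivity and specificity are reported alongside accuracy when the data are imbalanced.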

Conclusion: The predictive accuracy of the machine learning models is broadly similar, with some differences in sensitivity and specificity both for identifying HA-UTI and for distinguishing HA-UTI from CA-UTI. Nurse units consistently emerged as the most significant predictors across all years. The importance of other predictors, such as socioeconomic factors and location, varies from year to year, highlighting the need to incorporate these variables into surveillance systems to optimize prediction accuracy.


Source journal: Journal of Hospital Infection (Medicine: Infectious Diseases)

CiteScore: 12.70

Self-citation rate: 5.80%

Articles published: 271

Review time: 19 days

About the journal: The Journal of Hospital Infection is the editorially independent scientific publication of the Healthcare Infection Society. The aim of the Journal is to publish high-quality research and information relating to infection prevention and control that is relevant to an international audience. The Journal welcomes submissions that relate to all aspects of infection prevention and control in healthcare settings. This includes submissions that: provide new insight into the epidemiology, surveillance, or prevention and control of healthcare-associated infections and antimicrobial resistance in healthcare settings; provide new insight into cleaning, disinfection and decontamination; provide new insight into the design of healthcare premises; describe novel aspects of outbreaks of infection; throw light on techniques for effective antimicrobial stewardship; describe novel techniques (laboratory-based or point of care) for the detection of infection or antimicrobial resistance in the healthcare setting, particularly if these can be used to facilitate infection prevention and control; improve understanding of the motivations of safe healthcare behaviour, or describe techniques for achieving behavioural and cultural change; improve understanding of the use of IT systems in infection surveillance and prevention and control.