Title: Validation of ML-algorithms for the prediction of positive urine cultures from flow cytometry routine data in patients with suspected bacteriuria
Authors: Alexander Brenner, Jutta Esser, Franziska Schuler, Julian Varghese, Frieder Schaumburg
Journal: International Journal of Medical Microbiology, volume 318, Article 151652 (impact factor 4.5; JCR Q1, Microbiology)
Publication date: 2025-03-01
Publication type: Journal Article
DOI: 10.1016/j.ijmm.2025.151652
URL: https://www.sciencedirect.com/science/article/pii/S1438422125000086
Citations: 0
Abstract
Urine samples are frequently analyzed in microbiology laboratories, but a large proportion of them are culture-negative. The aim of this study was to test whether positive urine cultures can be predicted from routine flow cytometric data. Urine samples (n = 1325) were used for a training dataset (n = 1032) and three independent test datasets (n = 93–100 samples) that were collected three months apart. Predictors from flow cytometry were total counts per µl of bacteria, erythrocytes, yeast-like cells, hyaline casts, crystals, leukocytes, squamous epithelial cells, non-hyaline casts and non-squamous epithelial cells; age, sex and type of urine sample were included as additional predictors. Labels were positive culture and detection of clinically relevant uropathogens. Three classifiers (decision tree, random forest, CatBoost) were 5-fold cross-validated on the training dataset to select an optimized model with ≥ 95 % sensitivity. The optimized model was then trained on the complete training dataset and evaluated on the three independent test sets. In total, 72.5 % (960/1325) of samples were culture-positive, with a predominance of Escherichia coli (n = 295). CatBoost outperformed the other classifiers in terms of balanced accuracy on the training data and was selected as the classifier for predictions. With optimized hyperparameters, the balanced accuracy on the test data was 62–74 % for the prediction of a positive culture, and sensitivity remained stable over a period of six months (94–96 %; negative predictive value [NPV]: 67–77 %; positive predictive value [PPV]: 78–81 %). For the prediction of uropathogens, the balanced accuracy was 57–63 %, again with stable sensitivity (95–100 %; NPV: 83–100 %; PPV: 48–59 %). In conclusion, the ML algorithms showed high sensitivity for detecting positive urine cultures.
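The abstract reports sensitivity, balanced accuracy, NPV and PPV, and a model tuned to reach ≥ 95 % sensitivity, but it does not say how that constraint was enforced. One common approach, assumed here purely for illustration, is to pick the highest probability cut-off that still meets the sensitivity target. The sketch below is not the authors' code: the function names `confusion_metrics` and `threshold_for_sensitivity` are hypothetical, and the labels and scores are invented toy values, not study data.

```python
from dataclasses import dataclass


@dataclass
class Metrics:
    sensitivity: float
    specificity: float
    balanced_accuracy: float
    ppv: float
    npv: float


def confusion_metrics(y_true, y_pred):
    """Compute screening metrics from binary labels (1 = culture positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)

    def ratio(num, den):
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    sens = ratio(tp, tp + fn)   # share of positive cultures that are flagged
    spec = ratio(tn, tn + fp)   # share of negative cultures that are cleared
    return Metrics(
        sensitivity=sens,
        specificity=spec,
        balanced_accuracy=(sens + spec) / 2,
        ppv=ratio(tp, tp + fp),
        npv=ratio(tn, tn + fn),
    )


def threshold_for_sensitivity(y_true, scores, target=0.95):
    """Return the highest cut-off whose sensitivity is at least `target`.

    A sample is predicted positive when its score is >= the cut-off.
    """
    for thr in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= thr else 0 for s in scores]
        if confusion_metrics(y_true, y_pred).sensitivity >= target:
            return thr
    raise ValueError("no threshold reaches the target sensitivity")


# Toy example (invented values): model scores for ten urine samples.
y_true = [1, 1, 1, 1, 0, 0, 1, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.5, 0.4, 0.3, 0.2, 0.1]

thr = threshold_for_sensitivity(y_true, scores, target=0.95)
y_pred = [1 if s >= thr else 0 for s in scores]
print(thr, confusion_metrics(y_true, y_pred))
```

In this toy case the sensitivity target forces a low cut-off, so specificity and balanced accuracy drop while NPV stays high. The abstract shows the same pattern of high sensitivity alongside moderate balanced accuracy. In practice the cut-off would be chosen on cross-validation folds of the training data only and then applied unchanged to the held-out test sets.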
About the journal:
Pathogen genome sequencing projects have provided a wealth of data that need to be placed in the context of pathogenicity and the outcome of infections. In addition, the interplay between a pathogen and its host cell has become increasingly important to understand and interfere with diseases caused by microbial pathogens. IJMM meets these needs by focussing on genome and proteome analyses, studies dealing with the molecular mechanisms of pathogenicity and the evolution of pathogenic agents, the interactions between pathogens and host cells ("cellular microbiology"), and molecular epidemiology. To help the reader keep up with the rapidly evolving findings in the field of medical microbiology, IJMM publishes original articles, case studies and topical, state-of-the-art mini-reviews in a well-balanced fashion. All articles are strictly peer-reviewed. Important topics are reinforced by two special issues per year dedicated to a particular theme. Finally, at irregular intervals, current opinions on recent or future developments in medical microbiology are presented in an editorial section.