Meghamala Sinha, Perry Haaland, Ashok Krishnamurthy, Bo Lan, Stephen A Ramsey, Patrick L Schmitt, Priya Sharma, Hao Xu, Karamarie Fecho
{"title":"Causal analysis for multivariate integrated clinical and environmental exposures data.","authors":"Meghamala Sinha, Perry Haaland, Ashok Krishnamurthy, Bo Lan, Stephen A Ramsey, Patrick L Schmitt, Priya Sharma, Hao Xu, Karamarie Fecho","doi":"10.1186/s12911-025-02849-4","DOIUrl":"10.1186/s12911-025-02849-4","url":null,"abstract":"<p>Electronic health records (EHRs) provide a rich source of observational patient data that can be explored to infer underlying causal relationships. These causal relationships can be applied to augment medical decision-making or suggest hypotheses for healthcare research. In this study, we explored a large-scale EHR dataset on patients with asthma or related conditions (N = 14,937). The dataset included integrated data on features representing demographic factors, clinical measures, and environmental exposures. The data were accessed via a service named the Integrated Clinical and Environmental Service (ICEES). We estimated underlying causal relationships from the data to identify significant predictors of asthma attacks. We also performed simulated interventions on the inferred causal network to detect the causal effects, in terms of shifts in probability distribution for asthma attacks.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"27"},"PeriodicalIF":3.3,"publicationDate":"2025-01-15","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11736916/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143000501","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying effective immune biomarkers in alopecia areata diagnosis based on machine learning methods.","authors":"Qingde Zhou, Lan Lan, Wei Wang, Xinchang Xu","doi":"10.1186/s12911-025-02853-8","DOIUrl":"10.1186/s12911-025-02853-8","url":null,"abstract":"<p><strong>Background: </strong>Alopecia areata (AA) is a common non-scarring hair loss disorder associated with autoimmune conditions. However, the pathobiology of AA is not well understood, and there is no targeted therapy available for AA.</p><p><strong>Methods: </strong>In this study, differential gene expression analysis, immune status assessment, weighted correlation network analysis (WGCNA), and functional enrichment analysis were performed to identify shared genes associated with both immunological response and AA. Machine learning methods were then used to identify three hub genes as potential diagnostic markers for AA. External validation was performed, and the correlation of hub genes with immune infiltration, immune checkpoint genes, and key marker genes and pathways was evaluated.</p><p><strong>Results: </strong>Three hub genes were identified, which accurately predicted the progression of AA and the immune status. The hub genes were found to be diagnostic markers for AA with high predictive accuracy. External validation confirmed the efficacy of these markers in identifying AA patients.</p><p><strong>Conclusion: </strong>Overall, the study provides a novel approach for the diagnosis, prevention, and treatment of AA. The findings could potentially lead to the development of targeted therapies for AA based on the identified hub genes. 
The study also highlights the potential of machine learning and bioinformatics analysis in identifying new biomarkers for autoimmune diseases.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"23"},"PeriodicalIF":3.3,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11734347/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142982130","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction: Which criteria are important in usability evaluation of mHealth applications: an umbrella review.","authors":"Zahra Galavi, Mahdieh Montazeri, Reza Khajouei","doi":"10.1186/s12911-025-02860-9","DOIUrl":"10.1186/s12911-025-02860-9","url":null,"abstract":"","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"22"},"PeriodicalIF":3.3,"publicationDate":"2025-01-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11731404/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142982908","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Jacques K Muthusi, Peter W Young, Frankline O Mboya, Samuel M Mwalili
{"title":"%diag_test: a generic SAS macro for evaluating diagnostic accuracy measures for multiple diagnostic tests.","authors":"Jacques K Muthusi, Peter W Young, Frankline O Mboya, Samuel M Mwalili","doi":"10.1186/s12911-024-02808-5","DOIUrl":"10.1186/s12911-024-02808-5","url":null,"abstract":"<p><strong>Background: </strong>Measures of diagnostic test accuracy provide evidence of how well a test correctly identifies or rules out disease. Commonly used diagnostic accuracy measures (DAMs) include sensitivity and specificity, predictive values, likelihood ratios, area under the receiver operating characteristic curve (AUROC), area under precision-recall curves (AUPRC), diagnostic effectiveness (accuracy), disease prevalence, and diagnostic odds ratio (DOR). Most available analysis tools perform accuracy testing for a single diagnostic test using summarized data. We developed a SAS macro for evaluating multiple diagnostic tests using individual-level data that creates a 2 × 2 summary table, AUROC and AUPRC as part of its output.</p><p><strong>Methods: </strong>The SAS macro presented here is automated to reduce analysis time and transcription errors. It is simple to use as the user only needs to specify the input dataset, \"standard\" and \"test\" variables and threshold values. It creates a publication-quality output in Microsoft Word and Excel showing more than 15 different accuracy measures together with overlaid AUROC and AUPRC graphics to help the researcher in making decisions to adopt or reject diagnostic tests. Further, it provides for additional variance estimation methods other than the normal distribution approximation.</p><p><strong>Results: </strong>We tested the macro for quality control purposes by reproducing results from published work on evaluation of multiple types of dried blood spots (DBS) as an alternative for human immunodeficiency virus (HIV) viral load (VL) monitoring in resource-limited settings compared to plasma, the gold standard. 
Plasma viral load reagents are costly, and blood must be prepared in a reference laboratory setting by a qualified technician. On the other hand, DBS are easy to prepare without these restrictions. That study evaluated the suitability of venous, microcapillary and direct spotting DBS, giving multiple diagnostic tests that were each compared to plasma specimens. We also used the macro to reproduce results of published work on evaluating performance of multiple classification machine learning algorithms for predicting coronary artery disease.</p><p><strong>Conclusion: </strong>The SAS macro presented here is a powerful analytic tool for analyzing data from multiple diagnostic tests. The SAS programmer can modify the source code to include other diagnostic measures and variance estimation methods. By automating analysis, the macro adds value by saving analysis time, reducing transcription errors, and producing publication-quality outputs.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"21"},"PeriodicalIF":3.3,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730795/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142977760","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
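Most of the diagnostic accuracy measures listed in the %diag_test abstract follow directly from the four cells of a 2 × 2 table. The block below is a minimal Python sketch of those formulas, not the authors' SAS macro: the function name `diagnostic_measures` and the example counts are hypothetical, and no confidence intervals or zero-cell corrections are included.

```python
def diagnostic_measures(tp, fp, fn, tn):
    """Point estimates of common diagnostic accuracy measures.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives and true negatives against the reference standard.
    Assumes every cell is non-zero (no continuity correction is applied).
    """
    total = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "ppv": tp / (tp + fp),            # positive predictive value
        "npv": tn / (tn + fn),            # negative predictive value
        "lr_positive": sensitivity / (1 - specificity),
        "lr_negative": (1 - sensitivity) / specificity,
        "accuracy": (tp + tn) / total,    # diagnostic effectiveness
        "prevalence": (tp + fn) / total,
        "dor": (tp * tn) / (fp * fn),     # diagnostic odds ratio
    }


# Hypothetical counts, for illustration only.
measures = diagnostic_measures(tp=90, fp=20, fn=10, tn=180)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

AUROC and AUPRC are left out because they need individual-level scores rather than a single 2 × 2 table.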
Georges Nguefack-Tsague, Fabrice Zobel Lekeumo Cheuyem, Boris Edmond Noah, Valérie Ndobo-Koe, Adidja Amani, Léa Melataguia Mekontchou, Marie Ntep Gweth, Annick Collins Mfoulou Minso Assala, Marie Nicole Ngoufack, Pierre René Binyom
{"title":"Mortality and morbidity patterns in Yaoundé, Cameroon: an ICD-11 classification-based analysis.","authors":"Georges Nguefack-Tsague, Fabrice Zobel Lekeumo Cheuyem, Boris Edmond Noah, Valérie Ndobo-Koe, Adidja Amani, Léa Melataguia Mekontchou, Marie Ntep Gweth, Annick Collins Mfoulou Minso Assala, Marie Nicole Ngoufack, Pierre René Binyom","doi":"10.1186/s12911-025-02854-7","DOIUrl":"10.1186/s12911-025-02854-7","url":null,"abstract":"<p><strong>Background: </strong>In Cameroon, as in many other resource-limited countries, data generated by health settings including morbidity and mortality parameters are not always uniform. In the absence of a national guideline necessary for the standardization and harmonization of data, precision of data required for effective decision-making is therefore not guaranteed. The objective of the present study was to assess the reporting style of morbidity and mortality data in healthcare settings.</p><p><strong>Methods: </strong>An institutional-based cross-sectional study was carried out from May to June 2022 at the Yaoundé Central Hospital. A questionnaire was used to assess the need to set up a standard tool to improve the reporting system. Medical records were used to collect mortality and morbidity data which were then compared to the International Statistical Classification of Diseases and Related Health Problems-11 (ICD-11) codification. Data were analyzed using IBM-SPSS version 26.</p><p><strong>Results: </strong>Out of 200 patients' morbidity causes recorded, nearly three-quarters (74.0%) were heterogeneous, and two in five (41.0%) of mortality causes reported were also heterogeneous. Most respondents stated the need to set up a standard tool for collecting mortality and morbidity data (83.6%). 
Fewer than one-fifth (18.2%) of health care providers were able to understand data flow, 36.6% correctly archived data, and 40.0% used electronic tools for data quality control.</p><p><strong>Conclusion: </strong>There was a high level of heterogeneity in morbidity and mortality causes among patients admitted to the Yaoundé Central Hospital in 2021. It is therefore urgent that Cameroon national health authorities implement the ICD-11 to allow the systematic recording, analysis, interpretation and comparison of mortality and morbidity data collected in Yaoundé Central Hospital at different times; and ensure interoperability and reusability of recorded data for medical decision support.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"19"},"PeriodicalIF":3.3,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730474/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142977763","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
L Alexander Vance, Leslie Way, Deepali Kulkarni, Emily O C Palmer, Abhijit Ghosh, Melissa Unruh, Kelly M Y Chan, Amey Girdhari, Joydeep Sarkar
{"title":"Natural language processing to identify suicidal ideation and anhedonia in major depressive disorder.","authors":"L Alexander Vance, Leslie Way, Deepali Kulkarni, Emily O C Palmer, Abhijit Ghosh, Melissa Unruh, Kelly M Y Chan, Amey Girdhari, Joydeep Sarkar","doi":"10.1186/s12911-025-02851-w","DOIUrl":"10.1186/s12911-025-02851-w","url":null,"abstract":"<p><strong>Background: </strong>Anhedonia and suicidal ideation are symptoms of major depressive disorder (MDD) that are not regularly captured in structured scales but may be captured in unstructured clinical notes. Natural language processing (NLP) techniques may be used to extract longitudinal data on suicidal behaviors and anhedonia within unstructured clinical notes. This study assessed the accuracy of using NLP techniques on electronic health records (EHRs) to identify these symptoms among patients with MDD.</p><p><strong>Methods: </strong>EHR-derived, de-identified data were used from the NeuroBlu Database (version 23R1), a longitudinal behavioral health real-world database. Mental health clinicians annotated instances of anhedonia and suicidal symptoms in clinical notes creating a ground truth. Interrater reliability (IRR) was calculated using Krippendorff's alpha. A novel transformer architecture-based NLP model was trained on clinical notes to recognize linguistic patterns and contextual cues. Each sentence was categorized into one of four labels: (1) anhedonia; (2) suicidal ideation without intent or plan; (3) suicidal ideation with intent or plan; (4) absence of suicidal ideation or anhedonia. The model was assessed using positive predictive values (PPV), negative predictive values, sensitivity, specificity, F1-score, and AUROC.</p><p><strong>Results: </strong>The model was trained, tested, and validated on 2,198, 1,247, and 1,016 distinct clinical notes, respectively. IRR was 0.80. 
For anhedonia, suicidal ideation with intent or plan, and suicidal ideation without intent or plan the model achieved a PPV of 0.98, 0.93, and 0.87, an F1-score of 0.98, 0.91, and 0.89 during training and a PPV of 0.99, 0.95, and 0.87 and F1-score of 0.99, 0.95, and 0.89 during validation.</p><p><strong>Conclusions: </strong>NLP techniques can leverage contextual information in EHRs to identify anhedonia and suicidal symptoms in patients with MDD. Integrating structured and unstructured data offers a comprehensive view of MDD's trajectory, helping healthcare providers deliver timely, effective interventions. Addressing current limitations will further enhance NLP models, enabling more accurate extraction of critical clinical features and supporting personalized, proactive mental health care.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"20"},"PeriodicalIF":3.3,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11730826/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142977765","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Predictive value of machine learning for the progression of gestational diabetes mellitus to type 2 diabetes: a systematic review and meta-analysis.","authors":"Meng Zhao, Zhixin Yao, Yan Zhang, Lidan Ma, Wenquan Pang, Shuyin Ma, Yijun Xu, Lili Wei","doi":"10.1186/s12911-024-02848-x","DOIUrl":"10.1186/s12911-024-02848-x","url":null,"abstract":"<p><strong>Background: </strong>This systematic review aims to explore the early predictive value of machine learning (ML) models for the progression of gestational diabetes mellitus (GDM) to type 2 diabetes mellitus (T2DM).</p><p><strong>Methods: </strong>A comprehensive and systematic search was conducted in PubMed, Cochrane, Embase, and Web of Science up to July 02, 2024. The quality of the studies included was assessed. The risk of bias was assessed through the prediction model risk of bias assessment tool and a graph was drawn accordingly. The meta-analysis was performed using Stata 15.0.</p><p><strong>Results: </strong>A total of 13 studies were included in the present review, involving 11,320 GDM patients and 22 ML models. The meta-analysis for ML models showed a pooled C-statistic of 0.82 (95% CI: 0.79 ~ 0.86), a pooled sensitivity of 0.76 (0.72 ~ 0.80), and a pooled specificity of 0.57 (0.50 ~ 0.65).</p><p><strong>Conclusion: </strong>ML has favorable diagnostic accuracy for the progression of GDM to T2DM. 
This provides evidence for the development of predictive tools with broader applicability.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"18"},"PeriodicalIF":3.3,"publicationDate":"2025-01-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11727323/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142977777","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Maria Aguiar, Ander Cejudo, Gorka Epelde, Deisy Chaves, Maria Trujillo, Garazi Artola, Unai Ayala, Roberto Bilbao, Itziar Tueros
{"title":"An approach to boost adherence to self-data reporting in mHealth applications for users without specific health conditions.","authors":"Maria Aguiar, Ander Cejudo, Gorka Epelde, Deisy Chaves, Maria Trujillo, Garazi Artola, Unai Ayala, Roberto Bilbao, Itziar Tueros","doi":"10.1186/s12911-024-02833-4","DOIUrl":"10.1186/s12911-024-02833-4","url":null,"abstract":"<p><strong>Background: </strong>The popularization of mobile health (mHealth) apps for public health or medical care purposes has transformed human life substantially, improving lifestyle behaviors and chronic condition management. The objective of this study is to evaluate the effect of gamification features in an mHealth app that includes the most common categories of behavior change techniques for the self-report of lifestyle data. The data reported by the user can be manual (i.e., diet, activity, and weight) and automatic (Fitbit wearable devices). As a secondary objective, this work aims to explore the differences in adherence when considering a longer study duration and make a comparative analysis of the gamification effect.</p><p><strong>Methods: </strong>In this study, the effectiveness of various behavior change technique strategies is evaluated through the analysis of two user groups. With a first group of users, we perform a comparative analysis in terms of adherence and system usability scale of two versions of the app, both including the most common categories of behavior change techniques but the second version having added gamification features. 
Then, with a second group of participants and the best mHealth app version, a longer study is carried out and user adherence, the system usability scale and user feedback are analyzed.</p><p><strong>Results: </strong>In the first stage study, results have shown that the app version with gamification features has achieved a higher adherence, as the percentage of days active was higher for most of the users and the system usability scale score is 80.67, which is categorized as rank A. The app also exceeded the expectations of the users by about 70% for the app version with gamification functionalities. In the second stage of the study, an adherence of 76.25% is reported after 8 weeks and 58% at the end of the pilot for the mHealth app. Similarly, for the wearable device, an adherence of 74.32% is achieved after 8 weeks and 81.08% is obtained at the end of the pilot. We hypothesize that these specific wearable devices have contributed to a decreased system usability scale score, reaching 62.89 which is ranked as C.</p><p><strong>Conclusion: </strong>This study evidences the effectiveness of the gamification category of behavior change techniques in increasing the overall user adherence, expectations, and perceived usability. In addition, the results provide quantitative results on the effect of the most common categories of behavior change techniques for the self-report of lifestyle data. 
However, the longer study duration revealed several limitations in capturing lifestyle data, especially when including wearable devices such as Fitbit.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"16"},"PeriodicalIF":3.3,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11721516/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142963749","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Akanksha Singh, Benjamin Schooley, John Mobley, Patrick Mobley, Sydney Lindros, John M Brooks, Sarah B Floyd
{"title":"Human-centered design of a health recommender system for orthopaedic shoulder treatment.","authors":"Akanksha Singh, Benjamin Schooley, John Mobley, Patrick Mobley, Sydney Lindros, John M Brooks, Sarah B Floyd","doi":"10.1186/s12911-025-02850-x","DOIUrl":"10.1186/s12911-025-02850-x","url":null,"abstract":"<p><strong>Background: </strong>Rich data on diverse patients and their treatments and outcomes within Electronic Health Record (EHR) systems can be used to generate real world evidence. A health recommender system (HRS) framework can be applied to a decision support system application to generate data summaries for similar patients during the clinical encounter to assist physicians and patients in making evidence-based shared treatment decisions.</p><p><strong>Objective: </strong>A human-centered design (HCD) process was used to develop an HRS for treatment decision support in orthopaedic medicine, the Informatics Consult for Individualized Treatment (I-C-IT). We also evaluate the usability and utility of the system from the physician's perspective, focusing on elements of utility and shared decision-making in orthopaedic medicine.</p><p><strong>Methods: </strong>The HCD process for I-C-IT included 6 steps across three phases of analysis, design, and evaluation. A team of health informatics and comparative effectiveness researchers directly engaged with orthopaedic surgeon subject matter experts in a collaborative I-C-IT prototype design process. Ten orthopaedic surgeons participated in a mixed methods evaluation of the I-C-IT prototype that was produced.</p><p><strong>Results: </strong>The HCD process resulted in a prototype system, I-C-IT, with 14 data visualization elements and a set of design principles crucial for HRS for decision support. The overall standard system usability scale (SUS) score for the I-C-IT Webapp prototype was 88.75, indicating high usability. 
In addition, utility questions addressing shared decision-making found that 90% of orthopaedic surgeon respondents either strongly agreed or agreed that I-C-IT would help them make data-informed decisions with their patients.</p><p><strong>Conclusion: </strong>The HCD process produced an HRS prototype that is capable of supporting orthopaedic surgeons and patients in their information needs during clinical encounters. Future research should focus on refining I-C-IT by incorporating patient feedback in future iterative cycles of system design and evaluation.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"17"},"PeriodicalIF":3.3,"publicationDate":"2025-01-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11720343/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142963753","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
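The abstract above reports an overall system usability scale (SUS) score without describing the calculation. The sketch below shows the standard SUS scoring rule as it is commonly applied: ten items rated 1 to 5, odd-numbered items contribute the rating minus 1, even-numbered items contribute 5 minus the rating, and the sum is multiplied by 2.5. Whether the authors scored their questionnaire exactly this way is an assumption; the function names and the example responses are hypothetical.

```python
def sus_score(responses):
    """Score one respondent's 10 SUS ratings (each 1-5) on a 0-100 scale."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item_number, rating in enumerate(responses, start=1):
        if not 1 <= rating <= 5:
            raise ValueError("each rating must be between 1 and 5")
        # Odd items are positively worded, even items negatively worded.
        total += (rating - 1) if item_number % 2 == 1 else (5 - rating)
    return total * 2.5


def mean_sus(all_responses):
    """Average the per-respondent SUS scores to get an overall score."""
    scores = [sus_score(r) for r in all_responses]
    return sum(scores) / len(scores)


# Hypothetical ratings from two respondents, for illustration only.
best_case = [5, 1, 5, 1, 5, 1, 5, 1, 5, 1]
neutral = [3] * 10
print(sus_score(best_case), sus_score(neutral), mean_sus([best_case, neutral]))
```

Averaging per-respondent scores, as `mean_sus` does, is the usual way an overall SUS figure is obtained for a group of evaluators.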
{"title":"Prediction of urinary tract infection using machine learning methods: a study for finding the most-informative variables.","authors":"Sajjad Farashi, Hossein Emad Momtaz","doi":"10.1186/s12911-024-02819-2","DOIUrl":"10.1186/s12911-024-02819-2","url":null,"abstract":"<p><strong>Background: </strong>Urinary tract infection (UTI) is a frequent health-threatening condition. Early reliable diagnosis of UTI helps to prevent misuse or overuse of antibiotics and hence prevent antibiotic resistance. The gold standard for UTI diagnosis is urine culture, which is a time-consuming and error-prone method, so complementary methods are needed. Over the past decade, machine learning strategies, which apply mathematical models to a dataset to extract its most informative hidden information, have become a focus of interest for prediction and diagnosis.</p><p><strong>Method: </strong>In this study, machine learning approaches were used for finding the important variables for a reliable prediction of UTI. Several types of models, including classical and deep learning models, were used for this purpose.</p><p><strong>Results: </strong>Eighteen selected features from urine test, blood test, and demographic data were found to be the most informative features. Factors extracted from urine such as WBC, nitrite, leukocyte, clarity, color, blood, bilirubin, urobilinogen, and factors extracted from blood test like mean platelet volume, lymphocyte, glucose, red blood cell distribution width, and potassium, and demographic data such as age, gender and previous use of antibiotics were the determinative factors for UTI prediction. An ensemble combination of XGBoost, decision tree, and light gradient boosting machines with a voting scheme obtained the highest accuracy for UTI prediction (AUC: 88.53 (0.25), accuracy: 85.64 (0.20)%), according to the selected features. 
Furthermore, the results showed the importance of gender and age for UTI prediction.</p><p><strong>Conclusion: </strong>This study highlighted the potential of machine learning strategies for UTI prediction.</p>","PeriodicalId":9340,"journal":{"name":"BMC Medical Informatics and Decision Making","volume":"25 1","pages":"13"},"PeriodicalIF":3.3,"publicationDate":"2025-01-09","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11715496/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142945058","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"医学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}