Development of a machine learning prediction model for loss to follow-up in HIV care using routine electronic medical records in a low-resource setting.

BMC Medical Informatics and Decision Making, 25(1):192. Pub Date: 2025-05-19. DOI: 10.1186/s12911-025-03030-7

Tamrat Endebu, Girma Taye, Wakgari Deressa

Abstract

Background: Despite the global commitment to ending AIDS by 2030, loss to follow-up (LTFU) in HIV care remains a significant challenge. A data-driven clinical decision tool could help identify patients at greater risk of LTFU and support personalized, proactive interventions. This study aimed to develop a prediction model to assess the future risk of LTFU in HIV care in Ethiopia.

Methods: This retrospective study applied machine learning (ML) methods to electronic medical record (EMR) data from HIV-positive adults who were newly enrolled in antiretroviral therapy between July 2019 and April 2024. The data were collected from eight randomly selected high-volume healthcare facilities. Six supervised ML classifiers (J48 decision tree, random forest, K-nearest neighbors, support vector machine, logistic regression, and naïve Bayes) were trained in Weka 3.8.6. Each algorithm was evaluated with 10-fold cross-validation. Algorithm performance was compared using the corrected resampled t test (significance level p < 0.05), and decision curve analysis (DCA) was used to assess the model's clinical utility.
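The corrected resampled t test named in the Methods can be illustrated with a short sketch. This is not the authors' Weka workflow; it is a minimal Python version of the commonly cited Nadeau-Bengio correction, assuming the inputs are per-fold differences in a score (for example accuracy) between two classifiers. The function name and arguments are hypothetical.

```python
import math
from statistics import mean, variance


def corrected_resampled_t(diffs, n_train, n_test):
    """Return (t statistic, degrees of freedom) for paired per-fold differences.

    diffs   -- score differences between two classifiers, one per fold/run
    n_train -- number of training instances in each fold
    n_test  -- number of test instances in each fold
    """
    k = len(diffs)
    if k < 2:
        raise ValueError("need at least two paired differences")
    var = variance(diffs)  # sample variance, (k - 1) in the denominator
    if var == 0:
        raise ValueError("differences have zero variance")
    # A plain paired t test would use var / k. The correction adds
    # n_test / n_train to account for overlapping training sets,
    # which widens the standard error and makes the test more conservative.
    std_err = math.sqrt((1.0 / k + n_test / n_train) * var)
    return mean(diffs) / std_err, k - 1
```

With 10-fold cross-validation on 3,720 records, each fold would hold out 372 records and train on 3,348, so the correction term n_test / n_train is 1/9. The resulting t statistic is compared against a t distribution with k - 1 degrees of freedom; obtaining the p-value requires a statistics library and is left out of this sketch.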

Results: EMR data from 3,720 individuals were analyzed; 2,575 (69.2%) were classified as not LTFU and 1,145 (30.8%) as LTFU. The ML feature selection process identified six strong predictors of LTFU: differentiated service delivery model, adherence, tuberculosis preventive therapy, follow-up period, nutritional status, and address information. The random forest algorithm performed best, with an accuracy of 84.2%, a sensitivity of 82.4%, a specificity of 85.7%, a precision of 83.7%, an F1 score of 83.1%, and an area under the curve of 89.5%. The model also showed clinical utility, offering greater net benefit than both the 'intervention for all' and the 'intervention for none' approaches, particularly at threshold probabilities of 10% and above.
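The reported metrics and the net-benefit comparison follow standard definitions, which the sketch below spells out. It is an illustration under stated assumptions, not the authors' analysis: the function names are hypothetical, and the abstract does not give the confusion matrix, so only the class counts (1,145 LTFU out of 3,720) are taken from the source.

```python
def classification_metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from confusion-matrix counts."""
    sensitivity = tp / (tp + fn)  # recall for the LTFU class
    specificity = tn / (tn + fp)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


def net_benefit(tp, fp, n, threshold):
    """Decision-curve net benefit at a given threshold probability."""
    if not 0 < threshold < 1:
        raise ValueError("threshold must be strictly between 0 and 1")
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - (fp / n) * threshold / (1 - threshold)


def net_benefit_treat_all(n_events, n, threshold):
    """'Intervention for all': every event is a TP, every non-event an FP."""
    return net_benefit(n_events, n - n_events, n, threshold)


# 'Intervention for none' has a net benefit of 0 at every threshold.
# Class counts below come from the abstract; the threshold is the 10% it mentions.
baseline_all = net_benefit_treat_all(n_events=1145, n=3720, threshold=0.10)
```

A model is considered clinically useful at a given threshold when its net benefit exceeds both `baseline_all` and zero; computing the model's own curve would require its true-positive and false-positive counts at each threshold, which the abstract does not report.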

Conclusions: This study developed a machine learning-based predictive model for assessing the future risk of LTFU in HIV care within low-resource settings. The model built with the random forest algorithm showed high accuracy, strong discriminative performance, and a positive net benefit for clinical applications. External validation across diverse populations remains important to ensure the model's reliability and generalizability.

About the journal: BMC Medical Informatics and Decision Making is an open access journal publishing original peer-reviewed research articles on the design, development, implementation, use, and evaluation of health information technologies and decision-making for human health.