Degninou Yehadji, Geraldine Gray, Carlos Arias Vicente, Petros Isaakidis, Abdourahimi Diallo, Saa Andre Kamano, Thierno Saidou Diallo
{"title":"Development of machine learning algorithms to predict viral load suppression among HIV patients in Conakry (Guinea).","authors":"Degninou Yehadji, Geraldine Gray, Carlos Arias Vicente, Petros Isaakidis, Abdourahimi Diallo, Saa Andre Kamano, Thierno Saidou Diallo","doi":"10.3389/frai.2025.1446876","DOIUrl":null,"url":null,"abstract":"<p><strong>Background: </strong>Viral load (VL) suppression is key to ending the global HIV epidemic, and predicting it is critical for healthcare providers and people living with HIV (PLHIV). Traditional research has focused on statistical analysis, but machine learning (ML) is gradually influencing HIV clinical care. While ML has been used in various settings, there's a lack of research supporting antiretroviral therapy (ART) programs, especially in resource-limited settings like Guinea. This study aims to identify the most predictive variables of VL suppression and develop ML models for PLHIV in Conakry (Guinea).</p><p><strong>Methods: </strong>Anonymized data from HIV patients in eight Conakry health facilities were pre-processed, including variable recoding, record removal, missing value imputation, grouping small categories, creating dummy variables, and oversampling the smallest target class. Support vector machine (SVM), logistic regression (LR), naïve Bayes (NB), random forest (RF), and four stacked models were developed. Optimal parameters were determined through two cross-validation loops using a grid search approach. Sensitivity, specificity, predictive positive value (PPV), predictive negative value (PNV), <i>F</i>-score, and area under the curve (AUC) were computed on unseen data to assess model performance. RF was used to determine the most predictive variables.</p><p><strong>Results: </strong>RF (94% <i>F</i>-score, 82% AUC) and NB (89% <i>F</i>-score, 82% AUC) were the most optimal models to detect VL suppression and non-suppression when applied to unseen data. The optimal parameters for RF were 1,000 estimators and no maximum depth (Random state = 40), and it identified Regimen schedule_6-Month, Duration on ART (months), Last ART CD4, Regimen schedule_Regular, and Last Pre-ART CD4 as top predictors for VL suppression.</p><p><strong>Conclusion: </strong>This study demonstrated the capability to predict VL suppression but has some limitations. The results are dependent on the quality of the data and are specific to the Guinea context and thus, there may be limitations with generalizability. Future studies may be to conduct a similar study in a different context and develop the most optimal model into an application that can be tested in a clinical context.</p>","PeriodicalId":33315,"journal":{"name":"Frontiers in Artificial Intelligence","volume":"8 ","pages":"1446876"},"PeriodicalIF":3.0000,"publicationDate":"2025-03-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11961888/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Frontiers in Artificial Intelligence","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.3389/frai.2025.1446876","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"2025/1/1 0:00:00","PubModel":"eCollection","JCR":"Q2","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
引用次数: 0
Abstract
Background: Viral load (VL) suppression is key to ending the global HIV epidemic, and predicting it is critical for healthcare providers and people living with HIV (PLHIV). Traditional research has focused on statistical analysis, but machine learning (ML) is gradually influencing HIV clinical care. While ML has been used in various settings, there's a lack of research supporting antiretroviral therapy (ART) programs, especially in resource-limited settings like Guinea. This study aims to identify the most predictive variables of VL suppression and develop ML models for PLHIV in Conakry (Guinea).
Methods: Anonymized data from HIV patients in eight Conakry health facilities were pre-processed, including variable recoding, record removal, missing value imputation, grouping small categories, creating dummy variables, and oversampling the smallest target class. Support vector machine (SVM), logistic regression (LR), naïve Bayes (NB), random forest (RF), and four stacked models were developed. Optimal parameters were determined through two cross-validation loops using a grid search approach. Sensitivity, specificity, predictive positive value (PPV), predictive negative value (PNV), F-score, and area under the curve (AUC) were computed on unseen data to assess model performance. RF was used to determine the most predictive variables.
Results: RF (94% F-score, 82% AUC) and NB (89% F-score, 82% AUC) were the most optimal models to detect VL suppression and non-suppression when applied to unseen data. The optimal parameters for RF were 1,000 estimators and no maximum depth (Random state = 40), and it identified Regimen schedule_6-Month, Duration on ART (months), Last ART CD4, Regimen schedule_Regular, and Last Pre-ART CD4 as top predictors for VL suppression.
Conclusion: This study demonstrated the capability to predict VL suppression but has some limitations. The results are dependent on the quality of the data and are specific to the Guinea context and thus, there may be limitations with generalizability. Future studies may be to conduct a similar study in a different context and develop the most optimal model into an application that can be tested in a clinical context.