Machine learning to improve HIV screening using routine data in Kenya

IF 4.9 | Zone 1 (Medicine) | JCR Q2 IMMUNOLOGY

Journal of the International AIDS Society Pub Date : 2025-04-20 DOI:10.1002/jia2.26436

Jonathan D. Friedman, Jonathan M. Mwangi, Kennedy J. Muthoka, Benedette A. Otieno, Jacob O. Odhiambo, Frederick O. Miruka, Lilly M. Nyagah, Pascal M. Mwele, Edmon O. Obat, Gonza O. Omoro, Margaret M. Ndisha, Davies O. Kimanga

Journal of the International AIDS Society, volume 28(4). Article page: https://onlinelibrary.wiley.com/doi/10.1002/jia2.26436

Citations: 0

Abstract

Introduction

Optimal use of HIV testing resources accelerates progress towards ending HIV as a global threat. In Kenya, current testing practices yield a 2.8% positivity rate for new diagnoses reported through the national HIV electronic medical record (EMR) system. Increasingly, researchers have explored the potential for machine learning to improve the identification of people with undiagnosed HIV for referral for HIV testing. However, few studies have used routinely collected programme data as the basis for implementing a real-time clinical decision support system to improve HIV screening. In this study, we applied machine learning to routine programme data from Kenya's EMR to predict the probability that an individual seeking care is undiagnosed HIV positive and should be prioritized for testing.

Methods

We combined de-identified individual-level EMR data from 167,509 individuals without a previous HIV diagnosis who were tested between June and November 2022. We included demographics, clinical histories and HIV-relevant behavioural practices as model variables, together with open-source data describing population-level behavioural practices. We used multiple imputations to address high rates of missing data, selecting the optimal technique based on out-of-sample error. We generated a stratified 60-20-20 train-validate-test split to assess model generalizability. We trained four machine learning algorithms: logistic regression, Random Forest, AdaBoost and XGBoost. Models were evaluated using Area Under the Precision-Recall Curve (AUCPR), a metric that is well suited to cases of class imbalance such as this, in which there are far more negative test results than positive.
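
The stratified 60-20-20 split and the AUCPR metric described above can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: the function names `stratified_split` and `average_precision`, the synthetic labels, and the 3% positive share are assumptions made for illustration. The AUCPR here is computed as step-wise average precision, one common estimator of the area under the precision-recall curve; the abstract does not say which estimator the study used.

```python
import random
from collections import defaultdict


def stratified_split(labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Return (train, validate, test) index lists that preserve class ratios.

    Hypothetical helper: each class is shuffled and cut separately, so every
    subset keeps roughly the same share of positives as the full data set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, validate, test = [], [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        n_train = round(fractions[0] * len(indices))
        n_val = round(fractions[1] * len(indices))
        train += indices[:n_train]
        validate += indices[n_train:n_train + n_val]
        test += indices[n_train + n_val:]  # remainder goes to the test set
    return train, validate, test


def average_precision(y_true, scores):
    """Step-wise area under the precision-recall curve.

    Records are ranked by descending score (ties keep input order), and the
    precision at each correctly ranked positive is averaged.
    """
    ranked = sorted(zip(scores, y_true), key=lambda pair: -pair[0])
    total_pos = sum(y_true)
    hits, area = 0, 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            area += hits / rank  # precision at this recall step
    return area / total_pos if total_pos else 0.0


# Synthetic, imbalanced labels (illustrative only, not study data).
labels = [1] * 30 + [0] * 970
train, validate, test = stratified_split(labels)
print(len(train), len(validate), len(test))  # 600 200 200
print(sum(labels[i] for i in test))          # 6 positives in the test set
```

Because precision is measured only among records ranked as likely positives, the metric is not inflated by the large pool of true negatives, which is why it suits imbalanced data such as this. In practice, libraries such as scikit-learn offer equivalents (`train_test_split` with its `stratify` argument, and `average_precision_score`).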

Results

All model types demonstrated predictive performance on the test set with AUCPR that exceeded the current positivity rate. XGBoost generated the greatest AUCPR, 10.5 times greater than the rate of positive test results.
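
To put the 10.5-fold figure in absolute terms, the short calculation below derives an implied AUCPR. It rests on an assumption the abstract does not confirm: that the "rate of positive test results" equals the 2.8% positivity rate quoted in the Introduction. The abstract reports no absolute AUCPR, so the result is indicative only.

```python
# Assumption: the positive-result rate equals the 2.8% quoted in the Introduction.
positivity_rate = 0.028
lift = 10.5  # XGBoost AUCPR relative to the rate of positive test results

implied_aucpr = lift * positivity_rate
print(f"Implied AUCPR: {implied_aucpr:.3f}")  # Implied AUCPR: 0.294
```

The positivity rate is a natural reference point because a classifier that ranks individuals at random has an expected AUCPR roughly equal to the share of positives in the data.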

Conclusions

Our study demonstrated that machine learning applied to routine HIV testing data may be used as a clinical decision support tool to refer persons for HIV testing. The resulting model could be integrated in the screening form of an EMR and used as a real-time decision support tool to inform testing decisions. Although issues of data quality and missing data remained, these challenges could be addressed using sound data preparation techniques.
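
As a rough illustration of how such a model might feed a screening form, the sketch below turns a predicted probability into a referral flag. The function `refer_for_testing` and its `threshold` parameter are hypothetical; the abstract does not describe the integration logic or any specific cut-off, so the threshold is left for the caller to supply.

```python
def refer_for_testing(predicted_probability: float, threshold: float) -> bool:
    """Return True when the model's risk score meets the referral threshold.

    Hypothetical helper: both the name and the thresholding rule are
    assumptions, not the decision logic reported in the study.
    """
    if not 0.0 <= predicted_probability <= 1.0:
        raise ValueError("predicted_probability must be between 0 and 1")
    return predicted_probability >= threshold


# Example with made-up values: a score of 0.30 against a cut-off of 0.20.
print(refer_for_testing(0.30, threshold=0.20))  # True
```

In a real deployment the threshold would presumably be chosen on held-out data, trading off the number of tests performed against the number of undiagnosed cases missed.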

Source journal

Journal of the International AIDS Society IMMUNOLOGY-INFECTIOUS DISEASES

CiteScore: 8.60
Self-citation rate: 10.00%
Articles published: 186
Review time: >12 weeks

About the journal: The Journal of the International AIDS Society (JIAS) is a peer-reviewed and Open Access journal for the generation and dissemination of evidence from a wide range of disciplines: basic and biomedical sciences; behavioural sciences; epidemiology; clinical sciences; health economics and health policy; operations research and implementation sciences; and social sciences and humanities. Submission of HIV research carried out in low- and middle-income countries is strongly encouraged.