Comparison of machine learning methods for predicting viral failure: a case study using electronic health record data.

Statistical Communications in Infectious Diseases. Pub Date: 2020-11-12. eCollection Date: 2020-09-01. DOI: 10.1515/scid-2019-0017
Allan Kimaina, Jonathan Dick, Allison DeLong, Stavroula A Chrysanthopoulou, Rami Kantor, Joseph W Hogan
Volume 12 Suppl1, pages 20190017. Publication type: Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10243177/pdf/scid-12-101-scid-2019-0017.pdf

Abstract

Comparison of machine learning methods for predicting viral failure: a case study using electronic health record data.

Background: Human immunodeficiency virus (HIV) viral failure occurs when antiretroviral therapy fails to suppress and sustain a person's viral load count below 1,000 copies of viral ribonucleic acid per milliliter. For those newly diagnosed with HIV and living in a setting where healthcare resources are limited, such as a low- and middle-income country, the World Health Organization recommends viral load monitoring six months after initiation of antiretroviral treatment and yearly thereafter. Deviations from this schedule are made in cases where viral failure occurs or at the discretion of the clinician. Failure to detect viral failure in a timely fashion can lead to delayed administration of essential interventions. Clinical prediction models based on information available in the patient medical record are increasingly being developed and deployed for decision support in clinical medicine and public health. This raises the possibility that prediction models can be used to detect potential for viral failure in advance of viral measurements, particularly when those measurements occur infrequently.

Objective: Our goal is to use electronic health record data from a large HIV care program in Kenya to characterize and compare the predictive accuracy of several statistical machine learning methods for predicting viral failure at the first and second measurements following initiation of antiretroviral therapy. Predictive accuracy is measured in terms of sensitivity, specificity, and area under the receiver operating characteristic (ROC) curve.
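
As a rough illustration of the three accuracy measures named in the objective, the sketch below computes sensitivity, specificity, and area under the ROC curve in plain Python. The function names, the 0.5 decision threshold, and the toy labels and scores are assumptions made for this example; none of them come from the study.

```python
from typing import Sequence, Tuple


def sensitivity_specificity(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (sensitivity, specificity) for binary labels, where 1 = viral failure."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve, computed as the probability that a randomly
    chosen failure case gets a higher score than a randomly chosen
    non-failure case (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = viral failure, scores are predicted risks.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} AUC={auc(y_true, scores):.3f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is presumably why the abstract reports all three.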

Methods: We trained and cross-validated 10 statistical machine learning models and algorithms on data from over 10,000 patients in the Academic Model Providing Access to Healthcare (AMPATH) care program in western Kenya. These included parametric, non-parametric, ensemble, and Bayesian methods. The input variables included 50 items from the clinical record, hand-picked in consultation with clinician experts. Predictive accuracy measures were calculated using 10-fold cross-validation.
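
The 10-fold cross-validation mentioned in the Methods can be sketched as follows. This is a minimal, standard-library illustration, not the authors' pipeline: the abstract does not say whether folds were stratified or how patients were shuffled, so the random split, the seed, the helper names, and the placeholder "model" (which simply predicts the training-set failure rate) are all assumptions.

```python
import random
from typing import Callable, Iterator, List, Sequence, Tuple


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, test_indices) for k disjoint folds over n rows."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)           # assumed: simple random split
    folds = [idx[i::k] for i in range(k)]      # k nearly equal-sized folds
    for i, test in enumerate(folds):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, test


def out_of_fold_predictions(X: Sequence, y: Sequence[int],
                            fit: Callable, predict: Callable,
                            k: int = 10, seed: int = 0) -> List[float]:
    """Predict each row using a model that never saw that row during fitting."""
    preds = [0.0] * len(y)
    for train, test in k_fold_indices(len(y), k, seed):
        model = fit([X[i] for i in train], [y[i] for i in train])
        for i in test:
            preds[i] = predict(model, X[i])
    return preds


# Placeholder model (hypothetical): predicts the training-set failure rate.
def fit_prevalence(X_train, y_train):
    return sum(y_train) / len(y_train)


def predict_prevalence(model, x):
    return model


# Synthetic data with a 20% failure rate, for illustration only.
X = [[float(i)] for i in range(50)]
y = [1 if i % 5 == 0 else 0 for i in range(50)]
preds = out_of_fold_predictions(X, y, fit_prevalence, predict_prevalence)
print(len(preds), min(preds), max(preds))
```

Any of the compared learners could be substituted for the placeholder through the `fit` and `predict` callables. The resulting out-of-fold predictions are what accuracy measures would then be computed on.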

Results: The viral failure rate is about 20% in this patient cohort at both the first and second measurements. Ensemble techniques generally outperformed other methods. For predicting viral failure at the first follow-up measurement, specificity was over 90% for these methods, but sensitivity was typically in the 50-60% range. Predictive accuracy was greater for the second follow-up measurement, with sensitivities over 80%. Super Learner, gradient boosting, and Bayesian additive regression trees consistently outperformed other methods. For a viral failure rate of 20%, the positive predictive value for the top-performing methods is between 75 and 85%, while the negative predictive value is over 95%.
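
Positive and negative predictive values follow from sensitivity, specificity, and the failure rate via Bayes' rule. The sketch below shows that relationship. The inputs (80% sensitivity, 95% specificity, 20% failure rate) are illustrative values chosen to lie in the ranges the abstract describes; they are not exact figures reported by the study, and the function name is an assumption.

```python
from typing import Tuple


def predictive_values(sensitivity: float, specificity: float, prevalence: float) -> Tuple[float, float]:
    """Return (PPV, NPV) from sensitivity, specificity and prevalence using Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Illustrative inputs only (assumed, not taken from the paper's tables).
ppv, npv = predictive_values(sensitivity=0.80, specificity=0.95, prevalence=0.20)
print(f"PPV={ppv:.2f} NPV={npv:.2f}")
```

Because both values depend on prevalence, the same model applied in a setting with a different viral failure rate would have different predictive values even if its sensitivity and specificity were unchanged.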

Conclusion: Evidence from this study suggests that machine learning techniques have potential to identify patients at risk for viral failure prior to their scheduled measurements. Ultimately, prognostic virologic assessment can help guide the administration of earlier targeted intervention such as enhanced drug resistance monitoring, rigorous adherence counseling, or appropriate next-line therapy switching. External validation studies should be used to confirm the results found here.
