Development and Validation of a Machine Learning-Based Screening Algorithm to Predict High-Risk Hepatitis C Infection.

IF 3.8 | Zone 4 (Medicine) | JCR Q2 (Immunology)
Open Forum Infectious Diseases Pub Date : 2025-08-15 eCollection Date: 2025-08-01 DOI:10.1093/ofid/ofaf496
Suk-Chan Jang, Wei-Hsuan Lo-Ciganic, Pilar Hernandez-Con, Chanakan Jenjai, James Huang, Ashley Stultz, Shunhua Yan, Debbie L Wilson, Ashley Norse, Faheem W Guirgis, Robert L Cook, Christine Gage, Khoa A Nguyen, Patrick Hornes, Yonghui Wu, David R Nelson, Haesuk Park
Open Forum Infectious Diseases, vol. 12, no. 8, ofaf496. Journal Article. Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12378832/pdf/

Abstract

Background: Amid the opioid epidemic in the United States, hepatitis C virus (HCV) infections are rising, and one-third of infected individuals are unaware of their infection because it is often asymptomatic. This study aimed to develop and validate a machine learning (ML)-based algorithm to screen individuals at high risk of HCV infection.

Methods: We conducted prognostic modeling using the 2016-2023 OneFlorida+ database of all-payer electronic health records. The study included individuals aged ≥18 years who were tested for HCV antibodies, RNA, or genotype. We identified 275 features for predicting HCV infection, including sociodemographic and clinical characteristics, measured during the 6-month period before the test result date. Four ML algorithms were developed and validated to predict HCV infection: elastic net (EN), random forest (RF), gradient boosting machine (GBM), and deep neural network (DNN). We stratified patients into deciles based on predicted risk.
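
The decile stratification described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function name `decile_capture`, the synthetic risk scores, and the beta distributions used to generate them are assumptions made purely for illustration. Only the 2.65% positivity rate is taken from the abstract. The sketch ranks individuals by predicted risk, splits them into ten equal groups, and reports the cumulative share of positive cases captured from the highest-risk decile downward.

```python
import random


def decile_capture(risks, labels):
    """Cumulative share of positives captured in deciles 1..10 (highest risk first)."""
    # Rank individuals from highest to lowest predicted risk.
    order = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    n = len(order)
    total_pos = sum(labels)
    captured = []
    running = 0
    for d in range(10):
        start = d * n // 10
        end = (d + 1) * n // 10
        running += sum(labels[i] for i in order[start:end])
        captured.append(running / total_pos)
    return captured


# Synthetic cohort (assumption): 2.65% positives, with positives tending
# to receive higher predicted risk than negatives.
rng = random.Random(0)
labels = [1 if rng.random() < 0.0265 else 0 for _ in range(20000)]
risks = [rng.betavariate(5, 2) if y else rng.betavariate(2, 5) for y in labels]

capture = decile_capture(risks, labels)
for d, share in enumerate(capture, start=1):
    print(f"top {d} decile(s): {share:.1%} of positives captured")
```

Because the scores are synthetic, the printed percentages do not reproduce the study's figures. The sketch only shows how a statement such as "X% of cases fall in the top risk decile" is computed from predicted risks.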

Results: Among 445 624 individuals, 11 823 (2.65%) tested positive for HCV. Training (75%) and validation (25%) samples had similar characteristics (mean [standard deviation] age, 45 [16] years; 62.86% female; 54.43% White). The GBM model (C statistic, 0.916 [95% confidence interval, .911-.921]) outperformed the EN (0.885 [.879-.891]), RF (0.854 [.847-.861]), and DNN (0.908 [.903-.913]) models (P < .0001). At the threshold selected with the Youden index, GBM achieved 79.39% sensitivity and 89.08% specificity, identifying 1 positive HCV case per 6 tests. Among patients with HCV, 75.63% were captured in the top risk decile and 90.25% in the top three risk deciles.
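
The "1 positive HCV case per 6 tests" figure can be checked against the reported sensitivity, specificity, and positivity rate. The sketch below is a back-of-the-envelope check, not the authors' calculation. It assumes the overall prevalence (11 823 of 445 624) also holds in the validation sample, and the helper name `ppv` is hypothetical. It applies Bayes' rule to get the positive predictive value (PPV) and takes its reciprocal as the number of flagged individuals tested per case found. It also computes the Youden index, J = sensitivity + specificity - 1, at the reported operating point.

```python
def ppv(sensitivity, specificity, prevalence):
    """Positive predictive value from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1 - specificity) * (1 - prevalence)
    return true_pos / (true_pos + false_pos)


# Values reported in the abstract.
sens, spec = 0.7939, 0.8908
# Assumption: overall positivity rate applies to the validation sample.
prev = 11823 / 445624

youden_j = sens + spec - 1   # Youden index at the chosen threshold
p = ppv(sens, spec, prev)
tests_per_case = 1 / p       # roughly 6, consistent with the abstract

print(f"Youden J       = {youden_j:.4f}")
print(f"PPV            = {p:.3f}")
print(f"tests per case = {tests_per_case:.1f}")
```

Under these assumptions the PPV is about 0.165. Roughly one in six people flagged by the model would therefore test positive, which agrees with the reported figure.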

Conclusions: ML algorithms effectively predicted and stratified HCV infection risk, offering a promising targeted screening tool for clinical settings.

Source journal
Open Forum Infectious Diseases
Medicine-Neurology (clinical)
CiteScore: 6.70
Self-citation rate: 4.80%
Articles published: 630
Review time: 9 weeks
About the journal: Open Forum Infectious Diseases provides a global forum for the publication of clinical, translational, and basic research findings in a fully open access, online journal environment. The journal reflects the broad diversity of the field of infectious diseases, and focuses on the intersection of biomedical science and clinical practice, with a particular emphasis on knowledge that holds the potential to improve patient care in populations around the world. Fully peer-reviewed, OFID supports the international community of infectious diseases experts by providing a venue for articles that further the understanding of all aspects of infectious diseases.