Development and Validation of a Machine Learning-Based Screening Algorithm to Predict High-Risk Hepatitis C Infection.

IF 3.8 · Zone 4, Medicine · JCR Q2, IMMUNOLOGY

Open Forum Infectious Diseases, volume 12, issue 8, ofaf496. Pub Date: 2025-08-15. eCollection Date: 2025-08-01. DOI: 10.1093/ofid/ofaf496

Suk-Chan Jang, Wei-Hsuan Lo-Ciganic, Pilar Hernandez-Con, Chanakan Jenjai, James Huang, Ashley Stultz, Shunhua Yan, Debbie L Wilson, Ashley Norse, Faheem W Guirgis, Robert L Cook, Christine Gage, Khoa A Nguyen, Patrick Hornes, Yonghui Wu, David R Nelson, Haesuk Park


Citations: 0

Abstract

Background: Amid the opioid epidemic in the United States, hepatitis C virus (HCV) infections are rising, and one-third of infected individuals are unaware of their infection because it is often asymptomatic. This study aimed to develop and validate a machine learning (ML)-based algorithm to screen individuals at high risk of HCV infection.

Methods: We conducted prognostic modeling using the 2016-2023 OneFlorida+ database of all-payer electronic health records. The study included individuals aged ≥18 years who were tested for HCV antibodies, RNA, or genotype. We identified 275 features related to HCV infection, including sociodemographic and clinical characteristics, measured during the 6-month period before the test result date. Four ML algorithms (elastic net [EN], random forest [RF], gradient boosting machine [GBM], and deep neural network [DNN]) were developed and validated to predict HCV infection. We stratified patients into deciles based on predicted risk.
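The decile stratification described in the Methods can be illustrated with a minimal sketch. It assumes each individual already has a predicted risk from some fitted model and a binary test result; the function name `decile_capture` and the toy data are hypothetical and are not taken from the study.

```python
from typing import List, Sequence


def decile_capture(risks: Sequence[float], labels: Sequence[int]) -> List[float]:
    """Cumulative share of positives captured in the top 1..10 risk deciles.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    # Rank individuals from highest to lowest predicted risk.
    order = sorted(range(len(risks)), key=lambda i: risks[i], reverse=True)
    n = len(order)
    total_pos = sum(labels)
    captured = []
    running = 0
    start = 0
    for d in range(1, 11):
        # End index of the d-th decile in the ranked list.
        end = round(d * n / 10)
        running += sum(labels[i] for i in order[start:end])
        captured.append(running / total_pos)
        start = end
    return captured


# Toy data (made up): 20 individuals, 4 of whom test positive.
risks = [0.95, 0.90, 0.80, 0.70, 0.60, 0.50, 0.40, 0.30, 0.20, 0.10,
         0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01, 0.005]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0,
          0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

capture = decile_capture(risks, labels)
print("Top decile captures:", capture[0])
print("Top three deciles capture:", capture[2])
```

The first and third entries of the returned list correspond to the two capture figures the abstract reports (top decile and top three deciles).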

Results: Among 445 624 individuals, 11 823 (2.65%) tested positive for HCV. Training (75%) and validation (25%) samples had similar characteristics (mean [standard deviation] age, 45 [16] years; 62.86% female; 54.43% White). The GBM model (C statistic, 0.916 [95% confidence interval = .911-.921]) outperformed the EN (0.885 [.879-.891]), RF (0.854 [.847-.861]), and DNN (0.908 [.903-.913]) models (P < .0001). At the threshold selected by the Youden index, GBM achieved 79.39% sensitivity and 89.08% specificity, identifying 1 positive HCV case per 6 tests. Among patients with HCV, 75.63% were captured in the top risk decile and 90.25% in the top three risk deciles.

Conclusions: ML algorithms effectively predicted and stratified HCV infection risk, offering a promising targeted screening tool for clinical settings.




Source journal

Open Forum Infectious Diseases Medicine-Neurology (clinical)

CiteScore

6.70

Self-citation rate

4.80%

Articles published

630

Review time

9 weeks

About the journal: Open Forum Infectious Diseases provides a global forum for the publication of clinical, translational, and basic research findings in a fully open access, online journal environment. The journal reflects the broad diversity of the field of infectious diseases, and focuses on the intersection of biomedical science and clinical practice, with a particular emphasis on knowledge that holds the potential to improve patient care in populations around the world. Fully peer-reviewed, OFID supports the international community of infectious diseases experts by providing a venue for articles that further the understanding of all aspects of infectious diseases.