Development and Validation of a Machine Learning-Based Screening Algorithm to Predict High-Risk Hepatitis C Infection
Suk-Chan Jang, Wei-Hsuan Lo-Ciganic, Pilar Hernandez-Con, Chanakan Jenjai, James Huang, Ashley Stultz, Shunhua Yan, Debbie L Wilson, Ashley Norse, Faheem W Guirgis, Robert L Cook, Christine Gage, Khoa A Nguyen, Patrick Hornes, Yonghui Wu, David R Nelson, Haesuk Park
Open Forum Infectious Diseases, 12(8), ofaf496. Published 2025-08-15. DOI: 10.1093/ofid/ofaf496 (https://doi.org/10.1093/ofid/ofaf496)
Abstract
Background: Amid the opioid epidemic in the United States, hepatitis C virus (HCV) infections are rising, and one-third of infected individuals are unaware of their infection because it is often asymptomatic. This study aimed to develop and validate a machine learning (ML)-based algorithm to screen individuals at high risk of HCV infection.
Methods: We conducted prognostic modeling using the 2016-2023 OneFlorida+ database of all-payer electronic health records. The study included individuals aged ≥18 years who were tested for HCV antibodies, RNA, or genotype. We identified 275 features of HCV, including sociodemographic and clinical characteristics, measured during the 6-month period before the test result date. We developed and validated four ML algorithms to predict HCV infection: elastic net (EN), random forest (RF), gradient boosting machine (GBM), and deep neural network (DNN). We stratified patients into deciles based on predicted risk.
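The abstract does not say how the risk deciles were computed. As a minimal sketch of one common approach, the Python below ranks individuals by predicted risk, assigns decile 1 to the highest-risk tenth, and measures the share of positives captured in the top deciles. The function names and the toy data are hypothetical and are not taken from the study.

```python
def assign_risk_deciles(scores):
    """Map each score to a decile: 1 = highest predicted risk, 10 = lowest.

    Ties are broken by input order because the sort is stable.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i], reverse=True)
    deciles = [0] * n
    for rank, idx in enumerate(order):
        # rank 0 is the highest score; integer division yields 0..9
        deciles[idx] = rank * 10 // n + 1
    return deciles


def cumulative_capture(deciles, labels, top_k):
    """Fraction of all positive cases that fall in deciles 1..top_k."""
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("no positive cases")
    captured = sum(y for d, y in zip(deciles, labels) if d <= top_k)
    return captured / total_pos


# Toy data (hypothetical): 20 individuals, 4 of them positive.
toy_scores = [i / 20 for i in range(20)]
toy_labels = [0] * 20
for positive_index in (3, 15, 18, 19):
    toy_labels[positive_index] = 1

toy_deciles = assign_risk_deciles(toy_scores)
print(cumulative_capture(toy_deciles, toy_labels, 1))  # share in top decile
print(cumulative_capture(toy_deciles, toy_labels, 3))  # share in top three
```

A rank-based assignment keeps the deciles equally sized even when many predicted risks are tied. Cutting at score quantiles instead can produce uneven groups.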
Results: Among 445 624 individuals, 11 823 (2.65%) tested positive for HCV. The training (75%) and validation (25%) samples had similar characteristics (mean [standard deviation] age, 45 [16] years; 62.86% female; 54.43% White). The GBM model (C statistic, 0.916 [95% confidence interval, .911-.921]) outperformed the EN (0.885 [.879-.891]), RF (0.854 [.847-.861]), and DNN (0.908 [.903-.913]) models (P < .0001). At the threshold selected by the Youden index, GBM achieved 79.39% sensitivity and 89.08% specificity, identifying 1 positive HCV case per 6 tests. Among patients with HCV, 75.63% were captured in the top risk decile and 90.25% in the top three risk deciles.
Conclusions: ML algorithms effectively predicted and stratified HCV infection risk, offering a promising targeted screening tool for clinical settings.
About the journal:
Open Forum Infectious Diseases provides a global forum for the publication of clinical, translational, and basic research findings in a fully open access, online journal environment. The journal reflects the broad diversity of the field of infectious diseases, and focuses on the intersection of biomedical science and clinical practice, with a particular emphasis on knowledge that holds the potential to improve patient care in populations around the world. Fully peer-reviewed, OFID supports the international community of infectious diseases experts by providing a venue for articles that further the understanding of all aspects of infectious diseases.