Philip J Johnson, Ehsan Bhatti, Hidenori Toyoda, Shan He
Serologic Detection of Hepatocellular Carcinoma: Application of Machine Learning and Implications for Diagnostic Models.
JCO Clinical Cancer Informatics. Published 2024-03-01. DOI: 10.1200/CCI.23.00199 (https://doi.org/10.1200/CCI.23.00199)
Abstract
Purpose: The gender, age, Lens culinaris agglutinin-reactive fraction of alpha-fetoprotein, alpha-fetoprotein, des-gamma-carboxyprothrombin (GALAD) score is a biomarker-based statistical model for the serologic diagnosis of hepatocellular carcinoma (HCC). It was developed and validated using the case-control approach with a view to early detection. Performance has, however, been suboptimal in the first prospective studies, which better reflect the real-world situation. In this article, we report the application of machine learning to a large, prospectively accrued HCC surveillance data set.
Patients and methods: Models were built on a cohort of 3,473 patients with chronic liver disease within a rigorous surveillance program between 1998 and 2014, during which 459 patients with HCC were detected. Two random forest (RF) models were trained. The first RF model uses the same variables as the original GALAD model (GALAD-RF); the second is based on routinely available clinical and laboratory features (RF-practical). For comparison, we evaluated a logistic regression GALAD model trained on this longitudinal prospective data set (termed GALAD-Ogaki).
Results: Models were evaluated using a repeated cross-validation approach, with the metrics averaged over 100 independent runs. As judged by area under the receiver operating characteristic curve (AUROC) and F1 score, the GALAD-RF model significantly outperformed the original GALAD model. The RF-practical model also outperformed the original GALAD model in terms of both AUROC and F1 score, and both RF models outperformed the individual biomarkers. An online web application that implements the GALAD-RF and RF-practical models is presented.
Conclusion: RF-based models improve on the diagnostic performance of the original GALAD model in the setting of a standard HCC surveillance program. Further prospective validation studies of these models are warranted and could be expanded to offer prediction of the risk of HCC development over defined periods of time.
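
The evaluation described in the Results (AUROC and F1 score averaged over repeated cross-validation runs) can be illustrated with a minimal, standard-library-only sketch. This is not the authors' pipeline: the data are synthetic, the "model" is a toy single-marker threshold rule standing in for a random forest or logistic regression, and the fold count (k=5) and all function names (auroc, f1_score, stratified_folds, repeated_cv, fit_single_marker) are assumptions made for illustration.

```python
import random
from statistics import mean


def auroc(labels, scores):
    """AUROC as the chance a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_score(labels, predictions):
    """F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is zero."""
    tp = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predictions) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predictions) if y == 1 and p == 0)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def stratified_folds(labels, k, rng):
    """Shuffle indices per class and deal them into k folds to keep class balance."""
    folds = [[] for _ in range(k)]
    for cls in (0, 1):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def repeated_cv(labels, features, fit, n_runs=100, k=5, seed=0):
    """Average AUROC and F1 over n_runs independently shuffled k-fold splits."""
    rng = random.Random(seed)
    aucs, f1s = [], []
    for _ in range(n_runs):
        for test_idx in stratified_folds(labels, k, rng):
            held_out = set(test_idx)
            train_idx = [i for i in range(len(labels)) if i not in held_out]
            # Fit on the training folds only; the test fold stays unseen.
            score, threshold = fit([features[i] for i in train_idx],
                                   [labels[i] for i in train_idx])
            y_test = [labels[i] for i in test_idx]
            s_test = [score(features[i]) for i in test_idx]
            aucs.append(auroc(y_test, s_test))
            f1s.append(f1_score(y_test, [int(s >= threshold) for s in s_test]))
    return mean(aucs), mean(f1s)


def fit_single_marker(train_x, train_y):
    """Toy stand-in model: score is the raw marker, threshold is the midpoint of class means."""
    pos = [x for x, y in zip(train_x, train_y) if y == 1]
    neg = [x for x, y in zip(train_x, train_y) if y == 0]
    return (lambda x: x), (mean(pos) + mean(neg)) / 2


# Synthetic cohort (hypothetical): 40 cases and 160 controls with one noisy marker.
data_rng = random.Random(42)
labels = [1] * 40 + [0] * 160
features = [data_rng.gauss(2.0 if y else 0.0, 1.0) for y in labels]

mean_auc, mean_f1 = repeated_cv(labels, features, fit_single_marker, n_runs=20)
print(f"mean AUROC={mean_auc:.3f}, mean F1={mean_f1:.3f}")
```

Any model that returns a scoring function and a decision threshold can be passed as the fit argument, so the same harness could compare several candidate models on identical splits by reusing the same seed.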