The Visualization of the Importance of Covariance Importance in a Machine Learning Model for Advanced Liver Fibrosis in a Nationally Representative Sample
Authors: Alexander A. Huang, Samuel Y. Huang
Journal: JGH Open, volume 9, issue 7 (published 2025-07-14); impact factor 1.7; JCR Q3, Gastroenterology & Hepatology
DOI: 10.1002/jgh3.70200
URL: https://onlinelibrary.wiley.com/doi/10.1002/jgh3.70200
Citations: 0
Abstract
Introduction
Accurate prediction of liver disease is vital for early intervention, given its potential severity. This study aims to improve the prediction of advanced liver fibrosis and to investigate its associations with demographic, laboratory, physical examination, and lifestyle factors, with the goal of supporting healthier lifestyle choices and timely management of liver disease.
Methods
This cross-sectional study included adults from the US National Health and Nutrition Examination Survey (NHANES, 2017–2020). Questionnaires captured demographic, dietary, exercise, and mental health information. Advanced fibrosis was defined using liver stiffness measurement (LSM) with a 9.5 kPa threshold. XGBoost, a gradient-boosted decision tree model, was trained to predict advanced fibrosis, and its performance was assessed using the area under the receiver operating characteristic curve (AUROC). SHAP (SHapley Additive exPlanations) provided visual explanations of the model's predictions and feature contributions. Feature importance was measured by model gain, cover, and frequency, enabling transparent and interpretable analysis.
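The outcome definition and the evaluation metric described in the Methods can be illustrated with a minimal sketch. It is not the authors' code: the function names, the toy LSM values, and the toy model scores are all hypothetical, and treating an LSM exactly at 9.5 kPa as positive (>=) is an assumption, since the abstract does not say whether the threshold is inclusive. AUROC is computed here as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half.

```python
from typing import List, Sequence

LSM_THRESHOLD_KPA = 9.5  # threshold stated in the Methods


def label_advanced_fibrosis(lsm_kpa: Sequence[float]) -> List[int]:
    """Return 1 for advanced fibrosis, 0 otherwise.

    Assumption: the threshold is inclusive (LSM >= 9.5 kPa is positive).
    """
    return [1 if value >= LSM_THRESHOLD_KPA else 0 for value in lsm_kpa]


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise (Mann-Whitney) AUROC: fraction of positive/negative pairs
    in which the positive case has the higher score; ties count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six participants, LSM in kPa and model scores.
toy_lsm = [4.2, 12.1, 9.5, 6.8, 15.0, 7.3]
toy_scores = [0.10, 0.80, 0.35, 0.40, 0.90, 0.20]

toy_labels = label_advanced_fibrosis(toy_lsm)
toy_auroc = auroc(toy_labels, toy_scores)
```

The pairwise form is quadratic in the number of participants, so it suits small illustrations; a rank-based implementation would be the usual choice for a sample of several thousand.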
Results
The study included 6979 adults (age > 18) with a mean age of 49.02 years, of whom 3523 (50%) were female. The machine learning model achieved an area under the receiver operating characteristic curve (AUROC) of 0.885. The top-ranked covariates included waist circumference (gain = 0.185), GGT (gain = 0.101), platelet count (gain = 0.059), AST (gain = 0.057), weight (gain = 0.049), HDL-cholesterol (gain = 0.032), and ferritin (gain = 0.034).
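To make the three importance measures concrete, the sketch below computes gain, cover, and frequency from a list of tree splits. It assumes the gains reported above are normalized shares in the style of XGBoost's importance table, where each measure is a feature's fraction of the total across all splits; the abstract does not state this explicitly. The `Split` record, the `importance_table` function, and the toy numbers are hypothetical and are not taken from the study.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Split(NamedTuple):
    feature: str  # feature used at this tree node
    gain: float   # loss reduction achieved by the split
    cover: float  # weight of observations reaching the split


def importance_table(splits: Iterable[Split]) -> Dict[str, Dict[str, float]]:
    """Aggregate per-split statistics into normalized gain, cover, frequency."""
    gain: Dict[str, float] = defaultdict(float)
    cover: Dict[str, float] = defaultdict(float)
    count: Dict[str, int] = defaultdict(int)
    for s in splits:
        gain[s.feature] += s.gain
        cover[s.feature] += s.cover
        count[s.feature] += 1
    total_gain = sum(gain.values())
    total_cover = sum(cover.values())
    total_count = sum(count.values())
    if total_count == 0 or total_gain == 0 or total_cover == 0:
        raise ValueError("need at least one split with non-zero gain and cover")
    return {
        f: {
            "gain": gain[f] / total_gain,          # share of total loss reduction
            "cover": cover[f] / total_cover,       # share of observations covered
            "frequency": count[f] / total_count,   # share of all splits
        }
        for f in gain
    }


# Hypothetical toy model with three splits on two features.
toy_splits = [
    Split("waist_circumference", gain=6.0, cover=100.0),
    Split("ggt", gain=3.0, cover=60.0),
    Split("waist_circumference", gain=1.0, cover=40.0),
]
toy_table = importance_table(toy_splits)
```

Under this normalization each measure sums to 1 across features, so a gain of 0.185 would mean that splits on that feature account for 18.5% of the model's total loss reduction.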
Conclusion
Machine learning models can predict the risk of advanced liver fibrosis with high discrimination. By drawing on demographic information, laboratory results, physical examination findings, and lifestyle factors, the model identified key risk factors associated with liver fibrosis.