M. Mabrouk, Abubakr Awad, H. Shousha, Wafaa Alake, A. Salama, T. Awad
Title: Attribute Reduction and Decision Tree Pruning to Simplify Liver Fibrosis Prediction Algorithms: A Cohort Study
Published in: 9th International Conference on Computer Science, Engineering and Applications (CCSEA 2019)
Publication date: 2019-07-13
DOI: 10.5121/CSIT.2019.90927 (https://doi.org/10.5121/CSIT.2019.90927)
Citations: 1
Abstract
Background: Assessing liver fibrosis is essential for therapeutic decisions and prognostic evaluation in chronic hepatitis. Liver biopsy is considered the definitive investigation for staging liver fibrosis, but it carries several limitations. FIB-4 and APRI also have limited accuracy. The National Committee for Control of Viral Hepatitis (NCCVH) in Egypt has supplied a valuable pool of electronic patient data that data mining techniques can analyze to disclose hidden patterns and trends, leading to the development of predictive algorithms.

Aim: To collaborate with physicians to develop a novel, reliable, easy-to-comprehend noninvasive model that predicts the stage of liver fibrosis from routine workup, without imposing extra costs for additional examinations, especially in areas with limited resources such as Egypt.

Methods: This multicenter retrospective study included baseline demographic, laboratory, and histopathological data of 69106 patients with chronic hepatitis C. We began with data collection, preprocessing, cleansing, and formatting for knowledge discovery from electronic health records (EHRs). Data mining was used to build a decision tree (Reduced Error Pruning tree, REP tree) with 10-fold internal cross-validation. Histopathology results were used to assess accuracy for fibrosis stages. Machine learning feature selection and reduction (CfsSubsetEval with best-first search) reduced the initial input features (N=15) to the most relevant ones (N=6) for developing the prediction model.

Results: In this study, 32419 patients had F(0-1), 25073 had F(2), and 11615 had F(3-4). Revalidation of FIB-4 and APRI in our study showed low accuracy and high discordance with biopsy results, with overall AUCs of 0.68 and 0.58, respectively. Out of 15 attributes, machine learning selected age, AFP, AST, glucose, albumin, and platelet count as the most relevant. The REP tree reached an overall classification accuracy of up to 70% and an ROC area of 0.74, both of which were almost unaffected by attribute reduction and pruning. However, attribute reduction and tree pruning yielded a simpler model that is easier for physicians to understand and takes less time to execute.

Conclusion: In this study, we had the chance to examine a large cohort of 69106 chronic hepatitis patients with available liver biopsy results to revise and validate the accuracy of FIB-4 and APRI. The study represents a collaboration between computer scientists and hepatologists to provide clinicians with an accurate, novel, reliable, noninvasive model to predict the stage of liver fibrosis.
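The abstract revalidates FIB-4 and APRI but does not restate how they are computed. The following is a minimal sketch, assuming the commonly published definitions of both scores: FIB-4 = (age × AST) / (platelets × √ALT) and APRI = (AST / upper limit of normal AST) × 100 / platelets, with platelets in 10^9/L and enzymes in U/L. The function names, parameter names, and the default upper limit of normal of 40 U/L are illustrative assumptions and are not taken from the study. FIB-4 also requires ALT, which the abstract does not list among the selected attributes.

```python
import math


def fib4(age_years, ast_u_l, alt_u_l, platelets_10e9_l):
    """FIB-4 index under the commonly published definition (assumed here).

    FIB-4 = (age [years] * AST [U/L]) / (platelets [10^9/L] * sqrt(ALT [U/L]))
    """
    if alt_u_l <= 0 or platelets_10e9_l <= 0:
        raise ValueError("ALT and platelet count must be positive")
    return (age_years * ast_u_l) / (platelets_10e9_l * math.sqrt(alt_u_l))


def apri(ast_u_l, platelets_10e9_l, ast_uln_u_l=40.0):
    """AST-to-platelet ratio index under the commonly published definition.

    APRI = (AST / upper limit of normal AST) * 100 / platelets [10^9/L]
    The default upper limit of normal (40 U/L) is an assumption; laboratories differ.
    """
    if platelets_10e9_l <= 0 or ast_uln_u_l <= 0:
        raise ValueError("platelet count and AST upper limit must be positive")
    return (ast_u_l / ast_uln_u_l) * 100.0 / platelets_10e9_l


if __name__ == "__main__":
    # Hypothetical patient values, chosen only to make the arithmetic easy to follow.
    print(fib4(age_years=50, ast_u_l=80, alt_u_l=64, platelets_10e9_l=200))
    print(apri(ast_u_l=80, platelets_10e9_l=200))
```

Both scores use only age and routine laboratory values, which is why they serve as a natural baseline for a model built from routine workup.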