Integrated Ensemble Learning Framework for Predicting Liver Disease

IF 1.4 Q3 COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS

International Journal of Online and Biomedical Engineering, Pub Date: 2023-09-18, DOI: 10.3991/ijoe.v19i13.41871

Soufiane Ardchir, Youssef Ouassit, Soumaya Ounacer, Mohammed Yassine EL Ghoumari, Mohamed Azzouazi


Citations: 0

Abstract



Integrated Ensemble Learning Framework for Predicting Liver Disease

Liver disease has become a pressing global health issue, with a sharp increase in reported cases worldwide. It is difficult to detect because it often produces few noticeable symptoms, so by the time it is diagnosed it may already have reached an advanced stage, and many people die without knowing they had it. Early detection is crucial because it lets patients begin treatment sooner, which can save lives. This study assesses the efficacy of five ensemble machine learning (ML) methods, namely random forest (RF), XGBoost, Extra Trees, bagging, and stacking, in predicting liver disease, using the Indian Liver Patient Dataset (ILPD). To reduce overfitting and bias, several statistical preprocessing techniques were applied to handle missing data, outliers, and class imbalance. The results underline the importance of recursive feature elimination (RFE), which restricted the model to the most relevant features and may have improved both its accuracy and its efficiency. The highest testing accuracy, 93%, was achieved by the proposed model, which combines the improved preprocessing approach with a stacking ensemble classifier and RFE feature selection. These results suggest that ensemble ML is promising for this task: it can help medical professionals build models better equipped to handle the complexity and variability of medical data, supporting more accurate diagnoses, more effective treatment plans, and better patient outcomes.
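The abstract gives no implementation details, so the following is only a minimal sketch of how RFE feature selection can be chained with a stacking ensemble in scikit-learn. Everything specific here is an assumption and not the authors' configuration: the synthetic data stands in for the ILPD, and the base learners, the logistic-regression meta-learner, the median imputation, and the choice of six retained features are illustrative. XGBoost, outlier handling, and class balancing are left out to keep the example self-contained.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    BaggingClassifier,
    ExtraTreesClassifier,
    RandomForestClassifier,
    StackingClassifier,
)
from sklearn.feature_selection import RFE
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler


def build_pipeline(n_features_to_select=6, random_state=0):
    """Hypothetical pipeline: impute -> scale -> RFE -> stacking ensemble."""
    # Tree-based base learners (assumed; the paper also evaluates XGBoost).
    base_learners = [
        ("rf", RandomForestClassifier(n_estimators=50, random_state=random_state)),
        ("et", ExtraTreesClassifier(n_estimators=50, random_state=random_state)),
        ("bag", BaggingClassifier(n_estimators=20, random_state=random_state)),
    ]
    # The meta-learner is trained on cross-validated base-learner predictions.
    stack = StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),
        cv=5,
    )
    # RFE repeatedly drops the least important feature, as ranked by a random forest.
    selector = RFE(
        RandomForestClassifier(n_estimators=50, random_state=random_state),
        n_features_to_select=n_features_to_select,
    )
    return Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
        ("rfe", selector),
        ("stack", stack),
    ])


# Synthetic, mildly imbalanced stand-in data (NOT the ILPD).
X, y = make_classification(
    n_samples=400, n_features=10, n_informative=5,
    weights=[0.7, 0.3], random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = build_pipeline()
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {accuracy:.2f}")
```

Placing RFE inside the pipeline means feature selection is fitted on the training split only, so the held-out score is not inflated by information from the test data. The accuracy printed here describes the synthetic data and has no bearing on the 93% reported in the abstract.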
