Cardiotocography Data Analysis to Predict Fetal Health Risks with Tree-Based Ensemble Learning

International Journal of Information Technology and Computer Science | Pub Date: 2021-10-08 | DOI: 10.5815/ijitcs.2021.05.03

Pankaj Bhowmik, Pulak Chandra Bhowmik, U. Ali, Md. Sohrawordi


Citations: 3

Abstract


Cardiotocography Data Analysis to Predict Fetal Health Risks with Tree-Based Ensemble Learning

A sizeable number of women face complications during pregnancy, which can eventually lead to serious health problems for the fetus. Early detection of these risks can save the lives of both infants and mothers. Cardiotocography (CTG) monitors the fetal heart rate signal, and the resulting data is used to predict potential risks to fetal wellbeing and to support clinical decisions. This paper analyzes antepartum CTG data (available from the UCI Machine Learning Repository) and develops an efficient tree-based ensemble learning (EL) classifier model to predict fetal health status. The EL model uses the Stacking approach, of which the paper gives a concise overview. The study also applies several distinct machine learning algorithms to the CTG dataset and measures their performance. The Stacking EL technique uses four tree-based machine learning algorithms as base learners: Random Forest, Decision Tree, Extra Trees, and Deep Forest classifiers. The CTG dataset contains 21 features, of which the 10 most important are selected with the Chi-square method, and the selected features are then normalized with Min-Max scaling. Grid Search is then applied to tune the hyperparameters of the base algorithms, and 10-fold cross-validation is performed to select the meta learner of the EL classifier model. A comparative assessment of the individual base learners and the EL classifier model shows the EL classifier’s superiority in predicting fetal health risks, with an accuracy of about 96.05%. The study concludes that the Stacking EL approach can be a substantial paradigm in machine learning studies for improving model accuracy and reducing the error rate.
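The pipeline described in the abstract can be sketched roughly as follows with scikit-learn. This is an illustrative sketch under several assumptions, not the authors' implementation: the data is a synthetic stand-in generated with `make_classification` (21 features, three assumed classes) rather than the UCI CTG dataset; Min-Max scaling is applied before the Chi-square selection so that the synthetic features are non-negative, which the chi-square score requires; the Deep Forest base learner and the Grid Search step are omitted; and `LogisticRegression` is an assumed meta learner. The `cv=10` argument here only controls how the out-of-fold base-learner predictions are generated for the meta learner, which is not the same as the paper's use of 10-fold cross-validation to choose the meta learner.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (ExtraTreesClassifier, RandomForestClassifier,
                              StackingClassifier)
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the CTG data (assumption): 21 features, 3 classes.
X, y = make_classification(n_samples=600, n_features=21, n_informative=8,
                           n_classes=3, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# Three of the four tree-based base learners named in the abstract.
# Deep Forest is not part of scikit-learn and is left out of this sketch.
# Hyperparameters are fixed here rather than tuned with Grid Search.
base_learners = [
    ("rf", RandomForestClassifier(n_estimators=50, random_state=0)),
    ("dt", DecisionTreeClassifier(max_depth=5, random_state=0)),
    ("et", ExtraTreesClassifier(n_estimators=50, random_state=0)),
]

model = Pipeline([
    # Scale to [0, 1] first so chi2 sees non-negative inputs (assumption
    # needed for the synthetic data).
    ("scale", MinMaxScaler()),
    # Keep the 10 features with the highest chi-square scores.
    ("select", SelectKBest(chi2, k=10)),
    # Stack the base learners; the meta learner is trained on their
    # out-of-fold predictions from 10-fold cross-validation.
    ("stack", StackingClassifier(
        estimators=base_learners,
        final_estimator=LogisticRegression(max_iter=1000),  # assumed meta learner
        cv=10)),
])

model.fit(X_train, y_train)
test_accuracy = model.score(X_test, y_test)
print(f"held-out accuracy on synthetic data: {test_accuracy:.3f}")
```

The accuracy printed by this sketch reflects the synthetic data only and says nothing about the 96.05% figure reported for the CTG dataset.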

