A novel imputation method for effective prediction of coronary Kidney disease

S. Arasu, R. Thirumalaiselvi
Published in: 2017 2nd International Conference on Computing and Communications Technologies (ICCCT), 2017-02-01
DOI: 10.1109/ICCCT2.2017.7972256 (https://doi.org/10.1109/ICCCT2.2017.7972256)
Citations: 25

Abstract

Kidney disease has become a common disease around the world. Predicting kidney disease is a highly complex task when handling large datasets. A kidney disease dataset contains patient information such as age, blood pressure level, albumin, sugar, and red blood cell counts. Some features in the dataset may have missing values, and those values may be important for predicting kidney disease, so missing values decrease prediction accuracy. Several methods have been proposed to fill in these missing values. An existing classification framework used a data preprocessing method in which a data cleaning process fills in missing values and corrects erroneous ones. A recalculation process is performed on the chronic kidney disease (CKD) stages, and the recalculated values are filled in for unknown values. Although this method is efficient, it requires input from a healthcare expert on CKD dataset values. To avoid this need and allow preprocessing by a non-expert, Weighted Average Ensemble Learning Imputation (WAELI) is proposed. In the proposed work, the single-value imputation model uses expectation-maximization (EM) and Random Forest (RF), which predict missing values effectively in small datasets. For large datasets, the multiple-value imputation model estimates the missing values using RF, Classification and Regression Tree (CART), and C4.5. Hence, the accuracy of kidney disease prediction is improved by using WAELI. A priority assigning algorithm is then introduced to assign a priority to each feature in the dataset, and the higher-priority features are carried over to the classification process. This makes the classification process more efficient and reduces the time it takes.
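
The abstract does not say how the ensemble weights are computed, so the following is only a minimal sketch of the general idea of weighted-average ensemble imputation, not the authors' WAELI algorithm. It assumes that each base model is weighted by the inverse of its mean absolute error on held-out complete rows, and it uses two deliberately simple stand-ins (a mean predictor and a nearest-neighbour predictor) in place of EM, RF, CART, and C4.5. All function names and the toy data are hypothetical.

```python
from statistics import mean

def fit_mean(train, target):
    """Stand-in base model: always predict the mean of the observed target."""
    mu = mean(r[target] for r in train)
    return lambda row: mu

def fit_nearest(train, target):
    """Stand-in base model: copy the target of the closest training row."""
    def predict(row):
        def dist(other):
            # Squared Euclidean distance over all non-target features.
            return sum((row[k] - other[k]) ** 2 for k in other if k != target)
        return min(train, key=dist)[target]
    return predict

def ensemble_weights(models, validation, target, eps=1e-9):
    """Assumed weighting rule: inverse mean absolute error, normalised to sum to 1."""
    inverse_errors = []
    for model in models:
        mae = mean(abs(model(r) - r[target]) for r in validation)
        inverse_errors.append(1.0 / (mae + eps))
    total = sum(inverse_errors)
    return [v / total for v in inverse_errors]

def weighted_average_impute(rows, target, fitters, holdout=0.25):
    """Fill missing `target` values with a weighted average of model predictions.

    Assumes all other features are complete and at least two rows have
    an observed target (one to train on, one to validate on).
    """
    complete = [r for r in rows if r[target] is not None]
    n_val = max(1, int(len(complete) * holdout))
    train, validation = complete[:-n_val], complete[-n_val:]
    models = [fit(train, target) for fit in fitters]
    weights = ensemble_weights(models, validation, target)
    filled = []
    for r in rows:
        if r[target] is None:
            estimate = sum(w * m(r) for w, m in zip(weights, models))
            r = {**r, target: estimate}  # copy, so the input row is not mutated
        filled.append(r)
    return filled, weights

# Toy data (made up for illustration): the last row is missing `albumin`.
rows = [
    {"age": 40, "bp": 70, "albumin": 1.0},
    {"age": 50, "bp": 80, "albumin": 2.0},
    {"age": 60, "bp": 90, "albumin": 3.0},
    {"age": 45, "bp": 75, "albumin": 1.5},
    {"age": 55, "bp": 85, "albumin": None},
]
filled, weights = weighted_average_impute(rows, "albumin", [fit_mean, fit_nearest])
print(weights, filled[-1])
```

In this sketch a model that reproduces the held-out values more closely receives a larger share of the final estimate. Swapping the stand-ins for real EM, Random Forest, or decision-tree imputers would only require functions with the same `fit(train, target) -> predict(row)` shape.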