Selecting Dominant Features for the Prediction of Early-Stage Chronic Kidney Disease

IF 2.0 | CAS Zone 4, Computer Science | JCR Q2, Computer Science

Intelligent Automation and Soft Computing | Pub Date: 2022-01-01 | DOI: 10.32604/iasc.2022.018654

Vinothini Arumugam, S. Baghavathi Priya


Citations: 1

Abstract

Chronic Kidney Disease (CKD) is one of the most prevalent public health diseases today, so early detection may reduce the severity of its consequences. Medical data for diagnosing a disease may be collected from blood tests, urine tests, and patient history, and the information retrieved from these various sources is diverse. It is therefore inappropriate to evaluate numerical and nominal features with the same feature selection algorithm, since doing so may lead to erroneous analysis. Applying machine learning techniques to medical databases is a common way to identify features for CKD prediction. In this paper, a novel Mixed Data Feature Selection (MDFS) model is proposed to select and filter the most informative features from a medical dataset for earlier CKD prediction; CKD clinical data with 12 numerical and 12 nominal features are fed to the MDFS model. For each feature in the mixed dataset, the model applies a feature selection method according to the feature's data type: Point Biserial correlation filters the numerical features, and a Chi-square filter filters the nominal features. In addition, an SVM algorithm is employed to evaluate and select the best feature subset. In our experiments, the proposed MDFS model outperforms existing works in terms of accuracy and the number of reduced features. The identified feature subset is also shown to preserve its original properties, without being discretized during feature selection.
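The type-dependent filtering described in the abstract can be illustrated with the following minimal, dependency-free Python sketch. It is not the authors' MDFS implementation: the function names, the cutoffs (`min_abs_r = 0.2` on the absolute point-biserial correlation, `alpha = 0.05` on the chi-square p-value), the Pegasos-trained linear SVM, the greedy forward search used as the SVM wrapper, and the synthetic columns (`num_signal`, `num_noise`, `cat_signal`, `cat_noise`) are all hypothetical choices, because the abstract states neither thresholds nor how the SVM searches feature subsets. Class labels are assumed to be binary and coded 0/1 (for example, CKD vs. not CKD).

```python
import math
import random
from collections import Counter

# Hypothetical sketch: names, thresholds and the SVM search strategy are assumptions.

def mean_sd(col):
    m = sum(col) / len(col)
    return m, math.sqrt(sum((v - m) ** 2 for v in col) / len(col))  # population SD

def point_biserial(x, y):
    """Point-biserial correlation of numeric x with a binary 0/1 label y."""
    g1 = [v for v, c in zip(x, y) if c == 1]
    g0 = [v for v, c in zip(x, y) if c == 0]
    _, sd = mean_sd(x)
    if sd == 0 or not g1 or not g0:
        return 0.0
    diff = sum(g1) / len(g1) - sum(g0) / len(g0)
    return diff / sd * math.sqrt(len(g1) * len(g0)) / len(x)

def chi2_sf(stat, df):
    """Upper-tail chi-square probability P(X >= stat), closed form for integer df >= 1."""
    h = stat / 2
    if df % 2 == 0:
        return math.exp(-h) * sum(h ** i / math.factorial(i) for i in range(df // 2))
    total = math.erfc(math.sqrt(h))
    term = math.sqrt(2 * stat / math.pi) * math.exp(-h)
    for i in range(1, (df - 1) // 2 + 1):
        total += term
        term *= stat / (2 * i + 1)
    return total

def chi_square_p(x, y):
    """p-value of Pearson's chi-square test of independence (no Yates correction)."""
    n, obs, row, col = len(x), Counter(zip(x, y)), Counter(x), Counter(y)
    stat = sum((obs[(a, b)] - row[a] * col[b] / n) ** 2 / (row[a] * col[b] / n)
               for a in row for b in col)
    df = (len(row) - 1) * (len(col) - 1)
    return chi2_sf(stat, df) if df > 0 else 1.0

def mdfs_filter(columns, kinds, y, min_abs_r=0.2, alpha=0.05):
    """Send each feature to the filter matching its declared type; nothing is discretized."""
    return [name for name, col in columns.items()
            if (abs(point_biserial(col, y)) >= min_abs_r if kinds[name] == "numeric"
                else chi_square_p(col, y) < alpha)]

def design(columns, kinds, names):
    """Standardize numeric columns, one-hot encode nominal ones, append a bias column."""
    feats = []
    for name in names:
        col = columns[name]
        if kinds[name] == "numeric":
            m, s = mean_sd(col)  # statistics over all rows, for brevity
            feats.append([(v - m) / (s or 1.0) for v in col])
        else:
            feats += [[1.0 if v == c else 0.0 for v in col] for c in sorted(set(col))]
    return [[f[i] for f in feats] + [1.0] for i in range(len(feats[0]))]

def train_linear_svm(X, y, lam=0.01, steps=5000, seed=0):
    """Linear SVM via Pegasos: stochastic subgradient steps on the regularized hinge loss."""
    rng, w = random.Random(seed), [0.0] * len(X[0])
    for t in range(1, steps + 1):
        i, eta = rng.randrange(len(X)), 1.0 / (lam * t)
        margin = y[i] * sum(a * b for a, b in zip(w, X[i]))
        w = [(1 - eta * lam) * a for a in w]
        if margin < 1:
            w = [a + eta * y[i] * b for a, b in zip(w, X[i])]
    return w

def svm_accuracy(columns, kinds, names, y, train, valid):
    X, s = design(columns, kinds, names), [1 if c == 1 else -1 for c in y]
    w = train_linear_svm([X[i] for i in train], [s[i] for i in train])
    hits = sum((sum(a * b for a, b in zip(w, X[i])) >= 0) == (s[i] == 1) for i in valid)
    return hits / len(valid)

def svm_forward_select(columns, kinds, candidates, y, train, valid):
    """Greedy wrapper: add the filtered feature that most improves validation accuracy."""
    chosen, best = [], 0.0
    while True:
        scores = {c: svm_accuracy(columns, kinds, chosen + [c], y, train, valid)
                  for c in candidates if c not in chosen}
        name, acc = max(scores.items(), key=lambda kv: kv[1], default=(None, 0.0))
        if acc <= best:
            return chosen, best
        chosen, best = chosen + [name], acc

# Hypothetical synthetic data (NOT the CKD dataset): one signal and one noise column per type.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(300)]
columns = {
    "num_signal": [rng.gauss(0.0 if c else 4.0, 1.0) for c in y],
    "num_noise": [rng.gauss(0.0, 1.0) for _ in y],
    "cat_signal": ["yes" if rng.random() < (0.8 if c else 0.2) else "no" for c in y],
    "cat_noise": [rng.choice(["a", "b", "c"]) for _ in y],
}
kinds = {k: "numeric" if k.startswith("num") else "nominal" for k in columns}
kept = mdfs_filter(columns, kinds, y)
chosen, acc = svm_forward_select(columns, kinds, kept, y, list(range(200)), list(range(200, 300)))
print("passed filters:", kept, "| SVM-selected:", chosen, "| accuracy:", round(acc, 3))
```

Each column is routed by its declared type, so numeric values reach the point-biserial filter unchanged and are never discretized, while nominal values are only counted in a contingency table. In practice, `scipy.stats.pointbiserialr`, `scipy.stats.chi2_contingency` (which applies Yates' continuity correction to 2×2 tables by default) and `sklearn.svm.SVC` could replace the hand-written helpers.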


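Source journal

Intelligent Automation and Soft Computing (Engineering & Technology: Computer Science, Artificial Intelligence)

CiteScore: 3.50

Self-citation rate: 10.00%

Articles published: 429

Review time: 10.8 months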

Journal description: An international journal that seeks to provide a common forum for the dissemination of accurate results about the world of intelligent automation, artificial intelligence, computer science, control, intelligent data science, modeling, and systems engineering. The articles published in the journal are intended to encompass both the short- and long-term effects of soft computing and related fields such as robotics, control, computer vision, speech recognition, pattern recognition, data mining, big data, data analytics, machine intelligence, cyber security, and deep learning. The journal further aims to address the existing and emerging relationships between automation, systems engineering, system-of-systems engineering, and soft computing. It publishes original and survey papers on artificial intelligence, intelligent automation, and computer engineering, with an emphasis on current and potential applications of soft computing, and has a broad interest in all engineering disciplines, computer science, and related technological fields such as medicine, biology, operations research, technology management, agriculture, and information technology.