A Improving Feature Selection on Heart Disease Dataset With Boruta Approach

Muhammad Arzanul Manhar, I. Soesanti, N. A. Setiawan
Journal FORTEI-JEERI, journal article, published 2020-05-21
DOI: 10.46962/FORTEIJEERI.V1I1.6 (https://doi.org/10.46962/FORTEIJEERI.V1I1.6)
Citations: 4

Abstract

Coronary artery disease (CAD) is one of the deadliest diseases worldwide, including in Indonesia. CAD occurs when the coronary arteries narrow or become blocked, usually as a result of atherosclerosis. Various studies have aimed to predict the nature and characteristics of this disease. Some of them classify data from the Z-Alizadeh Sani dataset, which consists of 54 attributes and two classes, CAD and Normal. Feature selection reduces the number of attributes by keeping only those that have a strong effect on the dataset. In this study, the Boruta method is used for feature selection to minimize the number of attributes while keeping those that are highly relevant to the dataset. The feature selection process yields reduced sets of 17 and 18 attributes that are highly relevant to the dataset. These attributes are then used to measure classification accuracy with several classification methods, and an accuracy of 90.3% is obtained.
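To make the Boruta step more concrete, here is a minimal sketch of the general Boruta idea. It is not the authors' code and does not load the Z-Alizadeh Sani dataset; it runs on synthetic data. Each column gets a "shadow" copy that is shuffled independently, a random forest is trained on real plus shadow columns, and a real column scores a "hit" when its importance beats the best shadow. A binomial test on the hit count then labels each column. Assumptions, all simplifications: scikit-learn's impurity-based `feature_importances_` stands in for the permutation-importance Z-scores of the original Boruta formulation, the number of rounds is fixed, rejected columns are not dropped between rounds, and the per-direction tests use no multiple-comparison correction. The names `boruta_sketch` and `binom_tail_ge` are hypothetical. The sketch requires NumPy and scikit-learn.

```python
import math

import numpy as np
from sklearn.ensemble import RandomForestClassifier


def binom_tail_ge(k: int, n: int, p: float = 0.5) -> float:
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def boruta_sketch(X: np.ndarray, y: np.ndarray, n_iter: int = 20,
                  alpha: float = 0.05, seed: int = 0) -> list[str]:
    """Simplified Boruta (hypothetical helper): label each column
    'confirmed', 'rejected' or 'tentative'."""
    rng = np.random.default_rng(seed)
    n_features = X.shape[1]
    hits = np.zeros(n_features, dtype=int)

    for it in range(n_iter):
        # Shadow features: a copy of every column, shuffled independently,
        # so each keeps its distribution but loses any link to the label.
        shadows = np.column_stack([rng.permutation(X[:, j]) for j in range(n_features)])
        X_ext = np.hstack([X, shadows])

        forest = RandomForestClassifier(n_estimators=100, random_state=seed + it)
        forest.fit(X_ext, y)
        importances = forest.feature_importances_  # impurity-based (assumption)

        # A "hit" means a real feature beat the best-scoring shadow feature.
        hits += importances[:n_features] > importances[n_features:].max()

    decisions = []
    for h in map(int, hits):
        # Null hypothesis: a feature wins each round with probability 0.5.
        if binom_tail_ge(h, n_iter) < alpha:             # significantly many hits
            decisions.append("confirmed")
        elif binom_tail_ge(n_iter - h, n_iter) < alpha:  # significantly few hits
            decisions.append("rejected")
        else:
            decisions.append("tentative")
    return decisions


# Illustrative synthetic data: columns 0-1 drive the label, columns 2-5 are noise.
demo_rng = np.random.default_rng(42)
X_demo = demo_rng.normal(size=(200, 6))
y_demo = (X_demo[:, 0] + X_demo[:, 1] > 0).astype(int)
decisions = boruta_sketch(X_demo, y_demo)
print(decisions)
```

On this toy data the two informative columns should come out as confirmed, and the noise columns as rejected or tentative. The columns marked confirmed (and, if desired, tentative) would form the reduced attribute set that is then passed to downstream classifiers for accuracy evaluation.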