K-Fold Cross Validation for Selection of Cardiovascular Disease Diagnosis Features by Applying Rule-Based Datamining

Dwi Normawati, Dewi Pramudi Ismi
Journal: Signal and Image Processing Letters
Published: 2019-07-19
DOI: https://doi.org/10.31763/simple.v1i2.3
Citations: 15

Abstract

Coronary heart disease is a frequent cause of death. It occurs when atherosclerosis in the coronary arteries blocks blood flow to the heart muscle. The reference method physicians use to diagnose coronary heart disease is coronary angiography, but it is invasive, high-risk, and expensive. The purpose of this study is to analyze the effect of applying k-fold cross-validation (CV) to the dataset on rule-based feature selection for diagnosing coronary heart disease, using the Cleveland heart disease dataset. Feature selection was performed with a medical-expert-based method (MFS) and a computer-based method, namely the Variable Precision Rough Set (VPRS), which is an extension of Rough Set theory. Classification performance was evaluated with 10-fold, 5-fold, and 3-fold cross-validation. The results show that the number of attributes selected differs from fold to fold for both the VPRS and MFS methods; the accuracy reported for each k is the average of the accuracies over its folds. The highest accuracy for the VPRS method was 76.34% with k = 5, while the MFS accuracy was 71.281% with k = 3. The k-fold implementation is therefore less effective in this case, because the data are still divided in a structured way, following the order of the records, and the amount of testing data in each fold is too small and too structured. This affects the accuracy results because the rules being tested are not thoroughly represented.
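The effect of dividing the data in record order can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names `kfold_indices` and `mean_accuracy`, the toy labels, and the shuffled variant are all assumptions introduced here for illustration, and no Cleveland data is used. It shows how contiguous folds over class-ordered records can leave a test fold with only one class, and how a per-k accuracy is obtained by averaging over folds.

```python
import random


def kfold_indices(n_records, k, shuffle=False, seed=0):
    """Split record indices 0..n_records-1 into k test folds (hypothetical helper)."""
    indices = list(range(n_records))
    if shuffle:
        # Shuffling breaks the dependence on the original record order.
        random.Random(seed).shuffle(indices)
    # Distribute the remainder so fold sizes differ by at most one.
    base, extra = divmod(n_records, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(indices[start:start + size])
        start += size
    return folds


def mean_accuracy(per_fold_accuracy):
    """Average the per-fold accuracies to get one value for a given k."""
    return sum(per_fold_accuracy) / len(per_fold_accuracy)


# Hypothetical toy labels sorted by class; NOT the Cleveland dataset.
labels = [0] * 6 + [1] * 6

ordered = kfold_indices(len(labels), k=3)
shuffled = kfold_indices(len(labels), k=3, shuffle=True, seed=42)

for name, folds in (("ordered", ordered), ("shuffled", shuffled)):
    for i, test_idx in enumerate(folds):
        classes = sorted({labels[j] for j in test_idx})
        print(name, "fold", i, "test size", len(test_idx), "classes", classes)
```

With the ordered split, the first and last test folds each contain a single class, which is one way a structured division can leave the test data unrepresentative. Shuffling or stratifying before splitting is a common remedy, although the abstract does not say whether the authors tried either.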