Identify Best Learning Method for Heart Diseases Prediction Under impact of Different Datasets Characteristics

Zahraa chaffat Oleiwi, Ebtesam N. Alshemmary, Salam Al-augby
Journal of Kufa for Mathematics and Computer. Published 2023-03-31. DOI: 10.31642/jokmc/2018/100104 (https://doi.org/10.31642/jokmc/2018/100104)

Abstract

This paper presents an experimental study of how the characteristics of heart disease datasets affect the performance of classification algorithms, with the aim of identifying the best algorithm for each dataset given its characteristics. Five machine learning algorithms (logistic regression (LR), K-Nearest Neighbor (KNN), decision tree (DT), Random Forest (RF), and support vector machine (SVM)), a single-layer neural network (ANN), and a deep neural network (DNN) were evaluated on five heart disease datasets under four data complexity measures: number of samples (dataset size), number of features (dataset dimension), data sparsity, and feature correlation. All datasets were preprocessed and normalized, and mutual information-based feature selection was then applied to address overfitting. The results show that, in general, the machine learning algorithms, especially Random Forest, achieve higher classification accuracy than the deep learning network. On the other hand, high sparsity and low mutual information in a dataset degrade the performance of the classification algorithms more than the other data characteristics do.
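The abstract does not give the paper's formulas or code, so the following is only a minimal sketch of how the four dataset characteristics and a mutual information-based feature ranking could be computed. Every function name, the zero-fraction definition of sparsity, the mean absolute Pearson correlation, the equal-width binning, and the toy data are assumptions made for illustration, not the authors' method.

```python
import math
from collections import Counter


def sparsity(X):
    """Fraction of entries equal to zero (an assumed sparsity measure)."""
    total = sum(len(row) for row in X)
    zeros = sum(1 for row in X for v in row if v == 0)
    return zeros / total


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0 or vb == 0:
        return 0.0
    return cov / math.sqrt(va * vb)


def mean_abs_correlation(X):
    """Average |Pearson r| over all feature pairs (an assumed correlation summary)."""
    cols = list(zip(*X))
    pairs = [(i, j) for i in range(len(cols)) for j in range(i + 1, len(cols))]
    if not pairs:
        return 0.0
    return sum(abs(pearson(cols[i], cols[j])) for i, j in pairs) / len(pairs)


def mutual_information(xs, ys):
    """Mutual information (in nats) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y])) for (x, y), c in pxy.items()
    )


def bin_column(col, bins=3):
    """Equal-width binning so continuous features can be scored (an assumption)."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0] * len(col)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in col]


def select_top_k(X, y, k, bins=3):
    """Rank features by mutual information with the label and keep the top k."""
    cols = list(zip(*X))
    scores = [mutual_information(bin_column(c, bins), y) for c in cols]
    ranked = sorted(range(len(cols)), key=lambda j: scores[j], reverse=True)
    return ranked[:k], scores


if __name__ == "__main__":
    # Hypothetical toy data: 4 samples, 3 features, binary label.
    X = [[0.0, 1.0, 0.0], [0.0, 2.0, 0.0], [1.0, 1.0, 0.0], [1.0, 2.0, 5.0]]
    y = [0, 0, 1, 1]
    print("samples:", len(X), "features:", len(X[0]))
    print("sparsity:", sparsity(X))
    print("mean |correlation|:", mean_abs_correlation(X))
    print("selected features:", select_top_k(X, y, k=2)[0])
```

In practice a library routine such as scikit-learn's `mutual_info_classif` would likely replace the hand-written estimator; the standard-library version here only keeps the sketch self-contained.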