Multiclass Classification for Large Medical Data using Adaptive Random Forest and Improved Feature Selection Methods

M. Ram, G. Suresh, Narasimha Swamy Biyappu
2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence), published 2022-01-27. DOI: 10.1109/Confluence52989.2022.9734140
Citations: 0

Abstract

Classification is a reliable data mining technique applied in the medical sciences. Multiclass classification problems arise in many recent applications, including social network analysis, biology, anomaly detection, and computer vision. However, classification techniques often struggle with data whose features are generated from multiple classes. Moreover, for large medical datasets, high dimensionality poses the biggest challenge when applying classification techniques. To overcome these problems, we propose an adaptive random forest (RF) classifier that uses an ensemble feature selection technique combining information gain (IG), improved correlation (IC), and gain ratio (GR). The method also addresses class imbalance in medical data by applying bootstrap resampling. The results show that the adaptive RF classifier achieves better accuracy, precision, and F-score than the standard Random Forest and k-nearest neighbors (KNN) classification algorithms. All algorithms were evaluated on five real datasets, and the analysis shows that the proposed classifier performs promisingly on all of them compared with the standard methods.
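The abstract names its building blocks without defining them. The following is a minimal sketch of one plausible reading, not the authors' implementation, under these assumptions: features are discrete; information gain and gain ratio use their standard entropy-based definitions; symmetric uncertainty stands in for the unspecified "improved correlation" score; the three feature rankings are combined by averaging ranks; and bootstrap resampling draws each class with replacement up to the majority-class size. All function names (`ensemble_select`, `balanced_bootstrap`, and so on) are hypothetical, and the adaptive random forest itself is not shown.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def conditional_entropy(feature, labels):
    """H(labels | feature) for one discrete feature column."""
    n = len(labels)
    total = 0.0
    for value, count in Counter(feature).items():
        subset = [y for x, y in zip(feature, labels) if x == value]
        total += (count / n) * entropy(subset)
    return total


def information_gain(feature, labels):
    """IG = H(Y) - H(Y | X)."""
    return entropy(labels) - conditional_entropy(feature, labels)


def gain_ratio(feature, labels):
    """GR = IG / split information, where split information is H(X)."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def symmetric_uncertainty(feature, labels):
    """Assumed stand-in for the paper's "improved correlation" score."""
    denom = entropy(feature) + entropy(labels)
    return 2 * information_gain(feature, labels) / denom if denom > 0 else 0.0


def ensemble_select(columns, labels, k):
    """Hypothetical ensemble: rank features under each score, average ranks, keep k best.

    columns maps a feature name to its list of discrete values.
    """
    scorers = (information_gain, gain_ratio, symmetric_uncertainty)
    avg_rank = {name: 0.0 for name in columns}
    for score in scorers:
        ordered = sorted(columns, key=lambda f: score(columns[f], labels), reverse=True)
        for rank, name in enumerate(ordered):
            avg_rank[name] += rank / len(scorers)
    return sorted(columns, key=lambda f: avg_rank[f])[:k]


def balanced_bootstrap(rows, labels, seed=0):
    """Hypothetical rebalancing: resample every class with replacement to the majority size."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        for _ in range(target):
            out_rows.append(rng.choice(members))
            out_labels.append(y)
    return out_rows, out_labels


# Tiny made-up dataset, for illustration only.
labels = ["a", "a", "b", "b", "c", "c"]
columns = {
    "perfect": [0, 0, 1, 1, 2, 2],  # determines the class exactly
    "noise": [0, 1, 0, 1, 0, 1],    # independent of the class
    "partial": [0, 0, 1, 1, 1, 1],  # separates "a" from the other classes
}
print(ensemble_select(columns, labels, k=2))  # → ['perfect', 'partial']
```

The selected columns and the rebalanced rows are plain lists, so they could be fed to any random forest or KNN implementation for the kind of comparison the abstract reports. Resampling would normally be applied to the training split only, so that duplicated rows do not leak into the evaluation data.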