Multiclass Classification for Large Medical Data using Adaptive Random Forest and Improved Feature Selection Methods

2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
Pub Date: 2022-01-27
DOI: 10.1109/Confluence52989.2022.9734140

M. Ram, G. Suresh, Narasimha Swamy Biyappu



Abstract

Classification is a reliable data mining technique widely applied in the medical sciences. Multiclass classification problems arise in many recent applications, including social network analysis, biology, anomaly detection, and computer vision. However, classification techniques usually struggle with features of data generated from multiple classes. Moreover, for large medical data, the dimensionality of the data poses the biggest challenge when applying classification techniques. To overcome these problems, we propose an adaptive random forest classifier that uses an ensemble feature selection technique for better information gain (IG), improved correlation (IC), and gain ratio (GR). It also seeks to solve the class imbalance problem by applying bootstrap resampling to medical data. The results show that the adaptive random forest (RF) classifier offers better accuracy, precision, and F-score values than the standard random forest and KNN classification algorithms. The algorithms were tested on five real datasets, and the analysis shows that the proposed classifier performs promisingly on all of them compared with the standard methods.
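The abstract names its ingredients but not their exact formulation, so the following is only a minimal sketch under stated assumptions rather than the authors' method. It assumes discrete feature values, treats IG and GR as per-feature ranking scores that are combined by averaging ranks, and balances classes by resampling each class with replacement up to the majority-class size. The improved correlation (IC) measure is left out because the abstract does not define it. All function names (`info_gain`, `gain_ratio`, `rank_features`, `bootstrap_balance`) are hypothetical.

```python
import math
import random
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def info_gain(feature, labels):
    """Information gain H(Y) - H(Y | X) for one discrete feature column."""
    n = len(labels)
    groups = {}
    for x, y in zip(feature, labels):
        groups.setdefault(x, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """Information gain divided by the feature's own entropy (split info)."""
    split_info = entropy(feature)
    return info_gain(feature, labels) / split_info if split_info > 0 else 0.0


def rank_features(columns, labels, scorers=(info_gain, gain_ratio)):
    """Assumed ensemble rule: average each feature's rank across the scorers.

    `columns` maps a feature name to its list of values. Features are
    returned best first (lowest mean rank).
    """
    names = list(columns)
    mean_rank = dict.fromkeys(names, 0.0)
    for scorer in scorers:
        ordered = sorted(names, key=lambda f: scorer(columns[f], labels), reverse=True)
        for rank, name in enumerate(ordered):
            mean_rank[name] += rank / len(scorers)
    return sorted(names, key=mean_rank.get)


def bootstrap_balance(rows, labels, seed=0):
    """Resample every class with replacement up to the majority-class size."""
    rng = random.Random(seed)
    by_class = {}
    for row, y in zip(rows, labels):
        by_class.setdefault(y, []).append(row)
    target = max(len(members) for members in by_class.values())
    out_rows, out_labels = [], []
    for y, members in by_class.items():
        out_rows.extend(rng.choices(members, k=target))
        out_labels.extend([y] * target)
    return out_rows, out_labels
```

In a pipeline of this shape, the top-ranked features and the balanced sample would then be passed to a random forest and to the baseline classifiers for comparison. Continuous medical measurements would need to be discretised before these entropy-based scores apply.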



