Multiclass Classification for Large Medical Data using Adaptive Random Forest and Improved Feature Selection Methods
M. Ram, G. Suresh, Narasimha Swamy Biyappu
2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence), published 2022-01-27
DOI: 10.1109/Confluence52989.2022.9734140 (https://doi.org/10.1109/Confluence52989.2022.9734140)
Citation count: 0
Abstract
Classification is a reliable data mining technique widely applied in the medical sciences. Multiclass classification problems arise in many recent applications, including social network analysis, biology, anomaly detection, and computer vision. However, classification techniques usually struggle with features of data generated from multiple classes. Moreover, for large medical data, the dimensionality of the data poses the biggest challenge when applying classification techniques. To overcome these problems, we propose an adaptive random forest classifier that uses an ensemble feature selection technique based on information gain (IG), improved correlation (IC), and gain ratio (GR). It also addresses the class imbalance problem by applying bootstrap resampling to the medical data. The results show that the adaptive random forest (RF) classifier offers better accuracy, precision, and F-score values than the standard Random Forest and KNN classification algorithms. The algorithms were evaluated on five real datasets, and the analysis shows that the proposed classifier performs promisingly on all of them compared with the standard methods.
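The abstract does not give formulas or pseudocode, so the following is only a minimal sketch of two of the ingredients it names: filter scores (information gain and gain ratio) for discrete features, and bootstrap resampling to balance classes. It is not the authors' implementation. The function names, the mean-rank aggregation rule in `ensemble_rank`, and the choice to oversample every class to the size of the largest one are all assumptions made for illustration. The "improved correlation" score is left out because the abstract does not define it, and the random forest itself is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature column."""
    groups = defaultdict(list)
    for value, label in zip(feature, labels):
        groups[value].append(label)
    conditional = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def gain_ratio(feature, labels):
    """GR = IG / split information; taken as 0 when the feature is constant."""
    split_info = entropy(feature)
    return information_gain(feature, labels) / split_info if split_info > 0 else 0.0


def ensemble_rank(columns, labels, scorers=(information_gain, gain_ratio)):
    """Order feature names by mean rank across scorers (assumed aggregation rule)."""
    mean_rank = Counter()
    for scorer in scorers:
        ordered = sorted(columns, key=lambda name: scorer(columns[name], labels), reverse=True)
        for rank, name in enumerate(ordered):
            mean_rank[name] += rank / len(scorers)
    return sorted(columns, key=lambda name: mean_rank[name])


def bootstrap_balance(rows, labels, seed=0):
    """Resample each class with replacement up to the size of the largest class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    target = max(len(members) for members in by_class.values())
    balanced = []
    for label, members in by_class.items():
        balanced += [(rng.choice(members), label) for _ in range(target)]
    return balanced
```

In a pipeline of the kind the abstract describes, the top-ranked features from `ensemble_rank` and the balanced sample from `bootstrap_balance` would then be passed to a random forest learner; continuous features would need to be discretized first for these two scores to apply.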