Using Ensemble Technique to Improve Multiclass Classification

Dalton Ndirangu, W. Mwangi, L. Nderu
Journal of Information Engineering and Applications
DOI: 10.7176/jiea/9-5-04 · Published: August 31, 2019 · Citations: 1

Abstract

Many real-world applications inevitably involve datasets with a multiclass structure characterized by imbalanced classes and by redundant and irrelevant features that degrade classifier performance. Minority classes in such datasets are treated as outlier classes. This research aimed to establish the role of ensemble techniques in improving the performance of multiclass classification. Multiclass datasets were decomposed into binary problems, and the datasets were resampled using the Synthetic Minority Oversampling Technique (SMOTE) algorithm. Relevant features were selected with an ensemble filter method built from the Correlation, Information Gain, Gain Ratio, and ReliefF filter selection methods. The AdaBoost and Random Subspace learning algorithms were combined through a voting methodology, with random forest as the base classifier. The classifiers were evaluated using 10-fold stratified cross-validation. The resulting model showed better performance in outlier detection and classification prediction for multiclass problems, outperforming well-known existing classification and outlier detection algorithms such as Naive Bayes, KNN, Bagging, JRipper, decision trees, RandomTree, and random forest. The study's findings established that combining ensemble techniques, dataset resampling, and multiclass decomposition improves classification performance and enhances detection of minority outlier (rare) classes.

Keywords: Multiclass, Classification, Outliers, Ensemble, Learning Algorithm
DOI: 10.7176/JIEA/9-5-04
Publication date: August 31, 2019