Using Ensemble Technique to Improve Multiclass Classification

Dalton Ndirangu, W. Mwangi, L. Nderu
Journal of Information Engineering and Applications
DOI: 10.7176/jiea/9-5-04 · Published: August 31, 2019 · Citations: 1

Abstract

Many real-world applications inevitably involve datasets with a multiclass structure characterized by imbalanced classes and by redundant and irrelevant features that degrade classifier performance. Minority classes in such datasets are treated as outlier classes. This research aimed to establish the role of ensemble techniques in improving the performance of multiclass classification. Multiclass datasets were decomposed into binary problems, and the datasets were resampled using the Synthetic Minority Oversampling Technique (SMOTE) algorithm. Relevant features were selected with an ensemble filter method built from the Correlation, Information Gain, Gain Ratio, and ReliefF filter selection methods. The AdaBoost and Random Subspace learning algorithms were combined through a voting methodology, with random forest as the base classifier. The classifiers were evaluated using 10-fold stratified cross-validation. The resulting model showed better performance in outlier detection and classification prediction for multiclass problems, outperforming well-known existing classification and outlier detection algorithms such as Naive Bayes, KNN, Bagging, JRipper, decision trees, RandomTree, and random forest. The study's findings established that combining ensemble techniques, dataset resampling, and multiclass decomposition improves classification performance and enhances detection of minority outlier (rare) classes.

Keywords: Multiclass, Classification, Outliers, Ensemble, Learning Algorithm
DOI: 10.7176/JIEA/9-5-04
Publication date: August 31, 2019