An Ensemble Model for Multiclass Classification and Outlier Detection Method in Data Mining

Journal of Information Engineering and Applications. Publication date: April 30, 2019. DOI: 10.7176/jiea/9-2-04

Dalton Ndirangu, W. Mwangi, L. Nderu


Citations: 3

Abstract

Real-world datasets exhibit a multiclass classification structure characterized by imbalanced classes, and the minority classes are treated as outlier classes. The study followed the Cross-Industry Standard Process for Data Mining (CRISP-DM) methodology. A heterogeneous multiclass ensemble was developed by combining several strategies and ensemble techniques. The datasets were drawn from the UCI Machine Learning Repository. Validation experiments were conducted, and their results are presented in tables and figures. An ensemble filter selection method was developed and used to preprocess the datasets. Point outliers were filtered using the interquartile range (IQR) filter algorithm, and the datasets were then resampled with the Synthetic Minority Oversampling Technique (SMOTE). Multiclass datasets were transformed into binary problems using the One-vs-One (OvO) decomposition technique. The ensemble model was built with the AdaBoost and random subspace algorithms, using random forest as the base classifier, and the resulting classifiers were combined by voting. The model was validated with classification and outlier-detection performance measures, namely recall, precision, F-measure, and area under the ROC curve (AUC-ROC), and the classifiers were evaluated using 10-fold stratified cross-validation. The model performed better at outlier detection and classification prediction for multiclass problems, outperforming well-known classification and outlier detection algorithms such as Naive Bayes, KNN, Bagging, JRipper, decision trees, RandomTree, and random forest. The findings establish that combining ensemble techniques, dataset resampling, and multiclass decomposition improves the detection of minority outlier (rare) classes.

Keywords: Multiclass, Outlier, Ensemble, Model, Classification

DOI: 10.7176/JIEA/9-2-04

Publication date: April 30, 2019
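The preprocessing and decomposition steps described in the abstract can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: it shows an IQR point-outlier filter (using the conventional 1.5 * IQR fence, an assumed value), SMOTE oversampling of every class up to the size of the largest class, and One-vs-One decomposition with majority voting across the pairwise models. All function names are hypothetical, and the binary learner `fit_nearest_centroid` is a deliberately simple stand-in; the paper instead builds its classifiers with AdaBoost and random subspace ensembles over random forests. Feature selection, the evaluation metrics, and cross-validation are omitted.

```python
# Minimal sketch of IQR filtering -> SMOTE -> One-vs-One voting.
# All function names are hypothetical; this is not the paper's implementation.
import random
from collections import Counter
from itertools import combinations
from statistics import quantiles


def sq_dist(p, q):
    """Squared Euclidean distance between two feature vectors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def iqr_filter(X, y, k=1.5):
    """Drop rows with any feature outside [Q1 - k*IQR, Q3 + k*IQR].
    k = 1.5 is the conventional fence and an assumption here."""
    bounds = []
    for column in zip(*X):
        q1, _, q3 = quantiles(column, n=4)
        iqr = q3 - q1
        bounds.append((q1 - k * iqr, q3 + k * iqr))
    kept = [(row, label) for row, label in zip(X, y)
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, bounds))]
    return [row for row, _ in kept], [label for _, label in kept]


def smote(minority, n_new, k=5, rng=random):
    """Create n_new synthetic rows, each on the segment between a random
    minority row and one of its k nearest minority neighbours."""
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        neighbours = sorted((p for j, p in enumerate(minority) if j != i),
                            key=lambda p: sq_dist(base, p))[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


def balance_with_smote(X, y, k=5, rng=random):
    """Oversample every smaller class up to the size of the largest class."""
    counts = Counter(y)
    target = max(counts.values())
    X_out, y_out = [list(row) for row in X], list(y)
    for label, count in counts.items():
        members = [row for row, lab in zip(X, y) if lab == label]
        if count < target and len(members) > 1:  # SMOTE needs a neighbour
            new_rows = smote(members, target - count, k, rng)
            X_out.extend(new_rows)
            y_out.extend([label] * len(new_rows))
    return X_out, y_out


def fit_nearest_centroid(X, y):
    """Hypothetical stand-in binary learner. The paper instead uses AdaBoost
    and random subspace ensembles with random forest as the base classifier."""
    centroids = {}
    for label in set(y):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return lambda x: min(centroids, key=lambda c: sq_dist(x, centroids[c]))


def fit_one_vs_one(X, y, fit_binary):
    """One-vs-One decomposition: train one binary model per pair of classes."""
    models = {}
    for a, b in combinations(sorted(set(y)), 2):
        pair = [(row, lab) for row, lab in zip(X, y) if lab in (a, b)]
        models[(a, b)] = fit_binary([r for r, _ in pair], [lab for _, lab in pair])
    return models


def predict_one_vs_one(models, x):
    """Each pairwise model casts one vote; the most-voted class wins."""
    votes = Counter(model(x) for model in models.values())
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy data: "c" is the minority class; one extreme point is injected.
    spec = {"a": ((0, 0), 20), "b": ((10, 10), 20), "c": ((0, 10), 6)}
    X, y = [], []
    for label, ((cx, cy), n) in spec.items():
        for _ in range(n):
            X.append([cx + rng.uniform(-1, 1), cy + rng.uniform(-1, 1)])
            y.append(label)
    X.append([100.0, 100.0])
    y.append("a")

    X_f, y_f = iqr_filter(X, y)
    X_b, y_b = balance_with_smote(X_f, y_f, rng=rng)
    models = fit_one_vs_one(X_b, y_b, fit_nearest_centroid)
    print(len(X), "rows before IQR filter,", len(X_f), "after")
    print("class sizes after SMOTE:", dict(Counter(y_b)))
    for query in ([0, 0], [10, 10], [0, 10]):
        print(query, "->", predict_one_vs_one(models, query))
```

In practice, library implementations would normally replace these hand-written pieces, for example `imblearn.over_sampling.SMOTE` from imbalanced-learn and `OneVsOneClassifier`, `AdaBoostClassifier`, `RandomForestClassifier`, and `StratifiedKFold` from scikit-learn. Note that this IQR fence is computed over all classes together, so a small minority class that lies far from the rest of the data in a single feature could itself be discarded as outliers; the toy data places the minority cluster so that this does not happen.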

