An approach for classifying large dataset using ensemble classifiers

Sajad Khodarahmi Jahan Abad, Mohammad-Reza Zare-Mirakabad, M. Rezaeian
{"title":"An approach for classifying large dataset using ensemble classifiers","authors":"Sajad Khodarahmi Jahan Abad, Mohammad-Reza Zare-Mirakabad, M. Rezaeian","doi":"10.1109/ICCKE.2014.6993440","DOIUrl":null,"url":null,"abstract":"Efficiency of general classification models in various problems is different according to the characteristics and the space of the problem. Even in a particular issue, it may not be distinguished a special privilege for a classifier method than the others. Ensemble classifier methods aim to combine the results of several classifiers to cover the deficiency of each classifier by others. This combination faces high computational complexity if it includes a lazy base classifier, especially when handling large datasets. In this paper a method is proposed to combine the results of classifiers, which uses clustering as a part of the training, resulting in reducing the computational complexity, while it provides an acceptable accuracy. In this method the base classifiers are trained by a part of the input dataset, first. Then, according to the labels defined by the base classifiers, the clusters are created for another part of dataset. Finally, the samples contained in the clusters, the cluster that each sample belongs to it, and the distance of each sample to the center of all clusters are given to an artificial neural network and the final class label of test data is determined by the neural network. Experiments on several datasets show advantages of proposed model.","PeriodicalId":152540,"journal":{"name":"2014 4th International Conference on Computer and Knowledge Engineering (ICCKE)","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-12-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2014 4th International Conference on Computer and Knowledge Engineering (ICCKE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICCKE.2014.6993440","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

The performance of general classification models varies from problem to problem according to the characteristics and feature space of each problem. Even within a single problem, no one classifier may hold a clear advantage over the others. Ensemble classifier methods aim to combine the results of several classifiers so that the weaknesses of each classifier are compensated by the others. Such a combination incurs high computational cost when it includes a lazy base classifier, especially on large datasets. This paper proposes a method for combining classifier results that uses clustering as part of the training, reducing the computational complexity while providing acceptable accuracy. In this method, the base classifiers are first trained on one part of the input dataset. Then, according to the labels assigned by the base classifiers, clusters are created for another part of the dataset. Finally, the samples contained in the clusters, the cluster to which each sample belongs, and the distance of each sample to the center of every cluster are given to an artificial neural network, which determines the final class label of the test data. Experiments on several datasets show the advantages of the proposed model.
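A minimal sketch of the pipeline the abstract describes, written with scikit-learn. The specific choices here (decision tree, naive Bayes, and k-NN as base classifiers; KMeans for clustering; an MLP as the combining network; the particular split ratios and cluster count) are illustrative assumptions, not the authors' exact configuration.

```python
# Sketch: ensemble via base classifiers + clustering + neural-network combiner.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=3000, n_features=20, n_classes=3,
                           n_informative=10, random_state=0)
# Split: part 1 trains the base classifiers, part 2 builds the clusters,
# the remainder is held out as test data.
X1, X_rest, y1, y_rest = train_test_split(X, y, test_size=0.6, random_state=0)
X2, X_test, y2, y_test = train_test_split(X_rest, y_rest, test_size=0.5,
                                          random_state=0)

# Step 1: train the base classifiers on the first part of the dataset.
base_clfs = [DecisionTreeClassifier(random_state=0), GaussianNB(),
             KNeighborsClassifier(n_neighbors=5)]
for clf in base_clfs:
    clf.fit(X1, y1)

# Step 2: label the second part with the base classifiers and cluster it.
# Appending the predicted labels to the features before clustering is one
# way (assumed here) to make clusters reflect the base classifiers' labels.
preds2 = np.column_stack([clf.predict(X2) for clf in base_clfs])
kmeans = KMeans(n_clusters=6, n_init=10, random_state=0)
kmeans.fit(np.hstack([X2, preds2]))

# Step 3: build meta-features (cluster membership plus distance to every
# cluster center) and train a neural network to produce the final label.
def meta_features(X_part, preds_part):
    stacked = np.hstack([X_part, preds_part])
    dists = kmeans.transform(stacked)            # distance to each center
    ids = kmeans.predict(stacked).reshape(-1, 1)  # assigned cluster
    return np.hstack([ids, dists])

mlp = MLPClassifier(hidden_layer_sizes=(32,), max_iter=1000, random_state=0)
mlp.fit(meta_features(X2, preds2), y2)

# Final classification of the test data by the neural network.
preds_test = np.column_stack([clf.predict(X_test) for clf in base_clfs])
y_hat = mlp.predict(meta_features(X_test, preds_test))
print("ensemble accuracy:", (y_hat == y_test).mean())
```

Using the cluster assignment and center distances as the meta-features, rather than the raw feature vectors, is what keeps the combining step cheap even when a lazy learner such as k-NN sits among the base classifiers.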