Using Diversity for Classifier Ensemble Pruning: An Empirical Investigation

M. A. O. Ahmed, Luca Didaci, Bahram Lavi, G. Fumera
Journal: Theoretical and Applied Informatics, vol. 38, no. 1
DOI: 10.20904/291-2025
Published: 2018-03-19 (Journal Article)
Citations: 10

Abstract

The concept of 'diversity' has been one of the main open issues in the field of multiple classifier systems. In this paper we address a facet of diversity related to its effectiveness for ensemble construction, namely, explicitly using diversity measures in ensemble construction techniques based on the overproduce-and-choose strategy known as ensemble pruning. Such a strategy consists of selecting the (hopefully) more accurate subset of classifiers out of an original, larger ensemble. Whereas several existing pruning methods use some combination of individual classifiers' accuracy and diversity, it is still unclear whether such an evaluation function is better than the bare estimate of ensemble accuracy. We empirically investigate this issue by comparing two evaluation functions in the context of ensemble pruning: the estimate of ensemble accuracy, and its linear combination with several well-known diversity measures. This can also be viewed as using diversity as a regularizer, as suggested by some authors. To this end we use a pruning method based on forward selection, since it allows a direct comparison between different evaluation functions. Experiments on thirty-seven benchmark data sets, four diversity measures and three base classifiers provide evidence that using diversity measures for ensemble pruning can be advantageous over using only ensemble accuracy, and that diversity measures can act as regularizers in this context.
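To make the evaluated scheme concrete, the following is a minimal sketch of forward-selection ensemble pruning with an evaluation function of the form acc(S) + λ·div(S). It is not the paper's implementation: the diversity measure here is mean pairwise disagreement (one plausible choice; the paper compares several measures), and all names (`forward_select`, `lam`, etc.) are illustrative.

```python
# Hedged sketch: greedy forward selection of a sub-ensemble maximizing
# eval(S) = accuracy(majority_vote(S)) + lam * mean_pairwise_disagreement(S).
# With lam = 0 this reduces to the bare ensemble-accuracy criterion the
# paper uses as the baseline; lam > 0 adds diversity as a regularizer.
from itertools import combinations

def majority_vote(preds, n_samples):
    """Combine per-classifier prediction lists by plurality vote."""
    votes = []
    for i in range(n_samples):
        counts = {}
        for p in preds:
            counts[p[i]] = counts.get(p[i], 0) + 1
        votes.append(max(counts, key=counts.get))
    return votes

def accuracy(pred, y):
    return sum(int(a == b) for a, b in zip(pred, y)) / len(y)

def disagreement(p1, p2):
    """Fraction of samples on which two classifiers differ."""
    return sum(int(a != b) for a, b in zip(p1, p2)) / len(p1)

def mean_pairwise_disagreement(preds):
    if len(preds) < 2:
        return 0.0
    pairs = list(combinations(preds, 2))
    return sum(disagreement(a, b) for a, b in pairs) / len(pairs)

def forward_select(all_preds, y, k, lam):
    """Greedily grow a sub-ensemble of size k on validation predictions."""
    selected, remaining = [], list(range(len(all_preds)))
    while len(selected) < k:
        best, best_score = None, -1.0
        for j in remaining:
            preds = [all_preds[i] for i in selected + [j]]
            score = (accuracy(majority_vote(preds, len(y)), y)
                     + lam * mean_pairwise_disagreement(preds))
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
        remaining.remove(best)
    return selected
```

Because forward selection only swaps the evaluation function, the same loop supports a direct comparison between the accuracy-only criterion (λ = 0) and each accuracy-plus-diversity variant, which is why the paper adopts this pruning method for its experiments.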