{"title":"How boosting the margin can also boost classifier complexity","authors":"L. Reyzin, R. Schapire","doi":"10.1145/1143844.1143939","DOIUrl":null,"url":null,"abstract":"Boosting methods are known not to usually overfit training data even as the size of the generated classifiers becomes large. Schapire et al. attempted to explain this phenomenon in terms of the margins the classifier achieves on training examples. Later, however, Breiman cast serious doubt on this explanation by introducing a boosting algorithm, arc-gv, that can generate a higher margins distribution than AdaBoost and yet performs worse. In this paper, we take a close look at Breiman's compelling but puzzling results. Although we can reproduce his main finding, we find that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by our experiments and entirely consistent with the margins theory. Thus, we find maximizing the margins is desirable, but not necessarily at the expense of other factors, especially base-classifier complexity.","PeriodicalId":124011,"journal":{"name":"Proceedings of the 23rd international conference on Machine learning","volume":"51 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2006-06-25","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"238","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 23rd international conference on Machine learning","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1143844.1143939","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 238
Abstract
Boosting methods are known usually not to overfit training data even as the size of the generated classifiers becomes large. Schapire et al. attempted to explain this phenomenon in terms of the margins the classifier achieves on training examples. Later, however, Breiman cast serious doubt on this explanation by introducing a boosting algorithm, arc-gv, that can generate a higher margins distribution than AdaBoost and yet performs worse. In this paper, we take a close look at Breiman's compelling but puzzling results. Although we can reproduce his main finding, we find that the poorer performance of arc-gv can be explained by the increased complexity of the base classifiers it uses, an explanation supported by our experiments and entirely consistent with the margins theory. Thus, we find that maximizing the margins is desirable, but not necessarily at the expense of other factors, especially base-classifier complexity.
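Since the abstract turns on the margins a boosted classifier attains on its training examples, a minimal sketch may help make that concrete. The code below is not the authors' implementation: it is a plain NumPy sketch of AdaBoost over decision stumps, with an arc_gv flag that swaps in Breiman's rule of shrinking each round's weight by a term based on the minimum normalized margin achieved so far, and a margins helper computing the normalized margin y*f(x)/sum|alpha_t| that the margins theory analyzes. The function names and the stump base learner are illustrative choices, not taken from the paper (which uses decision trees as base classifiers).

import numpy as np

def stump_predict(X, feat, thr, pol):
    # Decision stump: +1 where pol * (x[feat] - thr) > 0, else -1.
    return np.where(pol * (X[:, feat] - thr) > 0, 1, -1)

def fit_stump(X, y, w):
    # Exhaustive search for the stump with the smallest weighted error.
    best = (np.inf, 0, 0.0, 1)
    for feat in range(X.shape[1]):
        for thr in np.unique(X[:, feat]):
            for pol in (1, -1):
                err = np.sum(w[stump_predict(X, feat, thr, pol) != y])
                if err < best[0]:
                    best = (err, feat, thr, pol)
    return best

def boost(X, y, T=100, arc_gv=False):
    # y in {-1, +1}. Returns base-classifier weights and the chosen stumps.
    n = len(y)
    w = np.full(n, 1.0 / n)          # example weights, updated each round
    alphas, stumps = [], []
    F = np.zeros(n)                  # running weighted vote f(x_i)
    for t in range(T):
        err, feat, thr, pol = fit_stump(X, y, w)
        gamma = np.clip(1.0 - 2.0 * err, -1 + 1e-12, 1 - 1e-12)  # edge
        alpha = 0.5 * np.log((1 + gamma) / (1 - gamma))  # AdaBoost's choice
        if arc_gv and t > 0:
            # arc-gv (Breiman): reduce alpha by a term based on the minimum
            # normalized margin rho of the combined classifier so far.
            rho = np.clip((y * F).min() / np.sum(alphas),
                          -1 + 1e-12, 1 - 1e-12)
            alpha -= 0.5 * np.log((1 + rho) / (1 - rho))
        pred = stump_predict(X, feat, thr, pol)
        w *= np.exp(-alpha * y * pred)   # upweight mistakes, downweight hits
        w /= w.sum()
        F += alpha * pred
        alphas.append(alpha)
        stumps.append((feat, thr, pol))
    return np.array(alphas), stumps

def margins(alphas, stumps, X, y):
    # Normalized margins y*f(x)/sum|alpha|: in [-1, 1]; positive = correct.
    F = sum(a * stump_predict(X, f, t, p)
            for a, (f, t, p) in zip(alphas, stumps))
    return y * F / np.abs(alphas).sum()

Running both variants on the same data and comparing the resulting margin distributions is the kind of experiment the paper examines; its point is that with richer base learners than stumps, arc-gv's higher margins come bundled with more complex base classifiers, which accounts for its worse test error.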