Speeding up AdaBoost Classifier with Random Projection

2009 Seventh International Conference on Advances in Pattern Recognition. Pub Date: 2009-02-04. DOI: 10.1109/ICAPR.2009.67

Biswajit Paul, G. Athithan, M. Murty

Citations: 18

Abstract

One objective of data mining is to develop techniques for scaling up classifiers so that they can be applied to problems with large training datasets. Recently, AdaBoost has become popular in the machine learning community thanks to its promising results across a variety of applications. However, training AdaBoost on large datasets is a major problem, especially when the dimensionality of the data is very high. This paper discusses the effect of high dimensionality on the training process of AdaBoost. Two preprocessing options for reducing dimensionality, namely principal component analysis and random projection, are briefly examined. Random projection, subject to a probabilistic length-preserving transformation, is explored further as a computationally light preprocessing step. The experimental results demonstrate the effectiveness of the proposed training process for handling large, high-dimensional datasets.
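To illustrate what a probabilistically length-preserving random projection can look like, here is a minimal sketch in Python. The abstract does not specify how the projection matrix is built, so the Gaussian construction with entries drawn from N(0, 1/k), the dimensions d and k, the seeds, and all function names below are assumptions made for illustration, not the authors' method. With this scaling, the norm of a projected vector is close to the norm of the original with high probability, which is the property that would let a classifier such as AdaBoost be trained on the k-dimensional vectors instead of the d-dimensional ones.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Return a k x d matrix with i.i.d. N(0, 1/k) entries (assumed construction)."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(d)] for _ in range(k)]


def project(R, x):
    """Map a d-dimensional vector x to k dimensions by computing R @ x."""
    return [sum(r * v for r, v in zip(row, x)) for row in R]


def norm(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(a * a for a in v))


# Hypothetical sizes: project from d = 1000 down to k = 200 dimensions.
d, k = 1000, 200
R = random_projection_matrix(d, k, seed=42)

# A synthetic high-dimensional example (illustrative data, not from the paper).
data_rng = random.Random(7)
x = [data_rng.uniform(-1.0, 1.0) for _ in range(d)]

y = project(R, x)
ratio = norm(y) / norm(x)  # expected to be near 1 for reasonably large k
print(f"projected dimension: {len(y)}, length ratio: {ratio:.3f}")
```

The projection costs one matrix-vector product per example and needs no eigendecomposition of the data, which is why it is lighter than principal component analysis as a preprocessing step. The projected vectors would then replace the original features as input to AdaBoost training.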
