Intelligent MapReduce Based Framework for Labeling Instances in Evolving Data Stream

2013 IEEE 5th International Conference on Cloud Computing Technology and Science. Pub Date: 2013-12-02. DOI: 10.1109/CloudCom.2013.152

Ahsanul Haque, Brandon Parker, L. Khan, B. Thuraisingham

Citations: 3

Abstract

In our current work, we have proposed a robust, multi-tiered, ensemble-based method to address all of the challenges of labeling instances in an evolving data stream. The bottleneck of that work is that it needs to build an ADABOOST ensemble for each numeric feature. This can become a scalability issue, because the number of features in a data stream can be very large. In this paper, we propose an intelligent approach that builds this large number of ADABOOST ensembles with MapReduce-based parallelism. We show that this approach helps our base method achieve significant scalability without compromising classification accuracy. We analyze different aspects of our design to show the advantages and disadvantages of the approach. We also compare and analyze the performance of the proposed approach in terms of execution time, speedup, and scale-up.
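The idea of building one ADABOOST ensemble per numeric feature in parallel can be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not describe the weak learner, the job layout, or the "intelligent" scheduling, so the threshold stumps, the function names (`map_feature`, `reduce_models`, `build_ensembles`), the number of boosting rounds, and the use of a local thread pool in place of a MapReduce cluster are all assumptions made for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def train_stump(values, labels, weights):
    """Return (error, threshold, polarity) of the lowest weighted-error stump."""
    best = None
    for threshold in sorted(set(values)):
        for polarity in (1, -1):
            preds = [polarity if v >= threshold else -polarity for v in values]
            err = sum(w for w, p, y in zip(weights, preds, labels) if p != y)
            if best is None or err < best[0]:
                best = (err, threshold, polarity)
    return best


def train_adaboost(values, labels, rounds=5):
    """Discrete AdaBoost over threshold stumps on one numeric feature.

    Labels are assumed to be +1 / -1. The stump learner is an assumption.
    """
    n = len(values)
    weights = [1.0 / n] * n
    ensemble = []  # list of (alpha, threshold, polarity)
    for _ in range(rounds):
        err, threshold, polarity = train_stump(values, labels, weights)
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the logarithm finite
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, threshold, polarity))
        preds = [polarity if v >= threshold else -polarity for v in values]
        # Raise the weight of misclassified instances, then renormalize.
        weights = [w * math.exp(-alpha * y * p)
                   for w, y, p in zip(weights, labels, preds)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble


def predict(ensemble, value):
    """Weighted vote of the stumps for a single feature value."""
    score = sum(a * (pol if value >= t else -pol) for a, t, pol in ensemble)
    return 1 if score >= 0 else -1


def map_feature(task):
    """Map step: one feature column in, one (feature_index, ensemble) pair out."""
    index, column, labels = task
    return index, train_adaboost(column, labels)


def reduce_models(models, pair):
    """Reduce step: collect the per-feature ensembles into one dictionary."""
    index, ensemble = pair
    models[index] = ensemble
    return models


def build_ensembles(rows, labels, workers=2):
    """Train one ensemble per feature; threads stand in for cluster workers."""
    columns = list(zip(*rows))
    tasks = [(i, col, labels) for i, col in enumerate(columns)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        pairs = list(pool.map(map_feature, tasks))
    return reduce(reduce_models, pairs, {})
```

Calling `build_ensembles(rows, labels)` with `rows` as a list of numeric feature tuples and `labels` as +1/-1 values returns a dictionary mapping each feature index to its ensemble, which `predict` can then apply to a single feature value. Because each feature's ensemble depends only on that feature's column and the labels, the map tasks are independent. That independence is what makes the per-feature training a natural fit for MapReduce. The thread pool here only mimics the task structure and would not by itself deliver the scalability the paper reports.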



