Intelligent MapReduce Based Framework for Labeling Instances in Evolving Data Stream

Ahsanul Haque, Brandon Parker, L. Khan, B. Thuraisingham
{"title":"Intelligent MapReduce Based Framework for Labeling Instances in Evolving Data Stream","authors":"Ahsanul Haque, Brandon Parker, L. Khan, B. Thuraisingham","doi":"10.1109/CloudCom.2013.152","DOIUrl":null,"url":null,"abstract":"In our current work, we have proposed a multi-tiered ensemble based robust method to address all of the challenges of labeling instances in evolving data stream. Bottleneck of our current work is, it needs to build ADABOOST ensembles for each of the numeric features. This can face scalability issue as number of features can be very large at times in data stream. In this paper, we propose an intelligent approach to build these large number of ADABOOST ensembles with MapReduce based parallelism. We show that, this approach can help our base method to achieve significant scalability without compromising classification accuracy. We analyze different aspects of our design to depict advantages and disadvantages of the approach. We also compare and analyze performance of the proposed approach in terms of execution time, speedup and scale up.","PeriodicalId":198053,"journal":{"name":"2013 IEEE 5th International Conference on Cloud Computing Technology and Science","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2013-12-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2013 IEEE 5th International Conference on Cloud Computing Technology and Science","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/CloudCom.2013.152","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 3

Abstract

In our current work, we have proposed a multi-tiered ensemble based robust method to address all of the challenges of labeling instances in evolving data stream. Bottleneck of our current work is, it needs to build ADABOOST ensembles for each of the numeric features. This can face scalability issue as number of features can be very large at times in data stream. In this paper, we propose an intelligent approach to build these large number of ADABOOST ensembles with MapReduce based parallelism. We show that, this approach can help our base method to achieve significant scalability without compromising classification accuracy. We analyze different aspects of our design to depict advantages and disadvantages of the approach. We also compare and analyze performance of the proposed approach in terms of execution time, speedup and scale up.
基于智能MapReduce的演化数据流实例标记框架
在我们目前的工作中,我们提出了一种基于多层集成的鲁棒方法来解决在不断发展的数据流中标记实例的所有挑战。我们目前工作的瓶颈是,它需要为每个数值特征构建ADABOOST集成。这可能会面临可伸缩性问题,因为数据流中的特性数量有时会非常大。在本文中,我们提出了一种基于MapReduce并行性的智能方法来构建这些大量的ADABOOST集成。我们表明,这种方法可以帮助我们的基本方法在不影响分类精度的情况下实现显著的可扩展性。我们分析了设计的不同方面,以描述该方法的优点和缺点。我们还比较和分析了所提出的方法在执行时间、加速和扩展方面的性能。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
求助全文
约1分钟内获得全文 求助全文
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信