SHE:基于滑动窗口的数据流挖掘的通用框架

Yuhan Wu, Zhuochen Fan, Qilong Shi, Yixin Zhang, Tong Yang, Cheng Chen, Zheng Zhong, Junnan Li, A. Shtul, Yaofeng Tu
{"title":"SHE:基于滑动窗口的数据流挖掘的通用框架","authors":"Yuhan Wu, Zhuochen Fan, Qilong Shi, Yixin Zhang, Tong Yang, Cheng Chen, Zheng Zhong, Junnan Li, A. Shtul, Yaofeng Tu","doi":"10.1145/3545008.3545009","DOIUrl":null,"url":null,"abstract":"1Data stream mining over a sliding window is a fundamental problem in many applications, such as financial data trackers, intrusion detection and QoS. To meet the demand for high throughput of high speed data streams, sliding window algorithms turn to hardware platforms including FPGA/ASIC and programmable switches. These hardware platforms have three constraints for algorithms running on, which are 1) small memory usage 2) single stage memory access and 3) limited concurrent memory access. Algorithms perfectly fit in with these constraints will enable a highest utilization of these hardware platforms. However, no existing sliding window algorithm is specifically designed for hardware platforms. In this paper, we propose the Sliding Hardware Estimator (SHE), which is a generic framework that extends existing fixed window algorithms to sliding windows on hardware platforms. The key idea of SHE is that, during insertions we approximately delete out-dated information with little time and space overhead, while during queries we design sophisticated techniques to minimize error. We have fully implemented our SHE on FPGA, achieving a throughput of 544 Mips. We apply SHE to four typical data stream mining tasks. Experimental results show that, when compared with the state-of-the-art which cannot be implemented in hardware, SHE reduces the error by up to 100 times in membership queries. All related source codes are released at Github.","PeriodicalId":360504,"journal":{"name":"Proceedings of the 51st International Conference on Parallel Processing","volume":"7 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2022-08-29","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"SHE: A Generic Framework for Data Stream Mining over Sliding Windows\",\"authors\":\"Yuhan Wu, Zhuochen Fan, Qilong Shi, Yixin Zhang, Tong Yang, Cheng Chen, Zheng Zhong, Junnan Li, A. Shtul, Yaofeng Tu\",\"doi\":\"10.1145/3545008.3545009\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"1Data stream mining over a sliding window is a fundamental problem in many applications, such as financial data trackers, intrusion detection and QoS. To meet the demand for high throughput of high speed data streams, sliding window algorithms turn to hardware platforms including FPGA/ASIC and programmable switches. These hardware platforms have three constraints for algorithms running on, which are 1) small memory usage 2) single stage memory access and 3) limited concurrent memory access. Algorithms perfectly fit in with these constraints will enable a highest utilization of these hardware platforms. However, no existing sliding window algorithm is specifically designed for hardware platforms. In this paper, we propose the Sliding Hardware Estimator (SHE), which is a generic framework that extends existing fixed window algorithms to sliding windows on hardware platforms. The key idea of SHE is that, during insertions we approximately delete out-dated information with little time and space overhead, while during queries we design sophisticated techniques to minimize error. We have fully implemented our SHE on FPGA, achieving a throughput of 544 Mips. We apply SHE to four typical data stream mining tasks. Experimental results show that, when compared with the state-of-the-art which cannot be implemented in hardware, SHE reduces the error by up to 100 times in membership queries. All related source codes are released at Github.\",\"PeriodicalId\":360504,\"journal\":{\"name\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"volume\":\"7 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2022-08-29\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 51st International Conference on Parallel Processing\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3545008.3545009\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 51st International Conference on Parallel Processing","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3545008.3545009","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 5

摘要

在滑动窗口上挖掘数据流是许多应用中的一个基本问题,例如金融数据跟踪、入侵检测和QoS。为了满足高速数据流的高吞吐量需求,滑动窗口算法转向硬件平台,包括FPGA/ASIC和可编程开关。这些硬件平台对算法的运行有三个限制:1)内存使用量小;2)单阶段内存访问;3)有限的并发内存访问。完全符合这些约束的算法将使这些硬件平台的利用率达到最高。但是,目前还没有针对硬件平台专门设计的滑动窗口算法。在本文中,我们提出了滑动硬件估计器(SHE),它是一个将现有固定窗口算法扩展到硬件平台上滑动窗口的通用框架。SHE的关键思想是,在插入期间,我们以很少的时间和空间开销大致删除过时的信息,而在查询期间,我们设计复杂的技术以尽量减少错误。我们已经在FPGA上完全实现了我们的SHE,实现了544 Mips的吞吐量。我们将SHE应用于四种典型的数据流挖掘任务。实验结果表明,与无法在硬件上实现的最先进方法相比,该方法将成员查询的误差降低了100倍。所有相关的源代码都在Github上发布。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
SHE: A Generic Framework for Data Stream Mining over Sliding Windows
1Data stream mining over a sliding window is a fundamental problem in many applications, such as financial data trackers, intrusion detection and QoS. To meet the demand for high throughput of high speed data streams, sliding window algorithms turn to hardware platforms including FPGA/ASIC and programmable switches. These hardware platforms have three constraints for algorithms running on, which are 1) small memory usage 2) single stage memory access and 3) limited concurrent memory access. Algorithms perfectly fit in with these constraints will enable a highest utilization of these hardware platforms. However, no existing sliding window algorithm is specifically designed for hardware platforms. In this paper, we propose the Sliding Hardware Estimator (SHE), which is a generic framework that extends existing fixed window algorithms to sliding windows on hardware platforms. The key idea of SHE is that, during insertions we approximately delete out-dated information with little time and space overhead, while during queries we design sophisticated techniques to minimize error. We have fully implemented our SHE on FPGA, achieving a throughput of 544 Mips. We apply SHE to four typical data stream mining tasks. Experimental results show that, when compared with the state-of-the-art which cannot be implemented in hardware, SHE reduces the error by up to 100 times in membership queries. All related source codes are released at Github.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:604180095
Book学术官方微信