Sequential pattern mining with the Micron automata processor

Ke Wang, Elaheh Sadredini, K. Skadron
Proceedings of the ACM International Conference on Computing Frontiers, published 2016-05-16. DOI: 10.1145/2903150.2903172 (https://doi.org/10.1145/2903150.2903172)
Citations: 43

Abstract

Sequential pattern mining (SPM) is a widely used data mining technique for discovering common sequences of events in large databases. Compared with the simpler set mining and string mining problems, SPM has a hierarchical structure (frequent subsets must be considered within each itemset, as well as the order among itemsets), and the resulting large permutation space makes SPM extremely expensive on conventional processor architectures. We propose a hardware-accelerated solution for SPM using Micron's Automata Processor (AP), a hardware implementation of non-deterministic finite automata (NFAs). The Generalized Sequential Pattern (GSP) algorithm for SPM exposes massive parallelism and is therefore well suited to AP acceleration. We implement the multi-pass pruning strategy of GSP via the AP's fast reconfigurability. We propose a generalized automaton structure that flattens sequential patterns into simple strings, which reduces compilation time and minimizes reconfiguration overhead. On six real-world datasets, the AP-accelerated GSP achieves speedups of up to 90X and 29X over optimized multicore CPU and GPU GSP implementations, respectively. The proposed CPU-AP solution also outperforms the state-of-the-art PrefixSpan and SPADE algorithms on a multicore CPU by up to 452X and 49X, respectively. The AP advantage grows further with larger datasets.
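To make the hierarchical matching concrete, here is a minimal CPU-only sketch in Python of one GSP-style pruning pass: each candidate pattern is an ordered list of itemsets, and a candidate survives only if enough database sequences contain it. This is an illustration under stated assumptions, not the authors' AP implementation. The names `contains`, `support`, `prune`, and `flatten` are hypothetical, and the `|` delimiter in `flatten` is an assumed encoding chosen only to show the idea of turning a sequence of itemsets into a flat string.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]
Pattern = Sequence[Itemset]


def contains(sequence: Pattern, pattern: Pattern) -> bool:
    """Return True if `pattern` occurs in `sequence` with itemset order preserved.

    Each pattern itemset must be a subset of some sequence itemset, and the
    matched sequence itemsets must appear in strictly increasing positions.
    """
    pos = 0
    for wanted in pattern:
        # Greedily advance to the earliest itemset that is a superset of `wanted`.
        while pos < len(sequence) and not wanted <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern itemset must match a later itemset
    return True


def support(database: Sequence[Pattern], pattern: Pattern) -> int:
    """Count how many database sequences contain `pattern`."""
    return sum(1 for seq in database if contains(seq, pattern))


def prune(database: Sequence[Pattern], candidates: Sequence[Pattern],
          min_support: int) -> List[Pattern]:
    """One pruning pass: keep only candidates that meet the support threshold."""
    return [c for c in candidates if support(database, c) >= min_support]


def flatten(pattern: Pattern, delimiter: str = "|") -> str:
    """Flatten a pattern into a string (assumed encoding, for illustration only).

    Items inside an itemset are sorted so that equal itemsets map to equal text.
    """
    return delimiter.join("".join(sorted(itemset)) for itemset in pattern)
```

In this sketch the support of every candidate is computed independently of the others, which is the kind of parallelism the abstract says GSP exposes. A full GSP run would repeat `prune` once per pattern length, generating the next set of candidates from the survivors of the previous pass.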