An Efficient Parallel Algorithm for Mining Both Frequent Closed and Generator Sequences on Multi-core Processors

Hai V. Duong, Tin C. Truong, Bac Le
Published in: 2018 5th NAFOSTED Conference on Information and Computer Science (NICS), 2018-11-01
DOI: 10.1109/NICS.2018.8606896 (https://doi.org/10.1109/NICS.2018.8606896)
Citations: 0

Abstract

Frequent sequence mining is computationally challenging because it generates many intermediate subsequences. Mining frequent closed and generator sequences instead is more efficient and yields concise representations, while preserving all the information needed to recover the full set of traditional patterns. In addition, frequent closed sequences can be combined with generators to produce non-redundant sequential rules and to quickly recover all sequential patterns together with their frequencies. However, most existing algorithms discover either closed sequences or generators, not both at once, and on large databases containing many long sequences they still take too long to finish or run out of memory. This paper therefore exploits multi-core processor architectures and proposes a novel parallel algorithm, Par-GenCloSM, that mines frequent closed and generator sequences simultaneously in a single process. Par-GenCloSM relies on efficient techniques for quickly eliminating unpromising candidate branches and on two novel strategies, EPUCloGen and GPPCloGen, which reduce the global synchronization cost of the parallel model and speed up mining. Par-GenCloSM is the first parallel algorithm for mining frequent closed sequences and generators concurrently. Experimental results on many real-life and synthetic databases show that Par-GenCloSM outperforms state-of-the-art algorithms in runtime and memory consumption, especially on long-sequence databases with low minimum support thresholds.
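
To make the two pattern classes concrete, the following is a minimal brute-force sketch in Python. It is not Par-GenCloSM and uses none of its pruning or parallel strategies. It assumes the standard definitions: a frequent sequence is closed if no proper super-sequence has the same support, and it is a generator if no proper sub-sequence has the same support. As simplifying assumptions, each database sequence is a string of single items and the empty sequence is not counted as a sub-sequence. All function names and the toy database are hypothetical.

```python
from itertools import combinations


def is_subsequence(pattern, sequence):
    """True if the items of pattern occur in sequence in order (gaps allowed)."""
    it = iter(sequence)
    return all(item in it for item in pattern)


def support(pattern, database):
    """Number of database sequences that contain pattern."""
    return sum(is_subsequence(pattern, seq) for seq in database)


def frequent_patterns(database, minsup):
    """Brute force: enumerate every non-empty subsequence, keep the frequent ones."""
    candidates = set()
    for seq in database:
        for k in range(1, len(seq) + 1):
            for idx in combinations(range(len(seq)), k):
                candidates.add(tuple(seq[i] for i in idx))
    result = {}
    for pattern in candidates:
        count = support(pattern, database)
        if count >= minsup:
            result[pattern] = count
    return result


def closed_and_generators(freq):
    """Split frequent patterns into closed sequences and generators."""
    closed, generators = set(), set()
    for p, sup_p in freq.items():
        # Closed: no longer frequent pattern containing p has the same support.
        has_equal_super = any(
            len(q) > len(p) and sup_q == sup_p and is_subsequence(p, q)
            for q, sup_q in freq.items()
        )
        # Generator: no shorter (non-empty) pattern contained in p has the same support.
        has_equal_sub = any(
            len(q) < len(p) and sup_q == sup_p and is_subsequence(q, p)
            for q, sup_q in freq.items()
        )
        if not has_equal_super:
            closed.add(p)
        if not has_equal_sub:
            generators.add(p)
    return closed, generators


# Toy database (hypothetical): four sequences over the items a, b, c.
toy_db = ["abc", "abc", "ab", "ac"]
toy_freq = frequent_patterns(toy_db, minsup=2)
toy_closed, toy_generators = closed_and_generators(toy_freq)
print(sorted(toy_closed))
print(sorted(toy_generators))
```

On this toy input with a minimum support of 2, seven patterns are frequent, four of them are closed and four are generators, and the single pattern `a` belongs to both classes. The exhaustive enumeration is exponential in sequence length, which is the cost that pruning and parallel mining are meant to avoid.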