An expanded prefix tree-based mining algorithm for sequential pattern maintenance with deletions

Hoang Thi Hong Van, Vo Thi Ngoc Chau, N. H. Phung
DOI: 10.1109/ICITISEE.2017.8285476
Published in: 2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), 2017-11-01
Citations: 2

Abstract

Sequential pattern mining is an important mining task for discovering sequential patterns, along with the insightful relationships they reveal, in many real-world applications. In practice, sequence databases keep changing over time as the underlying business evolves, and for various reasons some sequences must be deleted from a database. Keeping the discovered sequential patterns synchronized with the database they were mined from therefore requires revisiting the sequential pattern mining task, which raises several challenges. Because the number of deleted sequences is often smaller than the size of the entire database, and sequential pattern mining is computationally expensive, re-mining the updated database from scratch may incur an unnecessarily high cost. In this paper, we aim at an efficient incremental solution to the sequential pattern mining task with sequence deletions. Unlike existing works, we propose an expanded prefix tree, which extends the existing prefix tree with additional structures that capture the information needed by the incremental mining process. Based on this tree, we propose an incremental sequential pattern mining algorithm, SPMD, that finds the complete set of sequential patterns without re-scanning the original database when sequences are deleted from it. Experimental results on benchmark databases confirm that SPMD outperforms re-mining from scratch with the PrefixSpan algorithm, requiring less running time.
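The abstract contrasts re-mining from scratch with PrefixSpan against incremental maintenance after deletions. The Python sketch below illustrates only the basic support bookkeeping behind that contrast, under stated simplifying assumptions: each sequence element is a single item (no itemsets), stored patterns live in a plain dictionary rather than a prefix tree, and the helper names (`contains`, `prefixspan`, `update_after_deletion`) and the toy data are hypothetical. It is not the SPMD algorithm or its expanded prefix tree. Deleting sequences can only lower a pattern's absolute support, but it also lowers the absolute threshold, so a pattern that was infrequent before the deletion can become frequent, and a naive update of the stored counts misses it.

```python
from math import ceil
from typing import Dict, List, Tuple

# Hypothetical sketch, NOT SPMD: single-item elements, dict instead of a prefix tree.
Pattern = Tuple[str, ...]
Database = List[List[str]]


def contains(seq: List[str], pattern: Pattern) -> bool:
    """True if pattern is a (not necessarily contiguous) subsequence of seq."""
    it = iter(seq)
    return all(item in it for item in pattern)  # `in` advances the iterator


def prefixspan(db: Database, min_count: int) -> Dict[Pattern, int]:
    """Re-mine from scratch: simplified PrefixSpan over single-item elements."""
    result: Dict[Pattern, int] = {}

    def mine(prefix: Pattern, projected: Database) -> None:
        counts: Dict[str, int] = {}
        for suffix in projected:
            for item in set(suffix):  # count each item once per sequence
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            grown = prefix + (item,)
            result[grown] = count
            # Project onto the suffix after the first occurrence of `item`.
            mine(grown, [s[s.index(item) + 1:] for s in projected if item in s])

    mine((), db)
    return result


def update_after_deletion(stored: Dict[Pattern, int], deleted: Database,
                          new_size: int, min_sup: float) -> Dict[Pattern, int]:
    """Subtract the support contributed by deleted sequences, then re-apply
    the relative threshold to the smaller database.

    Only already-stored patterns are checked, so patterns that were
    infrequent before the deletion are never discovered here.
    """
    min_count = ceil(min_sup * new_size)
    updated: Dict[Pattern, int] = {}
    for pattern, count in stored.items():
        new_count = count - sum(1 for s in deleted if contains(s, pattern))
        if new_count >= min_count:
            updated[pattern] = new_count
    return updated


# Hypothetical toy data with a relative minimum support of 50%.
MIN_SUP = 0.5
original: Database = [["a", "b"], ["a", "b"], ["a", "c"],
                      ["c", "d"], ["c", "d"], ["e"]]
deleted: Database = [original[2], original[5]]
remaining: Database = [s for i, s in enumerate(original) if i not in (2, 5)]

stored = prefixspan(original, ceil(MIN_SUP * len(original)))
incremental = update_after_deletion(stored, deleted, len(remaining), MIN_SUP)
baseline = prefixspan(remaining, ceil(MIN_SUP * len(remaining)))
```

In this toy data, deleting two of the six sequences lowers the absolute threshold from 3 to 2, so `b`, `d`, `<a, b>` and `<c, d>` become frequent, yet only the from-scratch run finds them; the naive update returns just `a` and `c` with their reduced counts. A complete result without re-scanning therefore needs more retained information than the frequent patterns and their counts, which the abstract attributes to the additional structures of its expanded prefix tree; their exact contents are not described here.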