TaSPM: Targeted Sequential Pattern Mining

IF 4.8 3区计算机科学 Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

ACM Transactions on Knowledge Discovery from Data Pub Date : 2024-01-19 DOI:10.1145/3639827

Gengsen Huang, Wensheng Gan, Philip S. Yu

{"title":"TaSPM: Targeted Sequential Pattern Mining","authors":"Gengsen Huang, Wensheng Gan, Philip S. Yu","doi":"10.1145/3639827","DOIUrl":null,"url":null,"abstract":"<p>Sequential pattern mining (SPM) is an important technique in the field of pattern mining, which has many applications in reality. Although many efficient SPM algorithms have been proposed, there are few studies that can focus on targeted tasks. Targeted querying of the concerned sequential patterns can not only reduce the number of patterns generated, but also increase the efficiency of users in performing related analysis. The current algorithms available for targeted sequence querying are based on specific scenarios and can not be extended to other applications. In this paper, we formulate the problem of targeted sequential pattern mining and propose a generic algorithm, namely TaSPM. What is more, to improve the efficiency of TaSPM on large-scale datasets and multiple-item-based sequence datasets, we propose several pruning strategies to reduce meaningless operations in the mining process. Totally four pruning strategies are designed in TaSPM, and hence TaSPM can terminate unnecessary pattern extensions quickly and achieve better performance. Finally, we conducted extensive experiments on different datasets to compare the baseline SPM algorithm with TaSPM. Experiments show that the novel targeted mining algorithm TaSPM can achieve faster running time and less memory consumption.</p>","PeriodicalId":49249,"journal":{"name":"ACM Transactions on Knowledge Discovery from Data","volume":"22 1","pages":""},"PeriodicalIF":4.8000,"publicationDate":"2024-01-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"ACM Transactions on Knowledge Discovery from Data","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.1145/3639827","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}

引用次数: 0

Abstract

Sequential pattern mining (SPM) is an important technique in the field of pattern mining, which has many applications in reality. Although many efficient SPM algorithms have been proposed, there are few studies that can focus on targeted tasks. Targeted querying of the concerned sequential patterns can not only reduce the number of patterns generated, but also increase the efficiency of users in performing related analysis. The current algorithms available for targeted sequence querying are based on specific scenarios and can not be extended to other applications. In this paper, we formulate the problem of targeted sequential pattern mining and propose a generic algorithm, namely TaSPM. What is more, to improve the efficiency of TaSPM on large-scale datasets and multiple-item-based sequence datasets, we propose several pruning strategies to reduce meaningless operations in the mining process. Totally four pruning strategies are designed in TaSPM, and hence TaSPM can terminate unnecessary pattern extensions quickly and achieve better performance. Finally, we conducted extensive experiments on different datasets to compare the baseline SPM algorithm with TaSPM. Experiments show that the novel targeted mining algorithm TaSPM can achieve faster running time and less memory consumption.

查看原文本刊更多论文

TaSPM：目标序列模式挖掘

序列模式挖掘（SPM）是模式挖掘领域的一项重要技术，在现实中有很多应用。虽然已经提出了很多高效的 SPM 算法，但能专注于目标任务的研究却很少。对相关的序列模式进行有针对性的查询，不仅可以减少生成模式的数量，还能提高用户进行相关分析的效率。目前现有的有针对性的序列查询算法都是基于特定场景的，无法扩展到其他应用中。本文提出了有针对性的序列模式挖掘问题，并提出了一种通用算法，即 TaSPM。此外，为了提高 TaSPM 在大规模数据集和基于多项目的序列数据集上的效率，我们提出了几种剪枝策略，以减少挖掘过程中的无意义操作。TaSPM 总共设计了四种剪枝策略，因此 TaSPM 可以快速终止不必要的模式扩展，并获得更好的性能。最后，我们在不同的数据集上进行了大量实验，比较了基线 SPM 算法和 TaSPM 算法。实验结果表明，新颖的定向挖掘算法 TaSPM 可以实现更快的运行时间和更少的内存消耗。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

ACM Transactions on Knowledge Discovery from Data COMPUTER SCIENCE, INFORMATION SYSTEMS-COMPUTER SCIENCE, SOFTWARE ENGINEERING

CiteScore

6.70

自引率

5.60%

发文量

172

审稿时长

3 months

期刊介绍： TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.