TaSPM: Targeted Sequential Pattern Mining
Gengsen Huang, Wensheng Gan, Philip S. Yu
ACM Transactions on Knowledge Discovery from Data, published 2024-01-19 (journal article)
DOI: https://doi.org/10.1145/3639827
Impact factor: 4.0 (JCR Q1, Computer Science, Information Systems)
Citations: 0
Abstract
Sequential pattern mining (SPM) is an important pattern mining technique with many real-world applications. Although many efficient SPM algorithms have been proposed, few studies focus on targeted tasks. Targeted querying of the sequential patterns of interest not only reduces the number of patterns generated but also makes the user's subsequent analysis more efficient. Existing algorithms for targeted sequence querying are designed for specific scenarios and cannot be extended to other applications. In this paper, we formulate the problem of targeted sequential pattern mining and propose a generic algorithm, TaSPM. To improve the efficiency of TaSPM on large-scale datasets and multiple-item-based sequence datasets, we further propose pruning strategies that reduce meaningless operations during mining. In total, four pruning strategies are designed, allowing TaSPM to terminate unnecessary pattern extensions early and achieve better performance. Finally, we conduct extensive experiments on different datasets to compare the baseline SPM algorithm with TaSPM. The experiments show that TaSPM achieves shorter running times and lower memory consumption.
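To make the idea of a targeted query concrete, the following is a minimal brute-force sketch and not the TaSPM algorithm itself. It assumes, since the abstract does not spell out the definition, that a targeted pattern is a frequent sequential pattern that contains a user-supplied target sequence as a subsequence. All names (`contains`, `support`, `filter_db`, `is_targeted_pattern`) and the toy database are hypothetical. The `filter_db` step illustrates only the general intuition behind pruning: a sequence that does not contain the target cannot support any pattern that does.

```python
from typing import FrozenSet, List, Sequence

Itemset = FrozenSet[str]
Seq = Sequence[Itemset]


def contains(seq: Seq, pattern: Seq) -> bool:
    """Return True if `pattern` is a subsequence of `seq`.

    Each itemset of `pattern` must be a subset of a distinct itemset
    of `seq`, in the same order. Greedy earliest matching suffices.
    """
    pos = 0
    for itemset in seq:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(db: List[Seq], pattern: Seq) -> int:
    """Count the sequences in `db` that contain `pattern`."""
    return sum(1 for s in db if contains(s, pattern))


def filter_db(db: List[Seq], target: Seq) -> List[Seq]:
    """Drop sequences that cannot support any pattern containing `target`."""
    return [s for s in db if contains(s, target)]


def is_targeted_pattern(db: List[Seq], pattern: Seq, target: Seq, minsup: int) -> bool:
    """Assumed definition: frequent in `db` and contains `target`."""
    return contains(pattern, target) and support(db, pattern) >= minsup


# Hypothetical toy database; each sequence is a list of itemsets.
db = [
    [frozenset("a"), frozenset("b"), frozenset("c")],
    [frozenset("ab"), frozenset("c")],
    [frozenset("a"), frozenset("c")],
    [frozenset("b"), frozenset("a")],
]
target = [frozenset("a"), frozenset("c")]

reduced = filter_db(db, target)
print(len(reduced), "of", len(db), "sequences can support a targeted pattern")
print(is_targeted_pattern(db, target, target, minsup=2))
```

Under this assumed definition, the support of any pattern that contains the target is the same on the filtered database as on the full one, so the filter loses nothing while shrinking the search space. The four pruning strategies mentioned in the abstract are not reproduced here.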
About the journal:
TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.