Order, subset construction and sequential pattern mining
Authors: Slimane Oulad-Naoui, Hadda Cherroun, Djelloul Ziadi
Journal: Information Sciences, volume 717, Article 122348 (Journal Article)
Publication date: 2025-05-27
DOI: 10.1016/j.ins.2025.122348
URL: https://www.sciencedirect.com/science/article/pii/S0020025525004803
Impact factor: 8.1; JCR category: Computer Science, Information Systems
Citations: 0
Abstract
Sequential Pattern Mining (SPM) is a fundamental task in data mining. It aims to extract the most frequently occurring sequences in a dataset, which proves instrumental in many fields. In [1] we initiated an attempt to formally unify leading pattern mining approaches. This paper builds upon that work, first by extending the polynomial model to SPM. Next, we devise an efficient implementation, termed WASMA, that enhances the standard subset construction method. To do so, we first partition the set of states into independent sets based on their labels, and then define three different state orderings. The first is a global id-based order, which we use in global exploration. The second is local and is used in itemset extension. Lastly, a geometric ordering is exploited to avoid redundant computations. To handle the memory bottleneck of determinization, we propose two variants, WASMA-wsc and WASMA-ssc, that do or do not rely on the state existence check clause. Unlike existing approaches, which overlook the appearance of repetitive computation paths, the first variant introduces a novel feature: it avoids recomputing previously explored sub-branches of the problem space. In addition, we refine the well-known theoretical upper bound for the SPM setting by establishing new complexities as a function of the geometric-order topology. Evaluations demonstrate that our solution outperforms existing approaches on SPM instances with very low support thresholds, where it is the only one to yield a result while its competitors hit the time limit.
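For readers unfamiliar with the subset construction mentioned in the abstract, the following is a minimal sketch of the textbook procedure for determinizing an epsilon-free NFA. It is not the WASMA algorithm, and it ignores weights, state labels and the three orderings. The function name, the `check_existing` flag and the toy automaton are assumptions made for illustration only. The flag mimics the general idea of a state existence check: when enabled, a subset that was already discovered is not expanded again.

```python
from collections import deque


def subset_construction(nfa, start, alphabet, check_existing=True):
    """Textbook subset construction for an epsilon-free NFA (illustrative only).

    nfa: dict mapping (state, symbol) -> set of successor states.
    Returns (dfa, expansions): the deterministic transition table over
    frozensets of NFA states, and the number of subsets that were expanded.
    """
    start_set = frozenset([start])
    dfa = {}
    seen = {start_set}          # memory cost of the existence check
    queue = deque([start_set])
    expansions = 0
    while queue:
        current = queue.popleft()
        expansions += 1
        for symbol in alphabet:
            # Union of the successors of every NFA state in the subset.
            target = frozenset(
                t for s in current for t in nfa.get((s, symbol), ())
            )
            if not target:
                continue
            dfa[(current, symbol)] = target
            if check_existing:
                if target in seen:
                    continue    # already discovered: do not expand again
                seen.add(target)
            queue.append(target)
    return dfa, expansions


# Hypothetical acyclic NFA: two different paths lead to the subset {3}.
toy_nfa = {
    (0, "a"): {1, 2},
    (0, "b"): {2},
    (1, "c"): {3},
    (2, "c"): {3},
    (3, "d"): {4},
}

dfa_checked, with_check = subset_construction(toy_nfa, 0, "abcd", True)
dfa_unchecked, without_check = subset_construction(toy_nfa, 0, "abcd", False)
print(with_check, without_check)
```

On this toy input both runs build the same transition table, but the run without the check expands the subsets {3} and {4} twice. Skipping the check saves the memory of the `seen` set at the price of repeated work, and in this sketch it terminates only because the automaton is acyclic.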
About the journal:
Informatics and Computer Science, Intelligent Systems, Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and survey contributions.
Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.