Order, subset construction and sequential pattern mining

IF 8.1 · Zone 1 (Computer Science) · COMPUTER SCIENCE, INFORMATION SYSTEMS
Slimane Oulad-Naoui, Hadda Cherroun, Djelloul Ziadi
Information Sciences, vol. 717, Article 122348. Published 2025-05-27.
DOI: 10.1016/j.ins.2025.122348
https://www.sciencedirect.com/science/article/pii/S0020025525004803
Citations: 0

Abstract

Sequential Pattern Mining (SPM) is a basic task in data mining. It aims to extract the most frequently occurring sequences in a dataset and has proved instrumental in many fields. In [1] we initiated an attempt to formally unify leading pattern mining approaches. This paper builds upon that work, first extending the polynomial model to SPM. Next, we devise an efficient implementation, termed WASMA, that enhances the standard subset construction method. To do so, we first partition the set of states into independent sets based on their labels, and then define three different state orderings. The first is a global, id-based order used in global exploration. The second is local and used in itemset extension. Lastly, a geometric ordering is exploited to avoid redundant computations. To handle the memory bottleneck of determinization, we propose two variants, WASMA-wsc and WASMA-ssc, which respectively rely on and dispense with the state existence check clause. Unlike existing approaches, which overlook the appearance of repetitive computation paths, the first variant introduces a novel feature: it avoids recomputing previously explored sub-branches of the problem space. In addition, we refine the well-known theoretical upper bound for the SPM setting by establishing new complexities as a function of the geometric-order topology. Evaluations demonstrate that our solution outperforms existing approaches on SPM instances with very low support thresholds, being the only one to yield a result when its competitors hit the time limit.
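For reference, the abstract builds on the standard subset construction (determinization). The Python sketch below shows only that textbook algorithm on an epsilon-free NFA; it is not WASMA. The label partitioning, the three state orderings and the geometric order are not modeled, and the function name, the type aliases and the `check_existing` flag are illustrative assumptions. The flag merely mimics the idea of a state existence check: with it, a subset reached along a second path is not expanded again, at the cost of keeping every generated subset in a `seen` set; without it, repeated subsets are re-expanded, which only terminates when the NFA is acyclic.

```python
from collections import deque
from collections.abc import Hashable

# An epsilon-free NFA: state -> label -> set of successor states.
NFA = dict[Hashable, dict[Hashable, set[Hashable]]]
# The resulting DFA: subset of NFA states -> label -> successor subset.
DFA = dict[frozenset, dict[Hashable, frozenset]]


def subset_construction(nfa: NFA, start: frozenset,
                        check_existing: bool = True) -> tuple[DFA, int]:
    """Textbook subset construction (a sketch, not WASMA).

    Returns the DFA and the number of subset expansions performed.
    `check_existing` is a hypothetical flag: when False, a subset reached
    again along another path is expanded again, which only terminates
    if the NFA is acyclic.
    """
    dfa: DFA = {}
    seen: set[frozenset] = {start}
    queue = deque([start])
    expansions = 0
    while queue:
        subset = queue.popleft()
        expansions += 1
        # Union the successors of every state in the subset, grouped by label.
        moves: dict[Hashable, set] = {}
        for state in subset:
            for label, succs in nfa.get(state, {}).items():
                moves.setdefault(label, set()).update(succs)
        transitions: dict[Hashable, frozenset] = {}
        # Visit labels in a fixed order so the exploration is deterministic.
        for label in sorted(moves, key=repr):
            target = frozenset(moves[label])
            transitions[label] = target
            # State existence check: skip subsets that were already generated.
            if not check_existing or target not in seen:
                seen.add(target)
                queue.append(target)
        dfa[subset] = transitions
    return dfa, expansions


if __name__ == "__main__":
    # Two paths (a then b, b then a) lead to the same subset {3}.
    nfa: NFA = {0: {"a": {1}, "b": {2}}, 1: {"b": {3}}, 2: {"a": {3}}}
    for check in (True, False):
        dfa, n = subset_construction(nfa, frozenset({0}), check_existing=check)
        print(f"check_existing={check}: {len(dfa)} subsets, {n} expansions")
```

On this toy NFA both runs build the same four subsets, but with the check each subset is expanded once (4 expansions), while without it the subset {3} is expanded twice (5 expansions). On larger acyclic automata the gap between the two grows with the number of distinct paths reaching the same subset.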
Source journal
Information Sciences (Engineering & Technology / Computer Science: Information Systems)
CiteScore: 14.00
Self-citation rate: 17.30%
Articles published: 1322
Review time: 10.4 months
Journal description: Informatics and Computer Science, Intelligent Systems, Applications is an esteemed international journal that focuses on publishing original and creative research findings in the field of information sciences. We also feature a limited number of timely tutorial and surveying contributions. Our journal aims to cater to a diverse audience, including researchers, developers, managers, strategic planners, graduate students, and anyone interested in staying up-to-date with cutting-edge research in information science, knowledge engineering, and intelligent systems. While readers are expected to share a common interest in information science, they come from varying backgrounds such as engineering, mathematics, statistics, physics, computer science, cell biology, molecular biology, management science, cognitive science, neurobiology, behavioral sciences, and biochemistry.