Motaher Hossain, Youxi Wu, Philippe Fournier-Viger, Zhao Li, Lei Guo, Yan Li
2021 IEEE International Conference on Big Knowledge (ICBK), December 2021. DOI: 10.1109/ICKG52313.2021.00019
HSNP-Miner: High Utility Self-Adaptive Nonoverlapping Pattern Mining
Sequential pattern mining (SPM) under the nonoverlapping condition (nonoverlapping SPM) is a type of data mining that extracts frequent gapped subsequences, known as patterns, from sequences. It is more valuable and versatile than other related methods. In nonoverlapping SPM, two occurrences cannot use the same sequence letter to match the same position of the pattern, although a letter may be reused for a different pattern position. However, this method evaluates only the frequency of the patterns in the sequence and ignores the impact of external utility (such as item price or profit). As a result, some patterns that are infrequent but important are overlooked. To address this issue, this paper introduces High Utility Self-adaptive Nonoverlapping Pattern (HSNP) mining and proposes HSNP-Miner, which consists of two steps: support calculation and candidate pattern generation. To calculate the support, we propose the NoSup algorithm, which calculates support effectively while avoiding the creation of redundant nodes. An advanced upper bound method is employed to generate candidate patterns more efficiently. Experimental results comparing HSNP-Miner with other competitive methods demonstrate the efficiency of the proposed algorithm and the uniqueness of nonoverlapping sequence patterns.
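The following Python sketch gives a rough illustration of the nonoverlapping condition and of weighting support by external utility. It is offered under stated assumptions and is not the authors' NoSup algorithm. It also does not implement their upper bound.

The sketch makes two assumptions. First, there is no gap constraint, so an occurrence is any strictly increasing tuple of sequence positions whose letters spell the pattern. Second, `pattern_utility` uses a hypothetical utility definition: support multiplied by the sum of per-item utilities. All function names are illustrative.

```python
from typing import Dict, List, Sequence


def nonoverlapping(occ_a: Sequence[int], occ_b: Sequence[int]) -> bool:
    """Two occurrences (tuples of sequence positions, one per pattern position)
    are nonoverlapping if no pattern position j uses the same sequence position in both."""
    return all(x != y for x, y in zip(occ_a, occ_b))


def nonoverlapping_support(seq: str, pattern: str) -> int:
    """Maximum number of pairwise nonoverlapping occurrences of `pattern` in `seq`,
    ASSUMING no gap constraint: an occurrence is any tuple l_1 < ... < l_m
    with seq[l_j] == pattern[j]. Illustrative only, not the paper's NoSup."""
    if not pattern:
        return 0
    # Level 1: every position matching pattern[0] starts a partial occurrence.
    ends: List[int] = [i for i, c in enumerate(seq) if c == pattern[0]]
    for ch in pattern[1:]:
        candidates = [i for i, c in enumerate(seq) if c == ch]
        new_ends: List[int] = []
        k = 0  # each candidate position is used at most once at this level
        for e in ends:  # ends are kept in ascending order
            # Skip candidates that are not strictly after this partial occurrence.
            while k < len(candidates) and candidates[k] <= e:
                k += 1
            if k == len(candidates):
                break  # no later position left, so larger ends fail too
            new_ends.append(candidates[k])  # earliest unused matching position
            k += 1
        ends = new_ends
    return len(ends)


def pattern_utility(seq: str, pattern: str, unit_utility: Dict[str, int]) -> int:
    """HYPOTHETICAL utility: support times the summed external utility
    (e.g. price or profit) of the pattern's items."""
    return nonoverlapping_support(seq, pattern) * sum(unit_utility[c] for c in pattern)


if __name__ == "__main__":
    print(nonoverlapping_support("aaaa", "aa"))  # 3: (0,1), (1,2), (2,3)
    print(pattern_utility("abcabcab", "abc", {"a": 2, "b": 5, "c": 1}))  # 2 * 8 = 16
```

The support routine extends partial occurrences level by level. Processing them in ascending order, it gives each one the earliest unused matching position. This exchange-argument greedy keeps the end positions of the surviving partial occurrences as small as possible at every step.

The `"aaaa"`/`"aa"` case shows positions 1 and 2 each being reused for a different pattern position, which the nonoverlapping condition permits.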