Incremental Mining of High Utility Sequential Patterns in Incremental Databases

Proceedings of the 25th ACM International on Conference on Information and Knowledge Management Pub Date : 2016-10-24 DOI:10.1145/2983323.2983691

Jun-Zhe Wang, Jiun-Long Huang


Citations: 14

Abstract



High utility sequential pattern (HUSP) mining is an emerging topic in pattern mining, and only a few algorithms have been proposed to address it. In practice, most sequence databases usually grow over time, and it is inefficient for existing algorithms to mine HUSPs from scratch when databases grow with a small portion of updates. In view of this, we propose the IncUSP-Miner algorithm to mine HUSPs incrementally. Specifically, to avoid redundant computations, we propose a tighter upper bound of the utility of a sequence, called TSU, and then design a novel data structure, called the candidate pattern tree, to maintain the sequences whose TSU values are greater than or equal to the minimum utility threshold. Accordingly, to avoid keeping a huge amount of utility information for each sequence, a set of auxiliary utility information is designed to be stored in each tree node. Moreover, for those nodes whose utilities have to be updated, a strategy is also proposed to reduce the amount of computation, thereby improving the mining efficiency. Experimental results on three real datasets show that IncUSP-Miner is able to efficiently mine HUSPs incrementally.
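The abstract does not define TSU or the layout of the candidate pattern tree, so the following is only a minimal sketch of the general idea of pruning with a utility upper bound. It substitutes the standard, looser sequence-weighted utility (SWU) for TSU, assumes one item per sequence position, and assumes the database grows by whole new sequences. All names (`pattern_utility`, `utility`, `swu`, `min_util`) are hypothetical and not taken from the paper.

```python
from typing import List, Optional, Sequence, Tuple

# Simplifying assumption: one item per position, stored as (item, utility).
QSequence = List[Tuple[str, int]]


def pattern_utility(pattern: Sequence[str], seq: QSequence) -> Optional[int]:
    """Highest utility among all occurrences of `pattern` in `seq`, or None."""
    # best[j] = best utility found so far for matching the first j pattern items.
    best: List[Optional[int]] = [0] + [None] * len(pattern)
    for item, util in seq:
        # Walk j downward so one position is never matched twice.
        for j in range(len(pattern), 0, -1):
            prev = best[j - 1]
            if pattern[j - 1] == item and prev is not None:
                cand = prev + util
                cur = best[j]
                if cur is None or cand > cur:
                    best[j] = cand
    return best[len(pattern)]


def utility(pattern: Sequence[str], db: List[QSequence]) -> int:
    """Sum of the pattern's utility over every sequence that contains it."""
    found = (pattern_utility(pattern, s) for s in db)
    return sum(u for u in found if u is not None)


def swu(pattern: Sequence[str], db: List[QSequence]) -> int:
    """Upper bound (stand-in for TSU): total utility of containing sequences."""
    return sum(
        sum(u for _, u in s)
        for s in db
        if pattern_utility(pattern, s) is not None
    )


# Original database and a small batch of newly arrived sequences.
db = [
    [("a", 2), ("b", 3), ("a", 5), ("b", 1)],
    [("b", 4), ("c", 1)],
]
delta = [[("a", 1), ("c", 2), ("b", 2)]]

pattern = ["a", "b"]
min_util = 8  # hypothetical absolute threshold

# Both totals are sums over sequences, so only `delta` has to be scanned.
new_utility = utility(pattern, db) + utility(pattern, delta)
new_bound = swu(pattern, db) + swu(pattern, delta)

print(new_utility, new_bound)   # 9 16
print(new_bound >= min_util)    # stays a candidate
print(new_utility >= min_util)  # is a high utility pattern
```

Because both quantities are additive over sequences under these assumptions, stored per-pattern totals can be updated from the new sequences alone instead of rescanning the whole database. A tighter bound such as TSU would keep fewer candidates than SWU does.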

