An Improved Sequential Pattern Algorithm Based on Data Mining

International Journal of Database Theory and Application, pp. 23-36 · Pub Date: 2017-01-31 · DOI: 10.14257/IJDTA.2017.10.1.03

Jin Zhao, Runtao Lv, Yu Li


Abstract

This paper considers several interestingness measures, such as Lift, Conviction, Piatetsky-Shapiro, Cosine, and Jaccard, that have been proposed for mining association rules and classification rules but have not yet been applied to mining sequential rules in sequence databases, where only the traditional measures of support and confidence are used. We then propose an efficient algorithm that generates all relevant sequential rules with these interestingness measures from a prefix-tree that stores the whole set of sequential patterns; each child node stores one sequential pattern and its corresponding support value. By traversing the prefix-tree, the algorithm can easily identify the components of a rule and calculate its measure values. The experimental results show that sequential rule mining with interestingness measures using the proposed prefix-tree-based algorithm was always much faster than with the existing modified Full algorithm. The advantage was greatest when mining large sequence databases with low minimum support values, where the number of sequential patterns generated is large: the proposed algorithm only traverses the prefix-tree to determine immediately which sequences form the left- and right-hand sides of a rule, together with their support values, and computes the rule's interestingness measure values from the sequential pattern set. The experimental results also show that mining sequential rules with the confidence measure took the least time, because it does not need to revisit the prefix-tree to determine the support of Y (the consequent of the rule), whereas the other interestingness measures must revisit the prefix-tree to determine the support of the consequent, or of both the antecedent and the consequent.
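The prefix-tree traversal described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Node`, `build_tree`, `find_support`, `measures`, and `generate_rules` are hypothetical, patterns are simplified to sequences of single items, and the measure formulas are the standard association-rule definitions, assumed here to carry over to a sequential rule X → Y with supports taken relative to the database size `n`. The antecedent X is a node, the full pattern XY is one of its descendants, and the support of the consequent Y is the only value that needs a second lookup from the root.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    pattern: tuple            # sequential pattern stored at this node
    support: int              # absolute support of that pattern
    children: dict = field(default_factory=dict)


def build_tree(patterns):
    """Build a prefix-tree from {pattern: support}; every prefix must be present."""
    root = Node((), 0)
    for pattern in sorted(patterns, key=len):   # shorter patterns (parents) first
        node = root
        for item in pattern[:-1]:
            node = node.children[item]
        node.children[pattern[-1]] = Node(pattern, patterns[pattern])
    return root


def find_support(root, pattern):
    """Revisit the tree from the root to look up the support of a pattern."""
    node = root
    for item in pattern:
        node = node.children.get(item)
        if node is None:
            return None
    return node.support


def measures(sup_x, sup_y, sup_xy, n):
    """Standard interestingness measures for a rule X -> Y (assumed definitions)."""
    p_x, p_y, p_xy = sup_x / n, sup_y / n, sup_xy / n
    conf = sup_xy / sup_x
    return {
        "confidence": conf,
        "lift": p_xy / (p_x * p_y),
        "conviction": math.inf if conf == 1 else (1 - p_y) / (1 - conf),
        "piatetsky_shapiro": p_xy - p_x * p_y,
        "cosine": p_xy / math.sqrt(p_x * p_y),
        "jaccard": p_xy / (p_x + p_y - p_xy),
    }


def generate_rules(root, n, min_conf=0.0):
    """Pair every node X with each descendant XY and emit the rule X -> Y."""
    rules = []

    def descend(x_node, node):
        for child in node.children.values():
            y = child.pattern[len(x_node.pattern):]   # consequent = suffix after X
            sup_y = find_support(root, y)             # the extra lookup for sup(Y)
            m = measures(x_node.support, sup_y, child.support, n)
            if m["confidence"] >= min_conf:
                rules.append((x_node.pattern, y, m))
            descend(x_node, child)

    def visit(node):
        for child in node.children.values():
            descend(child, child)
            visit(child)

    visit(root)
    return rules


# Toy pattern set over a hypothetical database of n = 5 sequences.
patterns = {("a",): 4, ("b",): 3, ("a", "b"): 3}
rules = generate_rules(build_tree(patterns), n=5)
for x, y, m in rules:
    print(x, "->", y, {name: round(value, 3) for name, value in m.items()})
```

In this sketch, confidence uses only the supports of X and XY, both of which are already at hand during the traversal, while every other measure depends on `find_support(root, y)`. That matches the abstract's observation that confidence is the cheapest measure to compute.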
