An Improved Sequential Pattern Algorithm Based on Data Mining
Jin Zhao, Runtao Lv, Yu Li
International Journal of Database Theory and Application, pp. 23-36, published 2017-01-31 (journal article)
DOI: 10.14257/IJDTA.2017.10.1.03
Citations: 0
Abstract
This paper considers several interestingness measures, such as Lift, Conviction, Piatetsky-Shapiro, Cosine, and Jaccard, that have been proposed for mining association rules and classification rules but, apart from the traditional support and confidence measures, have not been applied to mining sequential rules in sequence databases. We then propose an efficient algorithm that generates all relevant sequential rules under these interestingness measures from a prefix-tree storing the complete set of sequential patterns, where each child node stores a sequential pattern and its support value. By traversing the prefix-tree, the algorithm can easily identify the components of a rule and calculate the rule's measure values. The experimental results show that sequential rule mining with interestingness measures was always much faster with the proposed prefix-tree-based algorithm than with the existing modified Full algorithm. The advantage was greatest when mining large sequence databases at low minimum support values, where the number of generated sequential patterns is large: the proposed algorithm only traverses the prefix-tree to immediately determine which sequences form the left- and right-hand sides of a rule, together with their support values, and then computes the rule's interestingness measure values from the sequential pattern set.

In addition, the experimental results show that mining sequential rules with the confidence measure took the least time, because confidence does not require revisiting the prefix-tree to determine the support of Y (the consequent of the rule), whereas the other interestingness measures must revisit the prefix-tree to determine the support of the consequent, or of both the antecedent and the consequent.
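The following is a minimal Python sketch of the idea described above, not the authors' implementation. It assumes: each sequence element is a single item (the paper's patterns may contain itemsets); supports are relative (fractions of the database); the tree holds the complete set of frequent sequential patterns, so every prefix and suffix of a stored pattern is also stored; and a rule X ⇒ Y is formed by splitting a stored pattern XY into a prefix X and a suffix Y. The measures use the standard association-rule definitions (Piatetsky-Shapiro taken as leverage, sup(XY) - sup(X)sup(Y)) applied to sequence supports. The names `build_prefix_tree`, `lookup`, `measures`, `generate_rules`, and the `min_conf` filter are hypothetical, and the supports come from a toy five-sequence database, not from the paper.

```python
import math
from dataclasses import dataclass, field
from typing import Dict, Iterator, List, Tuple

Pattern = Tuple[str, ...]


@dataclass
class Node:
    support: float = 0.0  # relative support of the pattern ending at this node
    children: Dict[str, "Node"] = field(default_factory=dict)


def build_prefix_tree(patterns: Dict[Pattern, float]) -> Node:
    """Insert every sequential pattern; each node stores one pattern's support."""
    root = Node(support=1.0)  # the empty sequence occurs in every sequence
    for pattern, sup in patterns.items():
        node = root
        for item in pattern:
            node = node.children.setdefault(item, Node())
        node.support = sup
    return root


def lookup(root: Node, pattern: Pattern) -> float:
    """Walk down from the root again (the 'revisit') to find a pattern's support."""
    node = root
    for item in pattern:
        node = node.children[item]
    return node.support


def measures(s_x: float, s_y: float, s_xy: float) -> Dict[str, float]:
    """Standard interestingness measures for a rule X => Y from relative supports."""
    conf = s_xy / s_x
    return {
        "support": s_xy,
        "confidence": conf,
        "lift": s_xy / (s_x * s_y),
        "conviction": math.inf if conf == 1.0 else (1.0 - s_y) / (1.0 - conf),
        "piatetsky_shapiro": s_xy - s_x * s_y,
        "cosine": s_xy / math.sqrt(s_x * s_y),
        "jaccard": s_xy / (s_x + s_y - s_xy),
    }


def generate_rules(root: Node, min_conf: float) -> Iterator[Tuple[Pattern, Pattern, Dict[str, float]]]:
    """Depth-first traversal; every split of a stored pattern XY gives a candidate rule X => Y."""
    path: List[Tuple[str, float]] = []  # (item, support) for each node on the current path

    def visit(node: Node) -> Iterator[Tuple[Pattern, Pattern, Dict[str, float]]]:
        k = len(path)  # depth of `node` = length of its pattern
        if k >= 2:
            pattern = tuple(item for item, _ in path)
            s_xy = node.support
            for i in range(1, k):
                x, y = pattern[:i], pattern[i:]
                s_x = path[i - 1][1]  # antecedent X is an ancestor: support already on the path
                if s_xy / s_x < min_conf:  # confidence needs no extra tree access
                    continue
                s_y = lookup(root, y)  # consequent Y needs a second walk from the root
                yield x, y, measures(s_x, s_y, s_xy)
        for item, child in node.children.items():
            path.append((item, child.support))
            yield from visit(child)
            path.pop()

    yield from visit(root)


# Toy database (hypothetical): <a b c>, <a b>, <a c>, <b c>, <a b c>; minimum support 0.4.
patterns: Dict[Pattern, float] = {
    ("a",): 0.8, ("b",): 0.8, ("c",): 0.8,
    ("a", "b"): 0.6, ("a", "c"): 0.6, ("b", "c"): 0.6,
    ("a", "b", "c"): 0.4,
}

if __name__ == "__main__":
    tree = build_prefix_tree(patterns)
    for x, y, m in generate_rules(tree, min_conf=0.6):
        print(x, "=>", y, {name: round(v, 3) for name, v in m.items()})
```

In this sketch the confidence test uses only sup(X), read from the root-to-node path, so it runs before any second lookup; every other measure needs sup(Y), which costs one extra walk from the root. That matches the abstract's explanation of why confidence-only mining was fastest. With `min_conf=0.6`, the candidate a ⇒ bc (confidence 0.5) is discarded without ever looking up bc.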