An expanded prefix tree-based mining algorithm for sequential pattern maintenance with deletions
Authors: Hoang Thi Hong Van, Vo Thi Ngoc Chau, N. H. Phung
Published in: 2017 2nd International conferences on Information Technology, Information Systems and Electrical Engineering (ICITISEE), 2017-11-01
DOI: 10.1109/ICITISEE.2017.8285476 (https://doi.org/10.1109/ICITISEE.2017.8285476)
Citations: 2
Abstract
Sequential pattern mining is an important task for discovering sequential patterns and the relationships among them in many real-world applications. In practice, sequence databases change over time along with the business they support, and for various reasons some sequences must be deleted. Keeping the discovered sequential patterns synchronized with the database from which they were mined then requires revisiting the mining task, which raises many challenges. Because the number of deleted sequences is often small relative to the size of the entire database, and sequential pattern mining is computationally expensive, re-mining the updated database from scratch may incur an unnecessarily high cost. This paper aims at an efficient incremental solution to sequential pattern mining with sequence deletions. Unlike existing work, we propose an expanded prefix tree, which extends the existing prefix tree with additional structures that capture the information needed by the incremental mining process. Based on this tree, we propose SPMD, an incremental sequential pattern mining algorithm that finds the complete set of sequential patterns without re-scanning the original database when a number of sequences are deleted. Experimental results on benchmark databases confirm that SPMD runs faster than re-mining from scratch with the PrefixSpan algorithm.
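The idea of maintaining patterns under deletions without re-scanning the original database can be illustrated with a toy sketch. This is not the SPMD algorithm or the expanded prefix tree, whose details the abstract does not give. It assumes sequences of single items, a relative minimum support, and a hypothetical `SupportIndex` class that stores the support of every pattern up to a fixed length in a flat dictionary standing in for the tree. On deletion, only the deleted sequences are scanned.

```python
from itertools import combinations


def subsequences(sequence, max_len):
    """Return all distinct non-empty subsequences with at most max_len items."""
    found = set()
    for k in range(1, max_len + 1):
        for idx in combinations(range(len(sequence)), k):
            found.add(tuple(sequence[i] for i in idx))
    return found


class SupportIndex:
    """Hypothetical stand-in for a pattern tree: maps pattern -> support."""

    def __init__(self, database, max_len=3):
        self.max_len = max_len
        self.size = 0       # number of sequences currently indexed
        self.support = {}   # pattern (tuple of items) -> support count
        for seq in database:
            self.size += 1
            for pattern in subsequences(seq, max_len):
                self.support[pattern] = self.support.get(pattern, 0) + 1

    def delete(self, deleted):
        """Update supports by scanning only the deleted sequences."""
        for seq in deleted:
            self.size -= 1
            for pattern in subsequences(seq, self.max_len):
                self.support[pattern] -= 1
                if self.support[pattern] == 0:
                    del self.support[pattern]

    def frequent(self, min_ratio):
        """Patterns whose support is at least min_ratio of the current size."""
        threshold = min_ratio * self.size
        return {p for p, s in self.support.items() if s >= threshold}


# Example data (made up for illustration).
database = [list("abc"), list("ac"), list("bc"), list("ab")]
deleted = [list("ab")]
remaining = [list("abc"), list("ac"), list("bc")]

index = SupportIndex(database)
before = index.frequent(0.6)
index.delete(deleted)
after = index.frequent(0.6)

# Re-mining the remaining sequences from scratch, for comparison only.
remined = SupportIndex(remaining).frequent(0.6)
```

Because the threshold is relative, it drops as the database shrinks, so a pattern that was infrequent before the deletion can become frequent afterwards. This is one reason an incremental method has to keep more than the previously frequent patterns. Storing every pattern, as this sketch does, is exponential in sequence length and only workable for toy inputs.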