Mining sequential patterns using graph search techniques
Yin-Fu Huang, Shao-Yuan Lin
Proceedings 27th Annual International Computer Software and Applications Conference. COMPAC 2003, published 2003-11-03
DOI: 10.1109/CMPSAC.2003.1245314
Citation count: 51
Abstract
Sequential pattern discovery has emerged as an important problem in data mining. In this paper, we propose an effective algorithm, GST, for mining sequential patterns in a large transaction database. Unlike Apriori-like algorithms, GST can find large k-sequences (k ≥ 3) out of order; that is, it can find large k-sequences without deriving them directly from large (k-1)-sequences. As a result, our algorithm performs much better than Apriori-like algorithms. We also propose a method that, when the database grows, finds new sequential patterns by scanning only the newly added transactions. Several comprehensive experiments show that GST achieves a significant performance improvement over Apriori-like algorithms. We also found that, as long as the ratio of the items purchased in new transactions …, scanning only the new transactions is always much better than scanning the entire database.
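For contrast with GST, the level-wise strategy that the abstract attributes to Apriori-like algorithms (large k-sequences are found only through large (k-1)-sequences) can be sketched as follows. This is a minimal illustration of that baseline, not the paper's GST algorithm. It assumes that each element of a customer sequence is a single item (real sequential-pattern miners allow an itemset per transaction) and that support is an absolute count of customer sequences. The function names `contains`, `support`, `gen_candidates` and `apriori_sequences` are hypothetical.

```python
from itertools import product

# Assumption (not from the paper): each element of a customer sequence is a
# single item, so a sequence is represented as a plain tuple of items.


def contains(customer_seq, candidate):
    """True if candidate is a (not necessarily contiguous) subsequence of customer_seq."""
    it = iter(customer_seq)
    # `item in it` advances the iterator past the first match, preserving order.
    return all(item in it for item in candidate)


def support(db, candidate):
    """Number of customer sequences in db that contain the candidate."""
    return sum(contains(seq, candidate) for seq in db)


def gen_candidates(large_prev):
    """Apriori-style join of large (k-1)-sequences into k-sequence candidates."""
    prev = set(large_prev)
    candidates = set()
    for a, b in product(prev, repeat=2):
        # Join a and b when a without its first item equals b without its last item.
        if a[1:] == b[:-1]:
            cand = a + b[-1:]
            # Prune: every (k-1)-subsequence formed by dropping one item must be large.
            if all(cand[:i] + cand[i + 1:] in prev for i in range(len(cand))):
                candidates.add(cand)
    return candidates


def apriori_sequences(db, min_support):
    """Level-wise mining: level k is reached only through the large (k-1)-sequences."""
    items = {item for seq in db for item in seq}
    large = {(i,) for i in items if support(db, (i,)) >= min_support}
    result = set(large)
    while large:
        large = {c for c in gen_candidates(large) if support(db, c) >= min_support}
        result |= large
    return result


db = [("a", "b", "c"), ("a", "c"), ("a", "b", "c", "d"), ("b", "c")]
print(sorted(apriori_sequences(db, min_support=2), key=lambda s: (len(s), s)))
# → [('a',), ('b',), ('c',), ('a', 'b'), ('a', 'c'), ('b', 'c'), ('a', 'b', 'c')]
```

Each pass of the `while` loop needs the previous level's large sequences before it can generate the next level's candidates. This is the dependency on large (k-1)-sequences that GST is described as avoiding for k ≥ 3.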