ApproxMGMSP: A Scalable Method of Mining Approximate Multidimensional Sequential Patterns on Distributed System
Changhai Zhang, Kong-fa Hu, Zhuxi Chen, Ling Chen, Yisheng Dong
Fourth International Conference on Fuzzy Systems and Knowledge Discovery (FSKD 2007), 2007-08-24
DOI: 10.1109/FSKD.2007.192 (https://doi.org/10.1109/FSKD.2007.192)
Citations: 15
Abstract
We present ApproxMGMSP (Approximate Mining of Global Multidimensional Sequential Patterns), a scalable and effective algorithm for mining multidimensional sequential patterns from large databases in a distributed environment. Unlike previous work on mining multidimensional patterns on distributed systems, our method first applies an approximate mining method to the large multidimensional sequence database. To reduce multidimensional sequential pattern mining to ordinary sequential pattern mining, the multidimensional information is embedded into the corresponding sequences. The sequences are then clustered, summarized, and analyzed at the distributed sites, where an approximate sequential pattern mining method yields the local patterns. Finally, once all local patterns have been collected at one site, the global multidimensional sequential patterns are mined quickly with a high-vote sequential pattern model. Both theoretical analysis and experiments indicate that this method simplifies the problem of mining multidimensional sequential patterns and avoids mining redundant information. By reducing communication cost, the scalable method obtains the global sequential patterns effectively.
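The abstract names two steps that are easy to picture in code: embedding the multidimensional information into each sequence, and selecting global patterns by vote over the local patterns. The Python sketch below is only an illustration under stated assumptions, not the authors' implementation. The encoding (dimension values prepended as one extra itemset), the vote rule (a pattern is kept if at least a fraction `min_vote` of sites report it), and the names `embed_dimensions`, `high_vote_patterns`, and `min_vote` are all hypothetical. The clustering and approximate local mining step is omitted and assumed to have already produced each site's local patterns.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

Itemset = FrozenSet[str]
Sequence = Tuple[Itemset, ...]


def embed_dimensions(dims: Dict[str, str], sequence: List[List[str]]) -> Sequence:
    """Fold dimension values into a sequence (assumed encoding, not from the paper).

    The dimension values become one extra leading itemset, so an ordinary
    sequential pattern miner can treat them like any other items.
    """
    dim_itemset = frozenset(f"{name}={value}" for name, value in dims.items())
    return (dim_itemset,) + tuple(frozenset(items) for items in sequence)


def high_vote_patterns(
    local_patterns: List[List[Sequence]], min_vote: float
) -> List[Sequence]:
    """Keep patterns reported by at least a `min_vote` fraction of sites.

    This vote rule is an assumption made for illustration; the abstract
    only names a "high vote sequential pattern model".
    """
    votes: Counter = Counter()
    for site_patterns in local_patterns:
        # Each site votes at most once per distinct pattern.
        votes.update(set(site_patterns))
    threshold = min_vote * len(local_patterns)
    return [pattern for pattern, count in votes.items() if count >= threshold]
```

In this sketch only the local patterns travel to the collecting site, never the raw sequences, which is one plausible reading of how the method reduces communication cost.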