Comparative analysis of genetic based approach and Apriori algorithm for mining maximal frequent item sets

2015 IEEE Congress on Evolutionary Computation (CEC) Pub Date : 2015-05-25 DOI:10.1109/CEC.2015.7256872

Mir Md. Jahangir Kabir, Shuxiang Xu, B. Kang, Zongyuan Zhao


Citations: 13

Abstract

In data mining, discovering frequent item sets is a key step in mining association rules. For large datasets, a low support value generates a huge number of frequent patterns, which is a major challenge in frequent pattern mining. Maximal frequent pattern mining helps to address this problem, since a maximal frequent pattern carries information about a large number of smaller frequent sub-patterns. In this study we develop a genetic-based approach that finds maximal frequent patterns using a user-defined threshold value as a constraint. A genetic algorithm is one of the best choices for optimizing such search problems: it mimics the natural selection procedure and uses a global search mechanism, which suits searching for solutions when the search space is large. Evolutionary algorithms are also effective for undetermined solutions. This approach therefore uses a genetic algorithm to find maximal frequent item sets in different kinds of data sets. A low support value generates some large patterns that contain information about a huge number of small frequent sub-patterns, which could be useful for mining association rules. We applied the approach to several real and synthetic data sets. The experimental results show that the proposed approach evaluates fewer nodes than the number of candidate item sets considered by the Apriori algorithm, especially when the support value is set low.
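To make the terms concrete, the following is a minimal Python sketch of what "frequent" and "maximal frequent" item sets mean, using a level-wise (Apriori-style) enumeration on a toy dataset. It is an illustration only, not the genetic approach from the paper: the function names, the four transactions, and the 0.5 support threshold are all assumptions chosen for the example.

```python
from itertools import combinations

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)

def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) enumeration of all frequent item sets."""
    items = sorted(set().union(*transactions))
    level = [frozenset([i]) for i in items
             if support(frozenset([i]), transactions) >= min_support]
    frequent = list(level)
    k = 2
    while level:
        # Join step: unions of two frequent (k-1)-sets that have exactly k items.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        prev = set(level)
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = [c for c in candidates if support(c, transactions) >= min_support]
        frequent.extend(level)
        k += 1
    return frequent

def maximal_itemsets(frequent):
    """Keep only frequent item sets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical toy data: four transactions over the items a, b, c, d.
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("ab"),
    frozenset("cd"),
]
frequent = frequent_itemsets(transactions, min_support=0.5)
maximal = maximal_itemsets(frequent)
```

On this toy data there are seven frequent item sets (`a`, `b`, `c`, `ab`, `ac`, `bc`, `abc`) but only one maximal frequent item set, `abc`. Every other frequent set is a subset of it, which is the sense in which a maximal pattern summarizes many smaller frequent sub-patterns.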



