Toward a Distinguishing Approach for Improving the Apriori Algorithm

2019 9th International Conference on Computer and Knowledge Engineering (ICCKE), pp. 309-314. Pub Date: 2019-10-01. DOI: 10.1109/ICCKE48569.2019.8965206

Mahdieh Dehghani, A. Kamandi, M. Shabankhah, A. Moeini


Citations: 2

Abstract

Association rule mining, one of the most important branches of data mining, focuses on detecting frequent patterns among itemsets. Apriori is one of the earliest algorithms proposed for association rule mining. It is exact: it detects all frequent itemsets in a transaction database. In the worst case, however, its time complexity is O(2^n), where n is the number of distinct items in the database, and the database is scanned again at every level to count candidate itemsets. As a result, Apriori's response time becomes very large for large databases. There are two ways to reduce this response time: first, prune the candidate itemsets that must be checked; second, reduce the dimension of the database. We take the second approach, reducing the dimension of the database on the basis that if an itemset is frequent, all of its subsets are also frequent, with support at least as high. The proposed algorithm scans the database once and then detects frequent itemsets from the reduced database, which improves on Apriori's response time. The algorithm is evaluated with precision and recall; in most of the experiments, both exceed 90%.
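The abstract does not spell out how the database is reduced, so the sketch below shows only the baseline being improved on: classic level-wise Apriori, where every level costs one more full database scan and candidates are pruned by the downward-closure property (every subset of a frequent itemset has support at least as high). It also includes a precision/recall helper of the kind the evaluation describes. Treating Apriori's exact output as the ground truth is an assumption, and the function names `apriori` and `precision_recall` and the toy transactions are hypothetical, not taken from the paper.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Itemset = FrozenSet[str]


def apriori(transactions: List[Set[str]], min_support: int) -> Dict[Itemset, int]:
    """Classic level-wise Apriori (baseline, not the paper's reduced-database method).

    Returns every itemset contained in at least `min_support` transactions,
    mapped to its support count.
    """
    # Level 1: count single items with one scan of the database.
    counts: Dict[Itemset, int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = set(frequent)
        candidates: Set[Itemset] = set()
        # Join step: combine frequent (k-1)-itemsets whose union has exactly k items.
        for a in prev:
            for b in prev:
                union = a | b
                # Prune step (downward closure): a k-itemset can only be frequent
                # if all of its (k-1)-subsets are frequent.
                if len(union) == k and all(
                    frozenset(sub) in prev for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Another full database scan at every level: the cost being targeted.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def precision_recall(found: Iterable[Itemset], exact: Iterable[Itemset]) -> Tuple[float, float]:
    """Hypothetical evaluation helper: scores an approximate result against
    Apriori's exact frequent itemsets (assumed here to be the ground truth)."""
    found_set, exact_set = set(found), set(exact)
    hits = len(found_set & exact_set)
    # Convention (assumption): an empty set scores 1.0 instead of dividing by zero.
    precision = hits / len(found_set) if found_set else 1.0
    recall = hits / len(exact_set) if exact_set else 1.0
    return precision, recall


if __name__ == "__main__":
    # Toy transactions (illustrative only, not data from the paper).
    db = [
        {"bread", "milk"},
        {"bread", "diaper", "beer", "eggs"},
        {"milk", "diaper", "beer", "cola"},
        {"bread", "milk", "diaper", "beer"},
        {"bread", "milk", "diaper", "cola"},
    ]
    exact = apriori(db, min_support=3)
    print(len(exact))  # 8
    approx = [frozenset({"bread"}), frozenset({"milk"}),
              frozenset({"bread", "milk"}), frozenset({"milk", "beer"})]
    print(precision_recall(approx, exact))  # (0.75, 0.375)
```

With these five toy transactions and a minimum support of 3, the sketch finds four frequent single items and four frequent pairs; the only 3-itemset candidate, {bread, milk, diaper}, survives pruning but occurs in just two transactions. The nested join is quadratic in the number of frequent itemsets; standard implementations sort itemsets and join only pairs that share their first k-2 items.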
