Determination of the best rule-based analysis results from the comparison of the Fp-Growth, Apriori, and TPQ-Apriori Algorithms for recommendation systems

Matrix Jurnal Manajemen Teknologi dan Informatika, published 2023-07-29, DOI: 10.31940/matrix.v13i2.52-67

Moch. Syahrir, Lalu Zazuli Azhar Mardedi

Citations: 0

Abstract

Apriori and FP-Growth are popular association rule algorithms and are well known among data mining researchers. However, association rule algorithms have some weaknesses: counting itemset frequencies requires long scans of the dataset, the process uses a large amount of memory, and the resulting rules are sometimes less than optimal. In this study, the authors compare the FP-Growth, Apriori, and TPQ-Apriori algorithms by analyzing the rules each one produces. TPQ-Apriori is an algorithm developed from Apriori. In the experiments, Apriori and FP-Growth are run in the RapidMiner and Weka tools, while TPQ-Apriori runs in a self-built application. The dataset is the sales data of the Kopegtel NTB department store, which has been published on Kaggle. The rules were tested on 100%, 50%, and 25% of the total Kopegtel dataset. The overall results indicate that FP-Growth in RapidMiner produces more optimal rules the larger the dataset, but is not optimal when the dataset is small. Apriori and FP-Growth in Weka show the opposite pattern: their rules are less than optimal on large datasets and optimal on small ones. Several rules do not appear in the Weka FP-Growth and Apriori output because Weka provides no tolerance value on the support of the rules to be displayed. TPQ-Apriori, by contrast, produces optimal rules for both large and small datasets.
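The first weakness named in the abstract, that Apriori-style mining needs repeated passes over the data to count itemset frequencies, can be illustrated with a minimal textbook Apriori sketch in Python. This is an assumption-laden illustration only: it is not the authors' TPQ-Apriori (the abstract does not describe how TPQ-Apriori works), nor the RapidMiner or Weka implementations. The function names `apriori`, `generate_candidates`, and `association_rules` and the toy baskets are hypothetical. The sketch counts one pass for single items plus one pass per candidate level, then derives rules filtered by minimum support and minimum confidence.

```python
from collections import Counter
from itertools import combinations


def generate_candidates(level, k):
    """Join frequent (k-1)-itemsets into k-itemset candidates, pruning any
    candidate that has an infrequent (k-1)-subset (the Apriori property)."""
    candidates = set()
    for a, b in combinations(list(level), 2):
        union = a | b
        if len(union) == k and all(
            frozenset(sub) in level for sub in combinations(union, k - 1)
        ):
            candidates.add(union)
    return candidates


def apriori(transactions, min_support):
    """Return ({frozenset itemset: support}, number of passes over the data)."""
    n = len(transactions)
    tx = [frozenset(t) for t in transactions]

    # Pass 1: count single items.
    passes = 1
    item_counts = Counter(item for t in tx for item in t)
    level = {
        frozenset([item]): count / n
        for item, count in item_counts.items()
        if count / n >= min_support
    }
    frequent = dict(level)

    k = 2
    while level:
        candidates = generate_candidates(level, k)
        if not candidates:
            break
        # Each new level needs another full pass to count candidate frequencies.
        passes += 1
        counts = Counter()
        for t in tx:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        level = {c: counts[c] / n for c in candidates if counts[c] / n >= min_support}
        frequent.update(level)
        k += 1
    return frequent, passes


def association_rules(frequent, min_confidence):
    """Split each frequent itemset into lhs -> rhs rules that meet min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so lhs is in the dict.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((set(lhs), set(itemset - lhs), support, confidence))
    return rules


if __name__ == "__main__":
    # Hypothetical toy baskets, NOT the Kopegtel NTB sales data used in the paper.
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer", "eggs"},
        {"milk", "diapers", "beer", "cola"},
        {"bread", "milk", "diapers", "beer"},
        {"bread", "milk", "diapers", "cola"},
    ]
    freq, n_passes = apriori(baskets, min_support=0.6)
    print(f"{len(freq)} frequent itemsets, {n_passes} passes over the data")
    for lhs, rhs, sup, conf in sorted(association_rules(freq, 0.7), key=lambda r: -r[3]):
        print(sorted(lhs), "->", sorted(rhs), f"support={sup:.2f} confidence={conf:.2f}")
```

For contrast, FP-Growth typically reads the data only twice (once to count items, once to build a compressed prefix tree, the FP-tree) and then mines the tree in memory, so it trades repeated scans for memory use.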

Source journal

Matrix Jurnal Manajemen Teknologi dan Informatika

Self-citation rate

0.00%

Articles published

Review time

24 weeks