Determination of the best rule-based analysis results from the comparison of the Fp-Growth, Apriori, and TPQ-Apriori Algorithms for recommendation systems

Moch. Syahrir, Lalu Zazuli Azhar Mardedi
Journal: Matrix Jurnal Manajemen Teknologi dan Informatika
Published: 2023-07-29
DOI: https://doi.org/10.31940/matrix.v13i2.52-67
Citations: 0

Abstract

Two popular association rule algorithms are Apriori and FP-Growth, both familiar to data mining researchers. They have known weaknesses, however: finding itemset frequencies requires long scans of the dataset, memory use is high, and the resulting rules are sometimes less than optimal. In this study, the authors compare the rules produced by three algorithms: FP-Growth, Apriori, and TPQ-Apriori, an algorithm developed from Apriori. For the experiments, Apriori and FP-Growth are run in the RapidMiner and Weka tools, while TPQ-Apriori is run in a self-built application. The dataset is the sales data of the Kopegtel NTB department store, which has been uploaded to Kaggle. The rules were tested on 100%, 50%, and 25% of the total volume of the Kopegtel dataset. Overall, the larger the dataset, the more optimal the results of RapidMiner's FP-Growth algorithm; on small datasets its results are not optimal. The Apriori and Weka FP-Growth algorithms behave the opposite way: their rules are less than optimal when the dataset is large and optimal when it is small. Several rules do not appear in the Weka FP-Growth and Apriori results because Weka provides no tolerance value for the support of the rules to be displayed. The TPQ-Apriori algorithm, by contrast, produces optimal rules for both large and small datasets.
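To make the terms in the abstract concrete, here is a minimal sketch of textbook Apriori in Python. It is not the authors' TPQ-Apriori (the abstract does not describe its internals), and it is not how RapidMiner or Weka implement their operators. The function names `apriori` and `rules`, the toy baskets, and the thresholds are all assumptions made for illustration. Support here means the fraction of transactions containing an itemset, and the confidence of a rule A → B is support(A ∪ B) / support(A).

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset (textbook Apriori)."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        # One full pass over the data per candidate: the "long dataset scan".
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = {item for b in baskets for item in b}
    current = {frozenset([i]): support(frozenset([i])) for i in items}
    current = {c: s for c, s in current.items() if s >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        prev = list(current)
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c: support(c) for c in candidates}
        current = {c: s for c, s in current.items() if s >= min_support}
        frequent.update(current)
        k += 1
    return frequent


def rules(frequent, min_confidence):
    """Return (antecedent, consequent, support, confidence) tuples."""
    found = []
    for itemset, sup in frequent.items():
        if len(itemset) < 2:
            continue
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                a = frozenset(antecedent)
                # Subsets of a frequent itemset are always frequent,
                # so frequent[a] is guaranteed to exist.
                confidence = sup / frequent[a]
                if confidence >= min_confidence:
                    found.append((a, itemset - a, sup, confidence))
    return found


# Hypothetical toy baskets, NOT the Kopegtel sales data.
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

frequent = apriori(transactions, min_support=0.6)
for a, b, sup, conf in rules(frequent, min_confidence=0.8):
    print(sorted(a), "->", sorted(b), "support", sup, "confidence", conf)
```

On this toy data and with these thresholds, the only rule that survives is beer → diapers (support 0.6, confidence 1.0). Rules whose support or confidence falls just under a threshold are dropped silently, which is one plausible reading of why a tool with no tolerance on support may not display some rules.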