P-BBA: A Master/Slave Parallel Binary-based Algorithm for Mining Frequent Itemsets in Big Data

Aliya Najiha Amir, H. Alhussian, S. Fageeri, R. Ahmad
Published in: 2020 International Conference on Computational Intelligence (ICCI), 2020-10-08
DOI: 10.1109/ICCI51257.2020.9247689 (https://doi.org/10.1109/ICCI51257.2020.9247689)
Citations: 1

Abstract

Frequent itemset mining is an effective but computationally expensive technique, especially when dealing with big datasets. Hence, a customizable algorithm that can process big datasets in a reasonable time is needed. The Binary-based Technique Algorithm (BBT) uses a binary representation of the database transactions, together with binary operations, to simplify the identification of frequent patterns and to reduce memory consumption. However, the BBT algorithm still suffers from long execution times when dealing with big data, because it was designed to run as a single thread of execution. In this research, we propose a Parallel Binary-Based Algorithm (P-BBA) to address this problem. The objective of P-BBA is to process big datasets using collaborative threads that work concurrently to generate the list of frequent itemsets within an acceptable time frame. The algorithm is designed using a Master/Slave thread model to fit the Apache Spark distributed platform. Its performance will be evaluated based on total execution time.
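
The abstract does not give the details of BBT or P-BBA, so the following is only a minimal sketch of the general idea it describes, not the authors' algorithm. It assumes each transaction is encoded as an integer bitmask, that an itemset is contained in a transaction when a bitwise AND leaves the itemset mask unchanged, and that a master splits the encoded transactions into partitions whose partial counts are summed afterwards. All function names (`encode`, `count_partition`, `frequent_itemsets`) and the brute-force candidate enumeration are hypothetical, and a Python thread pool stands in for Apache Spark purely to show the master/worker structure.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def encode(transactions, items):
    """Encode each transaction as an int bitmask: bit i is set if items[i] occurs."""
    index = {item: i for i, item in enumerate(items)}
    masks = []
    for transaction in transactions:
        mask = 0
        for item in transaction:
            mask |= 1 << index[item]
        masks.append(mask)
    return masks


def count_partition(partition, candidates):
    """Worker task: count, within one partition, how many masks contain each candidate."""
    return {
        cand: sum(1 for mask in partition if (mask & cand) == cand)
        for cand in candidates
    }


def frequent_itemsets(transactions, min_support, max_size=3, workers=2):
    """Master: partition the data, dispatch counting, merge the partial counts.

    Candidates are enumerated by brute force (every combination up to max_size),
    which is an illustrative simplification, not a pruning strategy from the paper.
    """
    items = sorted({item for t in transactions for item in t})
    masks = encode(transactions, items)
    partitions = [masks[w::workers] for w in range(workers)]
    result = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        for size in range(1, max_size + 1):
            candidates = [
                sum(1 << i for i in combo)
                for combo in combinations(range(len(items)), size)
            ]
            partials = pool.map(
                count_partition, partitions, [candidates] * len(partitions)
            )
            totals = dict.fromkeys(candidates, 0)
            for partial in partials:
                for cand, count in partial.items():
                    totals[cand] += count
            for cand, count in totals.items():
                if count >= min_support:
                    itemset = frozenset(
                        items[i] for i in range(len(items)) if (cand >> i) & 1
                    )
                    result[itemset] = count
    return result


if __name__ == "__main__":
    demo = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
    for itemset, support in sorted(
        frequent_itemsets(demo, min_support=3).items(),
        key=lambda kv: (len(kv[0]), sorted(kv[0])),
    ):
        print(sorted(itemset), support)
```

Because of CPython's global interpreter lock, these threads will not speed up the counting itself; the sketch only illustrates how work is divided and merged. On a platform such as Spark, the partitions would instead be distributed across executors.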