P-BBA: A Master/Slave Parallel Binary-based Algorithm for Mining Frequent Itemsets in Big Data

2020 International Conference on Computational Intelligence (ICCI) Pub Date: 2020-10-08 DOI: 10.1109/ICCI51257.2020.9247689

Aliya Najiha Amir, H. Alhussian, S. Fageeri, R. Ahmad


Citations: 1

Abstract



Frequent itemset mining is an effective but computationally expensive technique, especially when dealing with big datasets. Hence, a customizable algorithm that can process big datasets in a reasonable time is needed. The Binary-based Technique Algorithm (BBT) uses a binary representation of the database transactions, together with binary operations, to simplify the identification of frequent patterns and to reduce memory consumption. However, the BBT algorithm still suffers from long execution times when dealing with big data, because it was designed to run as a single thread of execution. In this research, we propose a Parallel Binary-Based Algorithm (P-BBA) to address this problem. The objective of P-BBA is to process big datasets using collaborative threads that work concurrently and generate the list of frequent itemsets within an acceptable time frame. The algorithm is designed using a Master/Slave thread model to fit the Apache Spark distributed platform. Its performance will be evaluated based on total execution time.
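The abstract does not spell out how BBT encodes transactions or how P-BBA divides the work, so the following is only a minimal sketch of the general idea under stated assumptions: each transaction is packed into an integer with one bit per item, support is counted with a bitwise AND, and a master splits the encoded transactions among worker threads and sums their partial counts. The function names (`encode`, `count_partition`, `mine`), the round-robin partitioning, and the brute-force candidate generation are all hypothetical and are not taken from the paper.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import combinations


def encode(itemset, item_index):
    """Pack an itemset into an int, one bit per item (assumed encoding)."""
    mask = 0
    for item in itemset:
        mask |= 1 << item_index[item]
    return mask


def count_partition(partition, candidate_masks):
    """Worker: for one partition, count the transactions containing each candidate."""
    # A transaction t contains candidate c exactly when t & c == c.
    return [sum(1 for t in partition if t & c == c) for c in candidate_masks]


def mine(transactions, min_support, workers=2, max_size=2):
    """Master: encode, partition, dispatch to workers, and merge the counts."""
    items = sorted({item for t in transactions for item in t})
    item_index = {item: bit for bit, item in enumerate(items)}
    encoded = [encode(t, item_index) for t in transactions]

    # Brute-force candidates of up to max_size items (no pruning; illustration only).
    candidates = [c for k in range(1, max_size + 1) for c in combinations(items, k)]
    masks = [encode(c, item_index) for c in candidates]

    # Round-robin split of the encoded transactions, one partition per worker.
    partitions = [encoded[w::workers] for w in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda p: count_partition(p, masks), partitions))

    # Sum the per-partition counts and keep candidates meeting the threshold.
    totals = [sum(column) for column in zip(*partial)]
    return {c: n for c, n in zip(candidates, totals) if n >= min_support}


transactions = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"butter", "milk"},
    {"bread", "butter"},
]
frequent = mine(transactions, min_support=2)
```

Python threads are used here only to illustrate the division of work between a master and its workers; they are not expected to give a real speedup for this CPU-bound loop. On a distributed platform such as Apache Spark, the partitions would instead be processed by separate executors.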


Source journal

2020 International Conference on Computational Intelligence (ICCI)

Self-citation rate

0.00%

Publication count