Review of Apriori based Frequent Itemset Mining Solutions on Big Data

2020 6th International Conference on Web Research (ICWR). Pub Date: 2020-04-01. DOI: 10.1109/ICWR49608.2020.9122295

Mohammad Javad Shayegan Fard, Parsa Asgari Namin

Citations: 7

Abstract

The data generated today is massive in volume, velocity, and variety, and deriving knowledge from data under these conditions is a great challenge. Researchers have therefore proposed ways to deal with it. Frequent itemset mining is one of them: it identifies itemsets that occur often within vast amounts of data, as part of a process termed "Association Rule Mining", to support the operation of a variety of pursuits and businesses. A large body of work exists in this area, and the algorithms, frameworks, and applications introduced over the past decade have produced many interesting approaches. One of these algorithms is Apriori, which is simple yet powerful. However, the original Apriori is not suitable for big data, so researchers have attempted to introduce ways and schemes to adapt it to this new age of data. Given the number of efforts in this area, a bird's-eye view of past work is of value. This review aims to give insight into the work done at the intersection of two topics: big data and the Apriori algorithm. It covers Apriori-based algorithms presented in the past decade, with a focus on three popular big data platforms: Apache Hadoop, Spark, and Flink. The major points of each approach and solution are also presented, and a concluding section summarizes the points discussed in the paper.
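
For readers unfamiliar with the algorithm the abstract refers to, the following is a minimal single-machine sketch of the textbook level-wise Apriori scheme. It is not taken from the paper and is not one of the Hadoop, Spark, or Flink variants it surveys; the function name `apriori`, the `min_support` parameter (an absolute count), and the sample transactions are illustrative assumptions. It shows why the original algorithm struggles at scale: each level requires another full pass over the data.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for all itemsets appearing in >= min_support transactions.

    Illustrative sketch only: min_support is an absolute count, not a fraction.
    """
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset of a candidate must be frequent.
            if all(frozenset(sub) in current for sub in combinations(union, k - 1)):
                candidates.add(union)

        # Count step: one full pass over the transactions per level.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1

        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1

    return frequent


if __name__ == "__main__":
    # Hypothetical market-basket data, used only to exercise the sketch.
    baskets = [
        {"bread", "milk"},
        {"bread", "diapers", "beer"},
        {"milk", "diapers", "beer"},
        {"bread", "milk", "diapers"},
        {"bread", "milk", "beer"},
    ]
    for itemset, count in apriori(baskets, min_support=3).items():
        print(sorted(itemset), count)
```

The count step is the part that distributed adaptations typically parallelize, since it touches every transaction at every level; how each surveyed solution does so is a matter for the full paper, not this sketch.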
