Mining skyline frequent-utility patterns from big data environment based on MapReduce framework

IF 0.8 4区计算机科学 Q4 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Intelligent Data Analysis Pub Date : 2023-08-10 DOI:10.3233/ida-220756

J. Wu, Ranran Li, Mu-En Wu, Jerry Chun‐wei Lin

{"title":"Mining skyline frequent-utility patterns from big data environment based on MapReduce framework","authors":"J. Wu, Ranran Li, Mu-En Wu, Jerry Chun‐wei Lin","doi":"10.3233/ida-220756","DOIUrl":null,"url":null,"abstract":"When the concentration focuses on data mining, frequent itemset mining (FIM) and high-utility itemset mining (HUIM) are commonly addressed and researched. Many related algorithms are proposed to reveal the general relationship between utility, frequency, and items in transaction databases. Although these algorithms can mine FIMs or HUIMs quickly, these algorithms merely take into account frequency or utility as a unilateral criterion for itemsets but ignore the concurrent itemsets, which are often more valuable for reference. A new skyline framework has been presented to mine frequent high utility patterns (SFUPs) to better support user decision-making. Several new algorithms have been proposed one after another. However, the Internet of Things (IoT), mobile Internet, and traditional Internet are generating massive amounts of data every day, and these cutting-edge standalone algorithms can not satisfy the new challenge of finding interesting patterns from this data. Big Data uses a distributed architecture in the form of cloud computing to filter and process this data to extract useful information. This paper proposes a novel parallel algorithm on Hadoop as a three-stage iterative algorithm based on MapReduce. MapReduce is used to divide the mining tasks of the whole large data set into multiple independent sub-tasks to find frequent and high utility patterns in parallel. Numerous experiments were done in this paper, and from the results, the algorithm can handle large datasets and show good performance on Hadoop clusters.","PeriodicalId":50355,"journal":{"name":"Intelligent Data Analysis","volume":" ","pages":""},"PeriodicalIF":0.8000,"publicationDate":"2023-08-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Intelligent Data Analysis","FirstCategoryId":"94","ListUrlMain":"https://doi.org/10.3233/ida-220756","RegionNum":4,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q4","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

When the concentration focuses on data mining, frequent itemset mining (FIM) and high-utility itemset mining (HUIM) are commonly addressed and researched. Many related algorithms are proposed to reveal the general relationship between utility, frequency, and items in transaction databases. Although these algorithms can mine FIMs or HUIMs quickly, these algorithms merely take into account frequency or utility as a unilateral criterion for itemsets but ignore the concurrent itemsets, which are often more valuable for reference. A new skyline framework has been presented to mine frequent high utility patterns (SFUPs) to better support user decision-making. Several new algorithms have been proposed one after another. However, the Internet of Things (IoT), mobile Internet, and traditional Internet are generating massive amounts of data every day, and these cutting-edge standalone algorithms can not satisfy the new challenge of finding interesting patterns from this data. Big Data uses a distributed architecture in the form of cloud computing to filter and process this data to extract useful information. This paper proposes a novel parallel algorithm on Hadoop as a three-stage iterative algorithm based on MapReduce. MapReduce is used to divide the mining tasks of the whole large data set into multiple independent sub-tasks to find frequent and high utility patterns in parallel. Numerous experiments were done in this paper, and from the results, the algorithm can handle large datasets and show good performance on Hadoop clusters.

查看原文本刊更多论文

基于MapReduce框架的大数据环境天际线频繁效用模式挖掘

在数据挖掘的研究中，频繁项集挖掘(FIM)和高效用项集挖掘(HUIM)得到了广泛的关注和研究。提出了许多相关的算法来揭示事务数据库中效用、频率和项目之间的一般关系。虽然这些算法可以快速挖掘FIMs或HUIMs，但这些算法仅仅将频率或效用作为项目集的单方面标准，而忽略了并发项目集，并发项目集通常更有参考价值。提出了一种新的天际线框架来挖掘频繁高效用模式，以更好地支持用户决策。人们陆续提出了几种新的算法。然而，物联网(IoT)、移动互联网和传统互联网每天都在产生大量数据，这些前沿的独立算法无法满足从这些数据中寻找有趣模式的新挑战。大数据使用云计算形式的分布式架构对这些数据进行过滤和处理，以提取有用的信息。本文提出了一种基于MapReduce的三阶段迭代并行算法。利用MapReduce将整个大数据集的挖掘任务划分为多个独立的子任务，并行发现频繁且高效用的模式。本文进行了大量的实验，从实验结果来看，该算法可以处理大型数据集，并且在Hadoop集群上表现出良好的性能。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Intelligent Data Analysis 工程技术-计算机：人工智能

CiteScore

2.20

自引率

5.90%

发文量

审稿时长

3.3 months

期刊介绍： Intelligent Data Analysis provides a forum for the examination of issues related to the research and applications of Artificial Intelligence techniques in data analysis across a variety of disciplines. These techniques include (but are not limited to): all areas of data visualization, data pre-processing (fusion, editing, transformation, filtering, sampling), data engineering, database mining techniques, tools and applications, use of domain knowledge in data analysis, big data applications, evolutionary algorithms, machine learning, neural nets, fuzzy logic, statistical pattern recognition, knowledge filtering, and post-processing. In particular, papers are preferred that discuss development of new AI related data analysis architectures, methodologies, and techniques and their applications to various domains.