Mining positive and negative association rules in Hadoop's MapReduce environment

Proceedings of the ACMSE 2018 Conference; published 2018-03-29; DOI: 10.1145/3190645.3190701

S. Bagui, Probal Chandra Dhar


Citations: 3

Abstract



In this paper, we use the Apriori algorithm to mine positive and negative association rules from Big Data in Hadoop's MapReduce environment. Positive association rule mining finds items that are positively correlated, whereas negative association rule mining finds items that are negatively correlated. Association rule mining has traditionally focused on positive rules, but negative association rules also have many applications, including building efficient decision support systems, crime data analysis [2], and health care [1].

A positive association rule has the form X → Y; it has support s in a transaction set D if s% of the transactions in D contain X ∪ Y. A negative association rule has the form X → ¬Y, ¬X → Y, or ¬X → ¬Y, where X ∩ Y = Ø. X → ¬Y means that X occurs in the absence of Y; ¬X → Y means that Y occurs in the absence of X; and ¬X → ¬Y means that neither X nor Y occurs.

For positive association rules, Support(X → Y) is the percentage of transactions in the dataset in which itemsets X and Y co-occur. Confidence(X → Y) is the conditional probability P(Y|X), that is, the percentage of transactions containing X that also contain Y. For the three forms of negative rule, the support conditions are Supp(X → ¬Y) > min_supp, Supp(¬X → Y) > min_supp, and Supp(¬X → ¬Y) > min_supp, respectively, and the confidence conditions are Conf(X → ¬Y) > min_conf, Conf(¬X → Y) > min_conf, and Conf(¬X → ¬Y) > min_conf, respectively.

In MapReduce, one job scans the dataset and creates the 1-itemsets, and a second MapReduce job uses these 1-itemsets to create the 2-itemsets. A final map-only job derives the positive and negative association rules and computes their support, confidence, and lift. In essence, we therefore use three map and two reduce jobs. The main contribution of this work is showing how the Apriori algorithm can be used to extract negative association rules from Big Data and how it can be executed efficiently on MapReduce.
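
The three-stage pipeline and the rule measures described in the abstract can be illustrated with a small in-memory sketch in plain Python. This is not Hadoop code and not the authors' implementation: the functions `map_items`, `map_pairs`, `reduce_sum`, `score_pair`, and `mine`, along with the toy transactions, are hypothetical. The sketch assumes that negative-rule supports are derived from the 1-itemset and 2-itemset counts through the usual complement identities, for example count(X and not Y) = count(X) − count(X ∪ Y) and count(not X and not Y) = |D| − count(X) − count(Y) + count(X ∪ Y). The abstract does not state which formulas the paper actually uses. Lift is taken as supp(rule) / (supp(antecedent) · supp(consequent)), and the thresholds use the abstract's strict `>`.

```python
from collections import Counter
from itertools import combinations, permutations

# Hypothetical toy data: each transaction is a set of item labels.
TOY_TRANSACTIONS = [
    {"coffee", "milk"},
    {"coffee", "milk"},
    {"coffee", "sugar"},
    {"tea", "sugar"},
    {"tea", "sugar"},
    {"tea"},
]


def map_items(txn):
    """Job 1 map: emit (item, 1) for every item in one transaction."""
    for item in txn:
        yield item, 1


def map_pairs(txn, frequent_items):
    """Job 2 map: emit (pair, 1) for each pair of frequent items in one transaction."""
    kept = sorted(i for i in txn if i in frequent_items)
    for pair in combinations(kept, 2):  # pairs come out in sorted order
        yield pair, 1


def reduce_sum(emitted):
    """Reduce used by jobs 1 and 2: sum the values per key (stands in for shuffle + reduce)."""
    counts = Counter()
    for key, value in emitted:
        counts[key] += value
    return counts


def score_pair(x, y, n, c_x, c_y, c_xy):
    """Job 3 map-only step: (support, confidence, lift) for the four rule forms on (x, y).

    Negated counts come from complement identities (an assumption of this sketch),
    e.g. count(x and not y) = count(x) - count(x and y together).
    """
    forms = {  # rule: (rule count, antecedent count, consequent count)
        f"{x} -> {y}": (c_xy, c_x, c_y),
        f"{x} -> not {y}": (c_x - c_xy, c_x, n - c_y),
        f"not {x} -> {y}": (c_y - c_xy, n - c_x, c_y),
        f"not {x} -> not {y}": (n - c_x - c_y + c_xy, n - c_x, n - c_y),
    }
    scores = {}
    for rule, (c, c_ante, c_cons) in forms.items():
        if c_ante == 0 or c_cons == 0:
            continue  # confidence and lift are undefined if a side never occurs
        scores[rule] = (c / n, c / c_ante, c * n / (c_ante * c_cons))
    return scores


def mine(transactions, min_supp, min_conf):
    n = len(transactions)
    # Job 1: count 1-itemsets and keep those with support strictly above min_supp.
    item_counts = reduce_sum(kv for t in transactions for kv in map_items(t))
    frequent = {i for i, c in item_counts.items() if c / n > min_supp}
    # Job 2: count co-occurring pairs of frequent items. Pairs are not pruned here,
    # so rarely co-occurring pairs stay available as negative-rule candidates.
    pair_counts = reduce_sum(kv for t in transactions for kv in map_pairs(t, frequent))
    # Job 3 (map-only): score every ordered pair of frequent items; a pair that
    # never co-occurs simply has a pair count of 0.
    rules = {}
    for x, y in permutations(sorted(frequent), 2):
        c_xy = pair_counts.get(tuple(sorted((x, y))), 0)
        scored = score_pair(x, y, n, item_counts[x], item_counts[y], c_xy)
        for rule, (s, conf, lift) in scored.items():
            if s > min_supp and conf > min_conf:  # strict '>' as in the abstract
                rules[rule] = (round(s, 3), round(conf, 3), round(lift, 3))
    return rules


if __name__ == "__main__":
    for rule, measures in sorted(mine(TOY_TRANSACTIONS, 0.3, 0.6).items()):
        print(rule, measures)
```

Under these assumptions, the identities suggest why the last job can be map-only. Once the item and pair counts from the first two jobs are available to each mapper, every measure for a pair is a local computation that needs no further aggregation. Pairs are deliberately not pruned by min_supp in job 2, because a pair that rarely co-occurs is exactly the case that can yield a strong X → ¬Y rule. With the toy data and min_supp = 0.3, min_conf = 0.6, coffee → not tea has support 0.5, confidence 1.0, and lift 2.0, while coffee → tea is rejected.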

