Learning Markov Blanket Bayesian Network for Big Data in MapReduce

2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI). Pub Date: 2016-11-01. DOI: 10.1109/ICTAI.2016.0138

Yuxin Che, Shaohui Hong, Defu Zhang, Liming Zhang


Citations: 2

Abstract


Processing massive data is a challenging task for data mining in the big data era, and MapReduce is an attractive model for meeting this challenge. This paper presents a new method to accelerate the learning of a Markov blanket Bayesian network (MBBN). For some complex datasets, the Markov blanket is a better type of Bayesian network model. The time and space costs of learning a Markov blanket are large and grow quickly as the number of variables increases. Its independence tests also require large amounts of data, which makes the problem harder. The statistical phase and the independence tests are parallelized in the MapReduce framework so that appropriate relations among the variables can be found. Computational results on four datasets show that MapReduce yields a speed-up. In particular, the Markov blanket learned in MapReduce achieves a higher accuracy rate than naïve Bayes and tree-augmented naïve Bayes.
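The abstract does not give the details of the parallelization, so the following is only a minimal sketch of the general idea, under stated assumptions: variables are discrete, the independence test is a Pearson chi-square test, and the map and reduce steps are simulated in a single Python process rather than run on a cluster. All function names (`map_counts`, `reduce_counts`, `chi_square`, `independence_statistic`) are hypothetical and are not taken from the paper.

```python
from collections import Counter
from functools import reduce


def map_counts(partition, x, y):
    """Map step (assumed): count (x-value, y-value) pairs in one data partition."""
    return Counter((row[x], row[y]) for row in partition)


def reduce_counts(a, b):
    """Reduce step (assumed): merge two partial contingency tables."""
    return a + b


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a two-way table."""
    n = sum(table.values())
    row_tot, col_tot = Counter(), Counter()
    for (xv, yv), count in table.items():
        row_tot[xv] += count
        col_tot[yv] += count
    stat = 0.0
    for xv in row_tot:
        for yv in col_tot:
            expected = row_tot[xv] * col_tot[yv] / n
            observed = table.get((xv, yv), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row_tot) - 1) * (len(col_tot) - 1)
    return stat, dof


def independence_statistic(partitions, x, y):
    """Run the simulated map and reduce steps, then test x against y."""
    partial_tables = (map_counts(p, x, y) for p in partitions)
    table = reduce(reduce_counts, partial_tables, Counter())
    return chi_square(table)
```

In this sketch the counting (the "statistical phase") is the part that scales with the data and is split across partitions, while the test itself runs on the small merged table. The caller would compare the returned statistic against a chi-square critical value for the returned degrees of freedom to decide whether to treat the two variables as dependent.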
