Learning Markov Blanket Bayesian Network for Big Data in MapReduce

Yuxin Che, Shaohui Hong, Defu Zhang, Liming Zhang
Published in: 2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI), 2016-11-01
DOI: 10.1109/ICTAI.2016.0138 (https://doi.org/10.1109/ICTAI.2016.0138)
Citations: 2

Abstract

Processing massive data is a central challenge for data mining in the big data era, and MapReduce is an attractive model for meeting it. This paper presents a new method to accelerate the learning of Markov blanket Bayesian networks (MBBN). For some complex datasets, the Markov blanket is a better type of Bayesian network model. Learning a Markov blanket is costly in both time and space, and the cost grows quickly as the number of variables increases. The independence tests it relies on also require large amounts of data, which makes the problem harder. The statistical phase and the independence tests are parallelized in the MapReduce framework so that appropriate relations among the variables can be found. Computational results on four datasets show that MapReduce yields a speed-up. In particular, the Markov blanket learned in MapReduce achieves higher accuracy than naïve Bayes and tree-augmented naïve Bayes.
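
The abstract does not say which statistic or job layout the authors use, so the following is only a minimal sketch of how a counting ("statistical") phase and an independence test could be split into map and reduce steps. It assumes discrete variables and a Pearson chi-square test, and it simulates the framework in plain Python. The function names `map_counts`, `reduce_counts` and `chi_square`, and the toy data, are hypothetical and are not taken from the paper.

```python
from collections import Counter
from itertools import chain


def map_counts(partition, i, j):
    """Map step: emit one ((x_i, x_j), 1) pair per record in a data partition."""
    return [((row[i], row[j]), 1) for row in partition]


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def chi_square(joint):
    """Pearson chi-square statistic and degrees of freedom from joint counts."""
    n = sum(joint.values())
    row, col = Counter(), Counter()
    for (x, y), count in joint.items():
        row[x] += count
        col[y] += count
    stat = 0.0
    for x in row:
        for y in col:
            expected = row[x] * col[y] / n
            observed = joint.get((x, y), 0)
            stat += (observed - expected) ** 2 / expected
    dof = (len(row) - 1) * (len(col) - 1)
    return stat, dof


# Toy data in two partitions, standing in for the input splits of a real job
# (an assumption for illustration only).
partitions = [
    [(0, 0), (0, 0), (1, 1), (1, 1)],
    [(0, 0), (1, 1), (0, 1), (1, 0)],
]
emitted = chain.from_iterable(map_counts(p, 0, 1) for p in partitions)
joint = reduce_counts(emitted)
stat, dof = chi_square(joint)  # stat == 2.0, dof == 1 for this toy data
```

Only the counting touches the raw records, so it is the part that benefits from being distributed. The test itself runs on the much smaller aggregated table. A conditional test could reuse the same pattern by adding the conditioning variables to the key.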