{"title":"Learning Markov Blanket Bayesian Network for Big Data in MapReduce","authors":"Yuxin Che, Shaohui Hong, Defu Zhang, Liming Zhang","doi":"10.1109/ICTAI.2016.0138","DOIUrl":null,"url":null,"abstract":"A challenge task of data mining is to process massive data in the big data era. MapReduce is an attractive model to overcome this challenge. This paper presents a new method to accelerate the process of learning Markov blanket Bayesian network(MBBN). Markov blanket is a better model type of Bayesian network in some complex datasets. The time and space cost of learning Markov blanket is large, and grows fast as the variables increase. Large amounts of data are needed for its independence test which makes the problem harder. The statistical phase and independence test are parallelized to make it find an appropriate relation among variables in the MapReduce framework. Computational results are reported by testing four datasets and show that the speed-up can be obtained by means of MapReduce. In particular, the Markov blanket in MapReduce has higher accuracy rate than naïve Bayesian and tree-augmented naïve Bayesian.","PeriodicalId":245697,"journal":{"name":"2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI)","volume":"97 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2016-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2016 IEEE 28th International Conference on Tools with Artificial Intelligence (ICTAI)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICTAI.2016.0138","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2
Abstract
A challenge task of data mining is to process massive data in the big data era. MapReduce is an attractive model to overcome this challenge. This paper presents a new method to accelerate the process of learning Markov blanket Bayesian network(MBBN). Markov blanket is a better model type of Bayesian network in some complex datasets. The time and space cost of learning Markov blanket is large, and grows fast as the variables increase. Large amounts of data are needed for its independence test which makes the problem harder. The statistical phase and independence test are parallelized to make it find an appropriate relation among variables in the MapReduce framework. Computational results are reported by testing four datasets and show that the speed-up can be obtained by means of MapReduce. In particular, the Markov blanket in MapReduce has higher accuracy rate than naïve Bayesian and tree-augmented naïve Bayesian.