{"title":"MReC4.5: C4.5 Ensemble Classification with MapReduce","authors":"Gongqing Wu, Hai-Guang Li, Xuegang Hu, Yuan-Jun Bi, J. Zhang, Xindong Wu","doi":"10.1109/ChinaGrid.2009.39","DOIUrl":null,"url":null,"abstract":"Classification is a significant technique in data mining research and applications. C4.5 is a widely used classification method, and ensemble learning adopts a parallel and distributed computing model for classification. Based on analyses of the MapReduce computing paradigm and the process of ensemble learning, we find that the parallel and distributed computing model in MapReduce is appropriate for implementing ensemble learning. This paper takes the advantages of C4.5, ensemble learning and the MapReduce computing model, and proposes a new method MReC4.5 for parallel and distributed ensemble classification. Our experimental results show that increasing the number of nodes would benefit the effectiveness of classification modeling, and serialization operations at the model level make the MReC4.5 classifier “construct once, use anywhere”.","PeriodicalId":212445,"journal":{"name":"2009 Fourth ChinaGrid Annual Conference","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2009-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"57","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2009 Fourth ChinaGrid Annual Conference","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ChinaGrid.2009.39","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 57
Abstract
Classification is a significant technique in data mining research and applications. C4.5 is a widely used classification method, and ensemble learning adopts a parallel and distributed computing model for classification. Based on analyses of the MapReduce computing paradigm and the process of ensemble learning, we find that the parallel and distributed computing model in MapReduce is appropriate for implementing ensemble learning. This paper takes the advantages of C4.5, ensemble learning and the MapReduce computing model, and proposes a new method MReC4.5 for parallel and distributed ensemble classification. Our experimental results show that increasing the number of nodes would benefit the effectiveness of classification modeling, and serialization operations at the model level make the MReC4.5 classifier “construct once, use anywhere”.