Decision tree algorithm optimization research based on MapReduce
F. Yuan, F. Lian, Xingjian Xu, Zhaohua Ji
2015 6th IEEE International Conference on Software Engineering and Service Science (ICSESS), published 2015-11-30
DOI: 10.1109/ICSESS.2015.7339225 (https://doi.org/10.1109/ICSESS.2015.7339225)
Citations: 14
Abstract
With the advance of computer science, the volume of data that must be processed in many practical situations has increased dramatically, challenging many traditional machine learning techniques. With this in mind, this paper presents an intensive study of optimizing the decision tree algorithm and porting it to big data analysis. An optimized genetic algorithm is merged into the implementation of the decision tree algorithm, and we also devise a parallel genetic decision tree algorithm using MapReduce, which is well suited to analyzing big data in a cloud computing environment. Experimental results show that our algorithm achieves a nearly linear speedup while maintaining similar classification accuracy.
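The abstract does not say how the work is divided between mappers and reducers. The following is only a rough sketch of one common pattern for decision tree construction in a MapReduce style, not the authors' design, and it leaves out the genetic algorithm component entirely. It assumes categorical attributes and information gain as the split criterion; the function names (`map_partition`, `reduce_counts`, `information_gain`) and the toy data are hypothetical, and the "cluster" is simulated in a single process.

```python
from collections import Counter, defaultdict
from functools import reduce
from math import log2


def map_partition(rows):
    """Map step (assumed): count (attribute index, value, label) triples
    within one data partition. Each row is (attr_0, ..., attr_k, label)."""
    counts = Counter()
    for *attrs, label in rows:
        for i, value in enumerate(attrs):
            counts[(i, value, label)] += 1
    return counts


def reduce_counts(a, b):
    """Reduce step (assumed): merge partial counts from two partitions."""
    return a + b


def entropy(label_counts):
    """Shannon entropy (in bits) of a collection of class counts."""
    label_counts = list(label_counts)
    total = sum(label_counts)
    return -sum(c / total * log2(c / total) for c in label_counts if c)


def information_gain(counts, attr):
    """Information gain of splitting on attribute `attr`,
    computed only from the merged counts."""
    by_value = defaultdict(Counter)  # attribute value -> class counts
    labels = Counter()               # overall class counts
    for (i, value, label), c in counts.items():
        if i == attr:
            by_value[value][label] += c
            labels[label] += c
    total = sum(labels.values())
    remainder = sum(
        sum(vc.values()) / total * entropy(vc.values())
        for vc in by_value.values()
    )
    return entropy(labels.values()) - remainder


# Toy data split into two partitions, as if stored on two worker nodes.
partitions = [
    [("sunny", "hot", "no"), ("sunny", "mild", "no")],
    [("rain", "hot", "yes"), ("rain", "mild", "yes")],
]

# Each partition is mapped independently; the partial counts are then merged.
merged = reduce(reduce_counts, map(map_partition, partitions))

# Choose the attribute with the highest information gain for the root split.
best_attr = max(range(2), key=lambda a: information_gain(merged, a))
```

Because each partition is counted independently and only small count tables are merged, the map step can in principle scale with the number of workers, which is the general kind of behavior a near-linear speedup claim refers to.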