Feature Selection for Big Data Based on Mapreduce and Voting Mechanism
Sufang Zhang, Jun-Hai Zhai, Shi Tian, Xiang Zhou, Yan Li
2020 International Conference on Machine Learning and Cybernetics (ICMLC), 2020-12-02. DOI: 10.1109/ICMLC51923.2020.9469541
Abstract
With the rapid development of computer network technology and wireless sensor technology, and with the arrival of the big data era, both the dimensionality and the number of samples of data sets are growing rapidly. It is therefore important to study feature selection for big data and to design feature selection algorithms suited to it. This paper proposes a feature selection method for big data based on MapReduce and a voting mechanism. The method consists of three steps. First, the big data set is partitioned into m subsets, which are deployed to m computing nodes of a Hadoop cluster. Second, on each of the m computing nodes, a genetic-algorithm-based feature selection algorithm selects important features from the node's local data subset; the nodes run in parallel and yield m feature subsets. Finally, the m feature subsets vote on each feature, and the final feature subset is selected according to the voting results. Experimental results on four big data sets demonstrate that the proposed method is effective and efficient.
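The three steps can be sketched in plain Python. This is a minimal single-machine illustration under stated assumptions, not the authors' implementation: the m "nodes" run one after another instead of as Hadoop map tasks, the GA fitness (absolute correlation of a feature with the label minus a fixed per-feature penalty) and all GA parameters are hypothetical, and the strict-majority voting threshold is assumed because the abstract does not state the voting rule. All function names are illustrative.

```python
import random
from collections import Counter


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return 0.0 if vx == 0 or vy == 0 else cov / (vx * vy) ** 0.5


def partition(rows, m):
    """Step 1: split the rows into m subsets, one per (simulated) computing node."""
    return [rows[i::m] for i in range(m)]


def ga_select(X, y, pop_size=20, generations=30, p_mut=0.1, penalty=0.2, seed=0):
    """Step 2 (map): GA feature selection on one node's local subset.

    Hypothetical fitness: sum over selected features of (|corr(feature, label)| - penalty),
    so features whose relevance falls below `penalty` lower the score.
    """
    rng = random.Random(seed)
    d = len(X[0])
    relevance = [abs(pearson(col, y)) for col in zip(*X)]

    def fitness(mask):
        return sum(r - penalty for r, bit in zip(relevance, mask) if bit)

    # Each individual is a 0/1 mask over the d features.
    pop = [[rng.randint(0, 1) for _ in range(d)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, d)  # one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if rng.random() < p_mut else g for g in child]  # bit-flip mutation
            children.append(child)
        pop = parents + children
    best = max(pop, key=fitness)
    return {j for j, bit in enumerate(best) if bit}


def vote(feature_subsets, d, threshold=None):
    """Step 3 (reduce): keep features selected by at least `threshold` nodes."""
    if threshold is None:
        threshold = len(feature_subsets) // 2 + 1  # assumed rule: strict majority
    counts = Counter(j for subset in feature_subsets for j in subset)
    return {j for j in range(d) if counts[j] >= threshold}


def select_features(X, y, m=5, seed=0):
    parts = partition(list(zip(X, y)), m)
    # On Hadoop each call would run on its own node; here they simply run in sequence.
    local = [
        ga_select([x for x, _ in part], [t for _, t in part], seed=seed + i)
        for i, part in enumerate(parts)
    ]
    return vote(local, len(X[0]))


# Synthetic demo: features 0-2 depend on the binary label, features 3-9 are pure noise.
rng = random.Random(42)
y = [rng.randint(0, 1) for _ in range(1000)]
X = [[t + rng.gauss(0, 0.5) for _ in range(3)] + [rng.gauss(0, 1) for _ in range(7)] for t in y]
selected = select_features(X, y, m=5)
print(sorted(selected))
```

Because each node sees only a fraction of the rows, a noise feature can occasionally look relevant on one node by chance; the voting step discards such a feature unless most nodes agree on it.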