A Parallel Softmax Classification Algorithm Based on MapReduce
Zexi Chen, Junyan Cheng
2018 13th International Conference on Computer Science & Education (ICCSE), published 2018-08-01
DOI: 10.1109/ICCSE.2018.8468863 (https://doi.org/10.1109/ICCSE.2018.8468863)
Citations: 2
Abstract
The Softmax classification algorithm generalizes the logistic classification algorithm to multi-class problems. The traditional stand-alone (single-machine) training algorithm is very inefficient on large data sets: as the amount of data grows substantially, updating the parameters takes a long time. Although Mahout on Hadoop implements logistic regression, the naive Bayes classifier, and other classification algorithms, it does not implement Softmax classification. The core of the parallel Softmax implementation is to use MapReduce to read the training data in partitions and to use Map tasks and Reduce tasks to carry out parallel gradient descent, which iteratively updates the model parameters. Experiments indicate that the parallelized Softmax algorithm based on MapReduce shortens the iteration process and the running time, and improves training efficiency and precision.
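The map/reduce split of gradient descent described in the abstract can be illustrated with a minimal single-process Python sketch. It is not the authors' Hadoop implementation, which the abstract does not detail. It assumes the usual cross-entropy loss, whose gradient is a sum over samples and can therefore be computed per data split and then added up. All names (`map_gradient`, `reduce_gradients`, `train`), the toy data, the learning rate, and the iteration count are hypothetical choices for illustration. The "map tasks" here run sequentially in a list comprehension.

```python
import math
from functools import reduce


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, x):
    """Class probabilities for one sample; weights holds one row per class."""
    return softmax([sum(w * v for w, v in zip(row, x)) for row in weights])


def map_gradient(weights, split):
    """Map step (assumed form): un-normalized gradient over one data split."""
    grad = [[0.0] * len(row) for row in weights]
    for x, label in split:
        for k, p in enumerate(predict_proba(weights, x)):
            err = p - (1.0 if k == label else 0.0)  # p_k minus one-hot target
            for j, v in enumerate(x):
                grad[k][j] += err * v
    return grad, len(split)


def reduce_gradients(a, b):
    """Reduce step (assumed form): add two partial gradients and counts."""
    (grad_a, n_a), (grad_b, n_b) = a, b
    summed = [[u + v for u, v in zip(ra, rb)] for ra, rb in zip(grad_a, grad_b)]
    return summed, n_a + n_b


def train(splits, n_classes, n_features, learning_rate=0.5, iterations=200):
    """Full-batch gradient descent; each iteration is one map/reduce round."""
    weights = [[0.0] * n_features for _ in range(n_classes)]
    for _ in range(iterations):
        # In a cluster these calls would be independent map tasks.
        partials = [map_gradient(weights, s) for s in splits]
        grad, n = reduce(reduce_gradients, partials)
        weights = [[w - learning_rate * g / n for w, g in zip(wr, gr)]
                   for wr, gr in zip(weights, grad)]
    return weights


def predict(weights, x):
    """Index of the most probable class."""
    probs = predict_proba(weights, x)
    return probs.index(max(probs))


# Hypothetical toy data: two splits of (features, label) pairs, three classes.
splits = [
    [([1.0, 0.0, 0.0], 0), ([0.1, 0.9, 0.0], 1), ([0.0, 0.0, 1.0], 2)],
    [([0.9, 0.1, 0.0], 0), ([0.0, 1.0, 0.0], 1), ([0.0, 0.1, 0.9], 2)],
]
weights = train(splits, n_classes=3, n_features=3)
samples = [pair for split in splits for pair in split]
predictions = [predict(weights, x) for x, _ in samples]
print(predictions)
```

Because the gradient is additive over samples, the reduced result matches what a single pass over all the data would compute, up to floating-point rounding. In this sketch, partitioning changes only where the work is done, not the parameter update.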