{"title":"Efficient Logistic Regression with L2 Regularization using ADMM on Spark","authors":"Xiao Su","doi":"10.1145/3409073.3409077","DOIUrl":null,"url":null,"abstract":"Linear classification has demonstrated success in many areas of applications. Modern algorithms for linear classification can train reasonably good models while going through the data in only tens of rounds. However, large data often does not fit in the memory of a single machine, which makes the bottleneck in large-scale learning the disk I/O, not the CPU. In this paper, we describe a specific implementation of the Alternating Direction Method of Multipliers (ADMM) algorithm for distributed optimization. This implementation runs logistic regression with L2 regularization over large datasets and does not require a user-tuned learning rate meta-parameter or any tools beyond Spark. We implement this framework in Apache Spark and compare it with the widely used Machine Learning LIBrary (MLLIB) in Apache Spark 2.4","PeriodicalId":229746,"journal":{"name":"Proceedings of the 2020 5th International Conference on Machine Learning Technologies","volume":"77 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 5th International Conference on Machine Learning Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3409073.3409077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0
Abstract
Linear classification has demonstrated success in many areas of applications. Modern algorithms for linear classification can train reasonably good models while going through the data in only tens of rounds. However, large data often does not fit in the memory of a single machine, which makes the bottleneck in large-scale learning the disk I/O, not the CPU. In this paper, we describe a specific implementation of the Alternating Direction Method of Multipliers (ADMM) algorithm for distributed optimization. This implementation runs logistic regression with L2 regularization over large datasets and does not require a user-tuned learning rate meta-parameter or any tools beyond Spark. We implement this framework in Apache Spark and compare it with the widely used Machine Learning LIBrary (MLLIB) in Apache Spark 2.4