Efficient Logistic Regression with L2 Regularization using ADMM on Spark

Xiao Su
{"title":"Efficient Logistic Regression with L2 Regularization using ADMM on Spark","authors":"Xiao Su","doi":"10.1145/3409073.3409077","DOIUrl":null,"url":null,"abstract":"Linear classification has demonstrated success in many areas of applications. Modern algorithms for linear classification can train reasonably good models while going through the data in only tens of rounds. However, large data often does not fit in the memory of a single machine, which makes the bottleneck in large-scale learning the disk I/O, not the CPU. In this paper, we describe a specific implementation of the Alternating Direction Method of Multipliers (ADMM) algorithm for distributed optimization. This implementation runs logistic regression with L2 regularization over large datasets and does not require a user-tuned learning rate meta-parameter or any tools beyond Spark. We implement this framework in Apache Spark and compare it with the widely used Machine Learning LIBrary (MLLIB) in Apache Spark 2.4","PeriodicalId":229746,"journal":{"name":"Proceedings of the 2020 5th International Conference on Machine Learning Technologies","volume":"77 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2020-06-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 2020 5th International Conference on Machine Learning Technologies","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3409073.3409077","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Linear classification has demonstrated success in many application areas. Modern algorithms for linear classification can train reasonably good models in only tens of passes over the data. However, large datasets often do not fit in the memory of a single machine, which makes disk I/O, rather than the CPU, the bottleneck in large-scale learning. In this paper, we describe a specific implementation of the Alternating Direction Method of Multipliers (ADMM) algorithm for distributed optimization. This implementation runs logistic regression with L2 regularization over large datasets and requires neither a user-tuned learning-rate meta-parameter nor any tools beyond Spark. We implement this framework in Apache Spark and compare it with the widely used Machine Learning Library (MLlib) in Apache Spark 2.4.
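
The abstract does not spell out the update rules, so the following is a minimal sketch of how global-consensus ADMM for L2-regularized logistic regression can be laid out on Spark: the data is split into blocks; each round solves a proximal logistic-regression subproblem per block (x-update), averages the block solutions and shrinks them toward zero according to the L2 penalty (z-update), and then updates the scaled dual variables (u-update). None of these steps needs a learning rate, consistent with the abstract's claim. The PySpark/NumPy/SciPy code below is my own illustration under those assumptions; the function names (solve_local, admm_logreg), the L-BFGS-B subproblem solver, and the parameters lam, rho, and n_iter are not taken from the paper.

```python
# A minimal, illustrative sketch of global-consensus ADMM for logistic
# regression with L2 regularization on Spark RDDs. Function names, the
# per-block solver (scipy's L-BFGS-B), and the parameter defaults are
# assumptions for this sketch, not details from the paper.
import numpy as np
from scipy.optimize import minimize

def solve_local(block, z, u_i, rho):
    """x-update: minimize local logistic loss + (rho/2)||x - z + u_i||^2."""
    A, b = block                                   # labels b in {-1, +1}
    def obj(x):
        margins = b * (A @ x)
        loss = np.logaddexp(0.0, -margins).sum()   # sum log(1 + exp(-margins))
        sig = np.exp(-np.logaddexp(0.0, margins))  # stable sigmoid(-margins)
        grad = -(A.T @ (b * sig)) + rho * (x - z + u_i)
        prox = 0.5 * rho * np.sum((x - z + u_i) ** 2)
        return loss + prox, grad
    return minimize(obj, z.copy(), jac=True, method="L-BFGS-B").x

def admm_logreg(blocks, d, lam=1.0, rho=1.0, n_iter=50):
    """blocks: pair RDD of (i, (A_i, b_i)) with integer keys 0..N-1."""
    sc = blocks.context
    N = blocks.count()
    blocks = blocks.cache()
    z = np.zeros(d)
    u = {i: np.zeros(d) for i in range(N)}         # scaled dual variables
    for _ in range(n_iter):
        st = sc.broadcast((z, u))
        # x-update: one proximal subproblem per block, solved in parallel
        # on the executors; only the d-dimensional solutions are collected.
        x = dict(blocks.map(lambda kv: (
            kv[0],
            solve_local(kv[1], st.value[0], st.value[1][kv[0]], rho))
        ).collect())
        # z-update: average of (x_i + u_i), shrunk toward zero by the L2 term
        avg = sum(x[i] + u[i] for i in range(N)) / N
        z = (N * rho / (lam + N * rho)) * avg
        # u-update: scaled dual ascent on the consensus constraint x_i = z
        u = {i: u[i] + x[i] - z for i in range(N)}
        st.unpersist()
    return z
```

To try this sketch, one would key the data blocks by index, e.g. blocks = sc.parallelize(list(enumerate(numpy_blocks))), where each entry of numpy_blocks is a pair (A_i, b_i), and then call admm_logreg(blocks, d). Collecting the N block solutions to the driver each round is a simplification chosen here for clarity; it keeps the per-round communication at N d-dimensional vectors.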