Efficient Logistic Regression with L2 Regularization using ADMM on Spark

Proceedings of the 2020 5th International Conference on Machine Learning Technologies. Pub Date: 2020-06-19. DOI: 10.1145/3409073.3409077

Xiao Su


Abstract

Linear classification has proven successful in many application areas. Modern algorithms for linear classification can train reasonably good models in only tens of passes over the data. However, large datasets often do not fit in the memory of a single machine, which makes disk I/O, not the CPU, the bottleneck in large-scale learning. In this paper, we describe a specific implementation of the Alternating Direction Method of Multipliers (ADMM) algorithm for distributed optimization. This implementation runs logistic regression with L2 regularization over large datasets and requires neither a user-tuned learning-rate meta-parameter nor any tools beyond Spark. We implement this framework in Apache Spark and compare it with the widely used Machine Learning Library (MLlib) in Apache Spark 2.4.
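As a rough illustration of the technique the abstract names, the sketch below runs consensus ADMM for L2-regularized logistic regression in a single NumPy process. It is not the paper's implementation: the function names (`local_update`, `admm_logreg`), the damped Newton local solver, the penalty value `rho=5.0`, and the synthetic data are all assumptions made for this example. Each data partition keeps a local weight vector w_i and a scaled dual vector u_i, and all partitions agree on a shared vector z. On Spark, the per-partition loop would presumably run as a map over partitions, and the z-update would be a sum reduced to the driver.

```python
import numpy as np

def local_objective(w, X, y, v, rho):
    # Logistic loss on one partition plus the ADMM proximal term (rho/2)||w - v||^2.
    margins = y * (X @ w)
    return np.logaddexp(0.0, -margins).sum() + 0.5 * rho * np.sum((w - v) ** 2)

def local_update(w, X, y, v, rho, newton_steps=10):
    # Approximately solve argmin_w f_i(w) + (rho/2)||w - v||^2 with damped Newton.
    # Newton steps need no learning rate; backtracking only guards against overshoot.
    for _ in range(newton_steps):
        margins = y * (X @ w)
        p = 0.5 * (1.0 - np.tanh(0.5 * margins))  # sigmoid(-margin), overflow-safe
        grad = -X.T @ (y * p) + rho * (w - v)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + rho * np.eye(w.size)
        step = np.linalg.solve(hess, grad)
        t, f0 = 1.0, local_objective(w, X, y, v, rho)
        while t > 1e-8 and local_objective(w - t * step, X, y, v, rho) > f0:
            t *= 0.5  # halve the step until the objective does not increase
        w = w - t * step
    return w

def admm_logreg(partitions, lam=1.0, rho=5.0, iters=200):
    # partitions: list of (X_i, y_i) pairs with labels in {-1, +1}.
    n, d = len(partitions), partitions[0][0].shape[1]
    z = np.zeros(d)        # consensus weights
    W = np.zeros((n, d))   # local weights w_i
    U = np.zeros((n, d))   # scaled dual variables u_i
    for _ in range(iters):
        for i, (X, y) in enumerate(partitions):  # would run in parallel on a cluster
            W[i] = local_update(W[i], X, y, z - U[i], rho)
        # Closed-form z-update for the (lam/2)||z||^2 regularizer.
        z = rho * (W + U).sum(axis=0) / (lam + n * rho)
        U += W - z         # dual update
    return z, W

# Hypothetical synthetic data, split into 4 partitions.
rng = np.random.default_rng(0)
w_true = np.array([2.0, -1.0, 0.5])
X = rng.normal(size=(200, 3))
y = np.where(rng.random(200) < 1.0 / (1.0 + np.exp(-X @ w_true)), 1.0, -1.0)
parts = [(X[i::4], y[i::4]) for i in range(4)]
z, W = admm_logreg(parts, lam=1.0, rho=5.0)
```

The only tunable quantity here besides the regularization strength `lam` is the ADMM penalty `rho`, which affects how fast the local vectors reach consensus but not the solution they converge to.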

