Unified algorithms for distributed regularized linear regression model

IF 4.4 · Zone 2 (Mathematics) · Q1 (Computer Science, Interdisciplinary Applications)

Mathematics and Computers in Simulation · Pub Date: 2024-11-07 · DOI: 10.1016/j.matcom.2024.10.018

Bingzhen Chen, Wenjuan Zhai

Volume 229, Pages 867-884 · https://www.sciencedirect.com/science/article/pii/S0378475424004063

Citations: 0

Abstract


Unified algorithms for distributed regularized linear regression model

In recent years, distributed statistical models have received increasing attention for large-scale data analysis. On the one hand, data sets come from multiple sources and are stored in different locations because of limited bandwidth and storage or privacy protocols, so centralizing all the data directly is impossible. On the other hand, the data are so large that analyzing them together is difficult or inefficient. Research on distributed statistical models for large-scale data has two main aspects. The first is to study the statistical convergence rate under mild assumptions. The second is to establish fast and efficient optimization algorithms that take the properties of the loss function into account. There is a lot of research on the first aspect but relatively little on the second. Motivated by this, we consider the construction of unified algorithms for distributed linear regression with different losses and regularizers. As a result, we design two types of methods: the proximal alternating direction method of multipliers (pADMM) and the distributed accelerated proximal gradient method with line search (DAPGL). To demonstrate the efficiency of the proposed algorithms, we perform numerical experiments on the distributed Huber-Lasso and Huber-Group-Lasso models. The numerical results show that these two algorithms are more competitive than some state-of-the-art algorithms. In particular, the DAPGL algorithm performs better than pADMM in most cases.
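To make the setting in the abstract concrete, here is a minimal sketch of a Huber-Lasso fit in which the data stay on separate nodes and only per-node loss and gradient sums are aggregated. It is not the authors' DAPGL or pADMM, whose details the abstract does not give; it is a generic FISTA-style accelerated proximal gradient loop with backtracking line search. The function names, the 1/n scaling of the loss, the Huber threshold `delta`, and the synthetic data are all assumptions made for illustration.

```python
import random

def huber(r, delta):
    """Huber loss of one residual: quadratic near zero, linear in the tails."""
    a = abs(r)
    return 0.5 * r * r if a <= delta else delta * (a - 0.5 * delta)

def local_loss_and_grad(X, y, w, delta):
    """Work done on ONE node: summed Huber loss and gradient over its own rows."""
    loss, grad = 0.0, [0.0] * len(w)
    for xi, yi in zip(X, y):
        r = sum(a * b for a, b in zip(xi, w)) - yi
        loss += huber(r, delta)
        g = max(-delta, min(delta, r))  # derivative of the Huber loss
        for j, a in enumerate(xi):
            grad[j] += g * a
    return loss, grad

def global_loss_and_grad(nodes, w, delta):
    """Aggregation: only per-node sums are combined, never the raw data."""
    n = sum(len(y) for _, y in nodes)
    total, grad = 0.0, [0.0] * len(w)
    for X, y in nodes:
        l, g = local_loss_and_grad(X, y, w, delta)
        total += l
        grad = [a + b for a, b in zip(grad, g)]
    return total / n, [g / n for g in grad]

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (the Lasso regularizer)."""
    return [max(abs(a) - t, 0.0) * (1 if a > 0 else -1) for a in v]

def objective(nodes, w, lam, delta):
    """Averaged Huber loss plus the L1 penalty."""
    return global_loss_and_grad(nodes, w, delta)[0] + lam * sum(abs(a) for a in w)

def apg_huber_lasso(nodes, dim, lam, delta=1.0, iters=200, step=1.0, shrink=0.5):
    """Accelerated proximal gradient with backtracking (illustrative only)."""
    w, z, t = [0.0] * dim, [0.0] * dim, 1.0
    for _ in range(iters):
        f_z, g_z = global_loss_and_grad(nodes, z, delta)
        while True:  # shrink the step until the quadratic upper bound holds
            w_new = soft_threshold([a - step * b for a, b in zip(z, g_z)], step * lam)
            d = [a - b for a, b in zip(w_new, z)]
            f_new, _ = global_loss_and_grad(nodes, w_new, delta)
            bound = (f_z + sum(g * di for g, di in zip(g_z, d))
                     + sum(di * di for di in d) / (2 * step))
            if f_new <= bound + 1e-12:
                break
            step *= shrink
        t_new = (1 + (1 + 4 * t * t) ** 0.5) / 2
        z = [a + (t - 1) / t_new * (a - b) for a, b in zip(w_new, w)]  # momentum
        w, t = w_new, t_new
    return w

def make_nodes(num_nodes=3, per_node=40, seed=0):
    """Synthetic sparse regression data split across nodes (hypothetical)."""
    rng = random.Random(seed)
    w_true = [2.0, 0.0, -1.0, 0.0, 0.0]
    nodes = []
    for _ in range(num_nodes):
        X = [[rng.gauss(0, 1) for _ in w_true] for _ in range(per_node)]
        y = [sum(a * b for a, b in zip(xi, w_true)) + rng.gauss(0, 0.1) for xi in X]
        nodes.append((X, y))
    return nodes, w_true

if __name__ == "__main__":
    nodes, w_true = make_nodes()
    print(apg_huber_lasso(nodes, dim=len(w_true), lam=0.05))
```

In this sketch each iteration communicates only a scalar loss and a gradient vector per node, which is the usual reason gradient-type methods suit the distributed setting. Replacing `soft_threshold` with a group-wise shrinkage would give a Group-Lasso variant under the same assumptions.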


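Source journal

Mathematics and Computers in Simulation (Mathematics; Computer Science, Interdisciplinary Applications)

CiteScore: 8.90

Self-citation rate: 4.30%

Articles published: 335

Review time: 54 days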

About the journal: The aim of the journal is to provide an international forum for the dissemination of up-to-date information in the fields of mathematics and computers, in particular (but not exclusively) as they apply to the dynamics of systems, their simulation, and scientific computation in general. Published material ranges from short, concise research papers to more general tutorial articles.

Mathematics and Computers in Simulation, published monthly, is the official organ of IMACS, the International Association for Mathematics and Computers in Simulation (formerly AICA). This association, founded in 1955 and legally incorporated in 1956, is a member of FIACC (the Five International Associations Coordinating Committee), together with IFIP, IFAC, IFORS and IMEKO.

Topics covered by the journal include mathematical tools in:
• The foundations of systems modelling
• Numerical analysis and the development of algorithms for simulation

They also include considerations about computer hardware for simulation and about special software and compilers. The journal also publishes articles concerned with specific applications of modelling and simulation in science and engineering, with relevant applied mathematics, the general philosophy of systems simulation, and their impact on disciplinary and interdisciplinary research. The journal includes a Book Review section and a "News on IMACS" section that contains a calendar of future conferences and events and other information about the Association.