Unified algorithms for distributed regularized linear regression model
Authors: Bingzhen Chen, Wenjuan Zhai
Journal: Mathematics and Computers in Simulation, Volume 229, Pages 867-884
Publication date: 2024-11-07
Publication type: Journal Article
DOI: 10.1016/j.matcom.2024.10.018
URL: https://www.sciencedirect.com/science/article/pii/S0378475424004063
Impact factor: 4.4; JCR: Q1 (Computer Science, Interdisciplinary Applications)
Citations: 0
Abstract
In recent years, distributed statistical models have received increasing attention for large-scale data analysis. On the one hand, data sets come from multiple sources and are stored in different locations; because of limited bandwidth and storage, or privacy protocols, centralizing all the data is impossible. On the other hand, the data are so large that analyzing them together is difficult or inefficient. Research on distributed statistical models for large-scale data has two main aspects. The first is to study the statistical convergence rate under mild assumptions. The second is to establish fast and efficient optimization algorithms that exploit the properties of the loss function. There is a great deal of research on the first aspect but relatively little on the second. Motivated by this, we consider the construction of unified algorithms for distributed linear regression with different losses and regularizers. As a result, we design two types of methods: the proximal alternating direction method of multipliers (pADMM) and the distributed accelerated proximal gradient method with line search (DAPGL). To demonstrate the efficiency of the proposed algorithms, we perform numerical experiments on the distributed Huber-Lasso and Huber-Group-Lasso models. The numerical results show that these two algorithms are more competitive than some state-of-the-art algorithms. In particular, the DAPGL algorithm performs better than pADMM in most cases.
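To make the second aspect concrete, the following is a minimal sketch of a generic accelerated proximal gradient loop with backtracking line search, applied to a Huber-Lasso objective whose data are split across several workers. It is not the authors' DAPGL algorithm, whose details the abstract does not give. The function names, the FISTA-style momentum rule, the averaging of the loss over all samples, and the default Huber threshold `delta=1.345` are all assumptions made for illustration. Each worker is simulated by a plain function call that returns only its local loss and gradient, so raw data never leave a shard.

```python
import numpy as np

def huber(r, delta):
    """Summed Huber loss and its elementwise derivative for residuals r."""
    a = np.abs(r)
    quad = a <= delta
    loss = np.where(quad, 0.5 * r**2, delta * (a - 0.5 * delta)).sum()
    return loss, np.where(quad, r, delta * np.sign(r))

def local_loss_grad(X, y, beta, delta):
    """Hypothetical worker step: loss and gradient on one local shard."""
    loss, psi = huber(X @ beta - y, delta)
    return loss, X.T @ psi

def global_loss_grad(shards, beta, delta):
    """Coordinator step: sum the local results, average over all samples."""
    n = sum(len(y) for _, y in shards)
    results = [local_loss_grad(X, y, beta, delta) for X, y in shards]
    return sum(l for l, _ in results) / n, sum(g for _, g in results) / n

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def objective(shards, beta, lam, delta=1.345):
    """Averaged Huber loss plus the Lasso penalty."""
    return global_loss_grad(shards, beta, delta)[0] + lam * np.abs(beta).sum()

def apg_huber_lasso(shards, lam, delta=1.345, step=1.0, shrink=0.5, iters=300):
    """Accelerated proximal gradient with backtracking (illustrative only)."""
    beta = np.zeros(shards[0][0].shape[1])
    z, t = beta.copy(), 1.0
    for _ in range(iters):
        f_z, g_z = global_loss_grad(shards, z, delta)
        while True:
            # Proximal gradient step from the extrapolated point z.
            beta_new = soft_threshold(z - step * g_z, step * lam)
            d = beta_new - z
            f_new, _ = global_loss_grad(shards, beta_new, delta)
            # Accept once the quadratic upper bound holds at beta_new.
            if f_new <= f_z + g_z @ d + (d @ d) / (2 * step) + 1e-12:
                break
            step *= shrink
        # FISTA-style momentum update.
        t_new = 0.5 * (1 + np.sqrt(1 + 4 * t**2))
        z = beta_new + ((t - 1) / t_new) * (beta_new - beta)
        beta, t = beta_new, t_new
    return beta
```

In this sketch the only quantities exchanged per iteration are a scalar loss and a gradient vector of length p from each worker, which is the usual motivation for gradient-based schemes when bandwidth or privacy rules prevent pooling the data.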
About the journal:
The aim of the journal is to provide an international forum for the dissemination of up-to-date information in the fields of mathematics and computers, in particular (but not exclusively) as they apply to the dynamics of systems, their simulation and scientific computation in general. Published material ranges from short, concise research papers to more general tutorial articles.
Mathematics and Computers in Simulation, published monthly, is the official organ of IMACS, the International Association for Mathematics and Computers in Simulation (formerly AICA). This Association, founded in 1955 and legally incorporated in 1956, is a member of FIACC (the Five International Associations Coordinating Committee), together with IFIP, IFAC, IFORS and IMEKO.
Topics covered by the journal include mathematical tools in:
• The foundations of systems modelling
• Numerical analysis and the development of algorithms for simulation
They also include considerations about computer hardware for simulation and about special software and compilers.
The journal also publishes articles concerned with specific applications of modelling and simulation in science and engineering, with relevant applied mathematics, the general philosophy of systems simulation, and their impact on disciplinary and interdisciplinary research.
The journal includes a Book Review section and a "News on IMACS" section that contains a calendar of future conferences and events and other information about the Association.