Unified algorithms for distributed regularized linear regression model

IF 4.4 · Zone 2 (Mathematics) · JCR Q1, Computer Science, Interdisciplinary Applications
Bingzhen Chen, Wenjuan Zhai
DOI: 10.1016/j.matcom.2024.10.018
Journal: Mathematics and Computers in Simulation, vol. 229, pp. 867-884
Published: 2024-11-07
Publication type: Journal Article
Open access: No
Source: https://www.sciencedirect.com/science/article/pii/S0378475424004063
Citations: 0

Abstract

In recent years, distributed statistical models have received increasing attention for large-scale data analysis. On the one hand, data sets come from multiple sources and are stored in different locations because of limited bandwidth and storage or privacy protocols, so centralizing all the data is impossible. On the other hand, the data are so large that analyzing them together is difficult or inefficient. Research on distributed statistical models for large-scale data has two main aspects. The first is to study the statistical convergence rate under mild assumptions. The second is to establish fast and efficient optimization algorithms that take the properties of the loss function into account. There is a lot of research on the first aspect but relatively little on the second. Motivated by this, we consider the construction of unified algorithms for distributed linear regression with different losses and regularizers. As a result, we design two types of methods: the proximal alternating direction method of multipliers (pADMM) and the distributed accelerated proximal gradient method with line search (DAPGL). To demonstrate the efficiency of the proposed algorithms, we perform numerical experiments on the distributed Huber-Lasso and Huber-Group-Lasso models. The numerical results show that these two algorithms are more competitive than some state-of-the-art algorithms. In particular, DAPGL performs better than pADMM in most cases.
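To make the Huber-Lasso setting in the abstract concrete, here is a minimal sketch of a plain, fixed-step proximal gradient iteration on data split across machines. It is not the paper's pADMM or DAPGL: it has no acceleration and no line search, and the objective scaling, the function names, the step size, and the synthetic data generator are all assumptions made for illustration.

```python
import random


def huber_grad(r, delta):
    """Derivative of the Huber loss at residual r (quadratic inside [-delta, delta], linear outside)."""
    if abs(r) <= delta:
        return r
    return delta if r > 0 else -delta


def soft_threshold(v, t):
    """Proximal operator of t * |v|, i.e. the Lasso shrinkage step for one coordinate."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def local_gradient(X, y, beta, delta):
    """Sum of Huber-loss gradients over the rows held by one machine."""
    grad = [0.0] * len(beta)
    for row, target in zip(X, y):
        residual = sum(a * b for a, b in zip(row, beta)) - target
        weight = huber_grad(residual, delta)
        for j, a in enumerate(row):
            grad[j] += weight * a
    return grad


def distributed_prox_grad(shards, lam, delta=1.0, step=0.2, iters=500):
    """Fixed-step proximal gradient for (1/n) * sum huber(x_i^T beta - y_i) + lam * ||beta||_1.

    `shards` is a list of (X, y) pairs, one per machine (hypothetical layout);
    only p-dimensional gradient vectors are combined, never the raw rows.
    The fixed step is an assumption and must be small enough for the data.
    """
    n = sum(len(y) for _, y in shards)
    p = len(shards[0][0][0])
    beta = [0.0] * p
    for _ in range(iters):
        # Each machine computes its local gradient at the current beta.
        local = [local_gradient(X, y, beta, delta) for X, y in shards]
        # The aggregated gradient is the average over all n samples.
        grad = [sum(parts) / n for parts in zip(*local)]
        # Gradient step on the smooth Huber part, then Lasso shrinkage.
        beta = [soft_threshold(b - step * g, step * lam) for b, g in zip(beta, grad)]
    return beta


def make_shards(true_beta, machines=3, rows_per_machine=20, noise=0.1, seed=0):
    """Synthetic Gaussian data split evenly across machines (illustrative only)."""
    rng = random.Random(seed)
    shards = []
    for _ in range(machines):
        X = [[rng.gauss(0.0, 1.0) for _ in true_beta] for _ in range(rows_per_machine)]
        y = [sum(a * b for a, b in zip(row, true_beta)) + rng.gauss(0.0, noise) for row in X]
        shards.append((X, y))
    return shards
```

Calling `distributed_prox_grad(make_shards([2.0, 0.0, -1.0]), lam=0.05)` should return coefficients close to the generating vector, with the zero coefficient shrunk toward zero by the soft-thresholding step. Swapping `soft_threshold` for a group-wise shrinkage would give the analogous sketch for a Group-Lasso penalty.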
Source journal
Mathematics and Computers in Simulation (Mathematics; Computer Science, Interdisciplinary Applications)
CiteScore: 8.90
Self-citation rate: 4.30%
Articles published: 335
Review time: 54 days
Journal description: The aim of the journal is to provide an international forum for the dissemination of up-to-date information in the fields of mathematics and computers, in particular (but not exclusively) as they apply to the dynamics of systems, their simulation and scientific computation in general. Published material ranges from short, concise research papers to more general tutorial articles. Mathematics and Computers in Simulation, published monthly, is the official organ of IMACS, the International Association for Mathematics and Computers in Simulation (formerly AICA). This Association, founded in 1955 and legally incorporated in 1956, is a member of FIACC (the Five International Associations Coordinating Committee), together with IFIP, IFAC, IFORS and IMEKO. Topics covered by the journal include mathematical tools in:
• The foundations of systems modelling
• Numerical analysis and the development of algorithms for simulation
They also include considerations about computer hardware for simulation and about special software and compilers. The journal also publishes articles concerned with specific applications of modelling and simulation in science and engineering, with relevant applied mathematics, the general philosophy of systems simulation, and their impact on disciplinary and interdisciplinary research. The journal includes a Book Review section and a "News on IMACS" section that contains a calendar of future conferences and events and other information about the Association.