Natural Gradient Primal-Dual Method for Decentralized Learning

IF 3.0 | Zone 3, Computer Science | Q2, Engineering, Electrical & Electronic
Kenta Niwa;Hiro Ishii;Hiroshi Sawada;Akinori Fujino;Noboru Harada;Rio Yokota
IEEE Transactions on Signal and Information Processing over Networks, vol. 10, pp. 417-433
Published: 2024-04-26
DOI: 10.1109/TSIPN.2024.3388948
URL: https://ieeexplore.ieee.org/document/10509010/
Citations: 0

Abstract

We propose the Natural Gradient Primal-Dual (NGPD) method for decentralized learning of parameters in Deep Neural Networks (DNNs). Conventional approaches, such as the primal-dual method, constrain the local parameters of connected nodes to be similar. However, most of them follow a first-order optimization method, and the loss functions of DNNs may have ill-conditioned curvature, so many local parameter updates and many rounds of communication among local nodes are needed. For fast convergence, we integrate the second-order natural gradient method into the primal-dual method, which yields NGPD. Because the additional constraint minimizes the change in output before and after each parameter update, robustness to ill-conditioned curvature is expected. We theoretically establish the convergence rate for the averaged parameter (the average of the local parameters) under certain assumptions. To implement NGPD in practice without a significant increase in computational overhead, we introduce Kronecker-Factored Approximate Curvature (K-FAC). Our experimental results confirm that NGPD achieved the highest test accuracy in image classification tasks using DNNs.
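
The abstract names K-FAC as the way to keep the natural gradient step affordable. The sketch below is not the authors' implementation and does not include the primal-dual consensus terms; it only illustrates the standard Kronecker identity that K-FAC relies on for a single fully connected layer. It assumes the layer's curvature is approximated as A ⊗ G, where A is the second-moment matrix of the layer inputs and G is the second-moment matrix of the output gradients. The function names `kfac_precondition` and `dense_precondition` and the `damping` parameter are hypothetical, chosen for illustration.

```python
import numpy as np


def kfac_precondition(grad_w, a_cov, g_cov, damping=1e-3):
    """Apply the factored inverse: returns G^{-1} @ grad_w @ A^{-T}.

    grad_w: (out, in) weight gradient of one layer.
    a_cov:  (in, in) second moment of the layer inputs (assumed symmetric).
    g_cov:  (out, out) second moment of the output gradients (assumed symmetric).
    damping: illustrative Tikhonov term added to each factor separately.
    """
    a_damped = a_cov + damping * np.eye(a_cov.shape[0])
    g_damped = g_cov + damping * np.eye(g_cov.shape[0])
    # Left-multiply by G^{-1} without forming the inverse explicitly.
    left = np.linalg.solve(g_damped, grad_w)
    # Right-multiply by A^{-T}: solve A @ Y = left.T, then transpose back.
    return np.linalg.solve(a_damped, left.T).T


def dense_precondition(grad_w, a_cov, g_cov, damping=1e-3):
    """Reference version that builds the full (in*out) x (in*out) matrix."""
    a_damped = a_cov + damping * np.eye(a_cov.shape[0])
    g_damped = g_cov + damping * np.eye(g_cov.shape[0])
    fisher = np.kron(a_damped, g_damped)
    # Column-major vec matches the identity (A kron G) vec(X) = vec(G X A^T).
    vec = grad_w.flatten(order="F")
    return np.linalg.solve(fisher, vec).reshape(grad_w.shape, order="F")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    acts = rng.normal(size=(64, 4))    # synthetic layer inputs
    grads = rng.normal(size=(64, 3))   # synthetic output gradients
    a_cov = acts.T @ acts / len(acts)
    g_cov = grads.T @ grads / len(grads)
    grad_w = rng.normal(size=(3, 4))
    fast = kfac_precondition(grad_w, a_cov, g_cov)
    slow = dense_precondition(grad_w, a_cov, g_cov)
    print(np.allclose(fast, slow))
```

Both functions apply the same damped factors, so they should agree up to floating-point error. The factored version only solves systems of size in × in and out × out, whereas the dense version solves one system of size (in·out) × (in·out), which is where the computational saving comes from.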
Source journal
IEEE Transactions on Signal and Information Processing over Networks (Computer Science: Computer Networks and Communications)
CiteScore: 5.80
Self-citation rate: 12.50%
Articles published: 56
Journal description: The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g., time and space) to processing of signals and information (data) defined over networks, potentially dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data, or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, and applications of distributed signal processing.