{"title":"用于分散学习的自然梯度原始双法","authors":"Kenta Niwa;Hiro Ishii;Hiroshi Sawada;Akinori Fujino;Noboru Harada;Rio Yokota","doi":"10.1109/TSIPN.2024.3388948","DOIUrl":null,"url":null,"abstract":"We propose the Natural Gradient Primal-Dual (NGPD) method for decentralized learning of parameters in Deep Neural Networks (DNNs). Conventional approaches, such as the primal-dual method, constrain the local parameters to be similar between connected nodes. However, since most of them follow a first-order optimization method and the loss functions of DNNs may have ill-conditioned curvatures, many local parameter updates and communication among local nodes are needed. For fast convergence, we integrate the second-order natural gradient method into the primal-dual method (NGPD). Since additional constraint minimizes the amount of output change before and after the parameter updates, robustness towards ill-conditioned curvatures is expected. We theoretically demonstrate the convergence rate for the averaged parameter (the average of the local parameters) under certain assumptions. As a practical implementation of NGPD without a significant increase in computational overheads, we introduce Kronecker Factored Approximate Curvature (K-FAC). Our experimental results confirmed that NGPD achieved the highest test accuracy through image classification tasks using DNNs.","PeriodicalId":56268,"journal":{"name":"IEEE Transactions on Signal and Information Processing over Networks","volume":"10 ","pages":"417-433"},"PeriodicalIF":3.0000,"publicationDate":"2024-04-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Natural Gradient Primal-Dual Method for Decentralized Learning\",\"authors\":\"Kenta Niwa;Hiro Ishii;Hiroshi Sawada;Akinori Fujino;Noboru Harada;Rio Yokota\",\"doi\":\"10.1109/TSIPN.2024.3388948\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We propose the Natural Gradient Primal-Dual (NGPD) method for decentralized learning of parameters in Deep Neural Networks (DNNs). Conventional approaches, such as the primal-dual method, constrain the local parameters to be similar between connected nodes. However, since most of them follow a first-order optimization method and the loss functions of DNNs may have ill-conditioned curvatures, many local parameter updates and communication among local nodes are needed. For fast convergence, we integrate the second-order natural gradient method into the primal-dual method (NGPD). Since additional constraint minimizes the amount of output change before and after the parameter updates, robustness towards ill-conditioned curvatures is expected. We theoretically demonstrate the convergence rate for the averaged parameter (the average of the local parameters) under certain assumptions. As a practical implementation of NGPD without a significant increase in computational overheads, we introduce Kronecker Factored Approximate Curvature (K-FAC). 
Our experimental results confirmed that NGPD achieved the highest test accuracy through image classification tasks using DNNs.\",\"PeriodicalId\":56268,\"journal\":{\"name\":\"IEEE Transactions on Signal and Information Processing over Networks\",\"volume\":\"10 \",\"pages\":\"417-433\"},\"PeriodicalIF\":3.0000,\"publicationDate\":\"2024-04-26\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"IEEE Transactions on Signal and Information Processing over Networks\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://ieeexplore.ieee.org/document/10509010/\",\"RegionNum\":3,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q2\",\"JCRName\":\"ENGINEERING, ELECTRICAL & ELECTRONIC\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Signal and Information Processing over Networks","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10509010/","RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q2","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Natural Gradient Primal-Dual Method for Decentralized Learning
We propose the Natural Gradient Primal-Dual (NGPD) method for decentralized learning of the parameters of Deep Neural Networks (DNNs). Conventional approaches, such as the primal-dual method, constrain the local parameters of connected nodes to be similar. However, because most of these approaches rely on first-order optimization and the loss functions of DNNs may have ill-conditioned curvature, they require many local parameter updates and much communication among local nodes. For fast convergence, we integrate the second-order natural gradient method into the primal-dual method, yielding NGPD. Because an additional constraint minimizes the change in the output before and after each parameter update, robustness to ill-conditioned curvature is expected. We theoretically derive the convergence rate for the averaged parameter (the average of the local parameters) under certain assumptions. For a practical implementation of NGPD without a significant increase in computational overhead, we introduce Kronecker-Factored Approximate Curvature (K-FAC). Our experimental results confirm that NGPD achieves the highest test accuracy on image classification tasks using DNNs.
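The structure described in the abstract can be made concrete with a small sketch: each node takes a curvature-preconditioned (natural-gradient) descent step on an augmented Lagrangian that penalizes disagreement with its neighbors, while dual variables on the edges ascend on the consensus gap. The toy NumPy example below is only illustrative and is not the paper's NGPD algorithm: the quadratic losses, the exact-Hessian preconditioner standing in for a K-FAC Fisher approximation, and the step sizes `eta` and `rho` are all assumptions made for the sketch.

```python
# Minimal sketch of a decentralized natural-gradient primal-dual iteration.
# NOT the authors' exact NGPD method: toy quadratic losses stand in for DNN
# losses, the exact local Hessian stands in for a K-FAC Fisher approximation,
# and the step sizes (eta, rho) are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
d = 3                                   # parameter dimension
n = 3                                   # number of nodes
edges = [(0, 1), (1, 2), (0, 2)]        # fully connected triangle graph

# Local losses f_i(x) = 0.5 * (x - b_i)^T A_i (x - b_i), with A_i SPD.
A = [M @ M.T + np.eye(d) for M in (rng.standard_normal((d, d)) for _ in range(n))]
b = [rng.standard_normal(d) for _ in range(n)]

x = [np.zeros(d) for _ in range(n)]     # local (primal) parameters
lam = {e: np.zeros(d) for e in edges}   # dual variables on the edges
eta, rho = 0.5, 0.2                     # primal step size, dual step / penalty

for it in range(300):
    steps = []
    for i in range(n):
        g = A[i] @ (x[i] - b[i])        # local loss gradient
        # Dual and consensus-penalty terms from each incident edge.
        for (u, v) in edges:
            if u == i:
                g = g + lam[(u, v)] + rho * (x[u] - x[v])
            elif v == i:
                g = g - lam[(u, v)] + rho * (x[v] - x[u])
        # Natural-gradient step: precondition with the local curvature
        # (the Fisher matrix in NGPD; K-FAC would approximate this cheaply).
        steps.append(np.linalg.solve(A[i], g))
    for i in range(n):
        x[i] = x[i] - eta * steps[i]
    for (u, v) in edges:                # dual ascent on the consensus gap
        lam[(u, v)] += rho * (x[u] - x[v])

x_bar = np.mean(x, axis=0)              # the "averaged parameter"
x_star = np.linalg.solve(sum(A), sum(Ai @ bi for Ai, bi in zip(A, b)))
print("max consensus gap:", max(np.linalg.norm(x[i] - x_bar) for i in range(n)))
print("distance to global optimum:", np.linalg.norm(x_bar - x_star))
```

In this sketch the preconditioning makes the effective primal curvature nearly isotropic, which illustrates the mechanism by which a natural-gradient primal-dual scheme can tolerate ill-conditioned loss curvature; the printed averaged parameter corresponds to the quantity whose convergence rate the paper analyzes.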
Journal introduction:
The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing signals defined over vector spaces (e.g., time and space) to processing signals and information (data) defined over networks, which may be dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, as well as applications of distributed signal processing.