Decentralized and Model-Free Federated Learning: Consensus-Based Distillation in Function Space

Impact Factor 3.0 · CAS Tier 3 (Computer Science) · JCR Q2 (Engineering, Electrical & Electronic)
Akihito Taya;Takayuki Nishio;Masahiro Morikura;Koji Yamamoto
DOI: 10.1109/TSIPN.2022.3205549
Journal: IEEE Transactions on Signal and Information Processing over Networks, vol. 8, pp. 799–814
Published: 2022-09-09 (Journal Article)
URL: https://ieeexplore.ieee.org/document/9882389/
Citations: 7

Abstract

This paper proposes a fully decentralized federated learning (FL) scheme for Internet of Everything (IoE) devices that are connected via multi-hop networks. Because FL algorithms can hardly guarantee convergence of the parameters of machine learning (ML) models, this paper focuses on the convergence of ML models in function spaces. Considering that the representative loss functions of ML tasks, e.g., mean squared error (MSE) and Kullback-Leibler (KL) divergence, are convex functionals, algorithms that directly update functions in function spaces could converge to the optimal solution. The key concept of this paper is to tailor a consensus-based optimization algorithm to work in the function space and achieve the global optimum in a distributed manner. This paper first analyzes the convergence of the proposed algorithm in a function space, which is referred to as a meta-algorithm, and shows that spectral graph theory can be applied to the function space in a manner similar to that for numerical vectors. Then, consensus-based multi-hop federated distillation (CMFD) is developed for neural networks (NNs) to implement the meta-algorithm. CMFD leverages knowledge distillation to realize function aggregation among adjacent devices without parameter averaging. An advantage of CMFD is that it works even when the distributed learners use different NN models. Although CMFD does not perfectly reflect the behavior of the meta-algorithm, the discussion of the meta-algorithm's convergence property promotes an intuitive understanding of CMFD, and simulation evaluations show that NN models converge using CMFD for several tasks. The simulation results also show that CMFD achieves higher accuracy than parameter aggregation for weakly connected networks and is more stable than parameter aggregation methods.
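The core idea the abstract describes — consensus over *functions* rather than over model parameters — can be illustrated with a minimal numerical sketch. This is not the paper's CMFD algorithm (which uses knowledge distillation against neighbors' soft predictions); it only shows the underlying function-space consensus step, with each device's function represented by its outputs on a shared set of reference inputs. The reference grid `x_ref`, the three-device line topology, and the Metropolis mixing matrix `W` are illustrative assumptions.

```python
import numpy as np

# Three devices on a line graph (1-2-3), each holding a different predictor
# function. Each function is represented by its values on shared reference
# inputs, so only outputs are exchanged, never parameters -- which is why
# heterogeneous model architectures can participate.
rng = np.random.default_rng(0)
x_ref = np.linspace(0.0, 1.0, 50)  # shared reference inputs

# Each device starts from a perturbed version of the same target function.
f = np.stack([np.sin(2 * np.pi * x_ref) + 0.3 * rng.standard_normal(50)
              for _ in range(3)])

# Metropolis weights for the path graph 1-2-3 (doubly stochastic, symmetric),
# a standard choice in consensus optimization.
W = np.array([[2/3, 1/3, 0.0],
              [1/3, 1/3, 1/3],
              [0.0, 1/3, 2/3]])

for _ in range(100):
    f = W @ f  # one consensus step, applied directly in function space

# Disagreement shrinks geometrically at a rate set by the spectral gap of W
# (here |lambda_2| = 2/3), mirroring the spectral-graph-theory argument the
# abstract applies to function spaces.
spread = np.max(np.abs(f - f.mean(axis=0)))
print(spread < 1e-6)  # prints True
```

In the paper's setting, a local training step on each device's own data would be interleaved with this mixing step, and the mixing itself would be realized by distilling toward neighbors' predictions rather than by exact averaging.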
Source Journal
IEEE Transactions on Signal and Information Processing over Networks
Category: Computer Science – Computer Networks and Communications
CiteScore: 5.80
Self-citation rate: 12.50%
Annual publications: 56
Journal description: The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g., time and space) to processing of signals and information (data) defined over networks, which may be dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, as well as applications of distributed signal processing.