Decentralized edge learning: A comparative study of distillation strategies and dissimilarity measures

IF 6.2 · CAS Tier 2 (Computer Science) · Q1 COMPUTER SCIENCE, THEORY & METHODS
Mbasa Joaquim Molo, Lucia Vadicamo, Claudio Gennaro, Emanuele Carlini
{"title":"分散边缘学习:蒸馏策略与不相似度量的比较研究","authors":"Mbasa Joaquim Molo ,&nbsp;Lucia Vadicamo ,&nbsp;Claudio Gennaro ,&nbsp;Emanuele Carlini","doi":"10.1016/j.future.2025.108171","DOIUrl":null,"url":null,"abstract":"<div><div>Decentralized learning is emerging as a scalable and privacy-preserving alternative to centralized machine learning, particularly in distributed systems where data cannot be centrally shared among multiple nodes or clients. While Federated Learning is widely adopted in this context, Knowledge Distillation (KD) is emerging as a flexible and scalable alternative where model output is used to share knowledge among distributed clients. However, existing studies often overlook the efficiency and effectiveness of various knowledge transfer strategies in KD, especially in decentralized environments where data is non-IID. This study provides key insights by examining the impact of network topology and distillation strategies in KD-based decentralized learning approaches. Our evaluation spans several dissimilarity measures, including Cross-Entropy, Kullback-Leibler divergence, Triangular Divergence, Jensen-Shannon divergence, Structural Entropic Distance, and Multi-way SED, assessed under both pairwise and holistic distillation schemes. In the pairwise approach, distillation is performed by summing the client-wise dissimilarities between a client’s output and each neighbor’s prediction individually, while the holistic approach computes dissimilarity with respect to the average of the output predictions received from neighboring clients.</div><div>We also analyze performance across client connectivity levels to explore the trade-off between convergence speed and model accuracy. The results indicate that the holistic distillation approach, which averages client predictions, outperforms the sum of pairwise distillation, especially when employing alternative measures like TD, SED, and JS. These measures offer improved performance over conventional metrics such as CE and KL divergence.</div></div>","PeriodicalId":55132,"journal":{"name":"Future Generation Computer Systems-The International Journal of Escience","volume":"176 ","pages":"Article 108171"},"PeriodicalIF":6.2000,"publicationDate":"2025-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Decentralized edge learning: A comparative study of distillation strategies and dissimilarity measures\",\"authors\":\"Mbasa Joaquim Molo ,&nbsp;Lucia Vadicamo ,&nbsp;Claudio Gennaro ,&nbsp;Emanuele Carlini\",\"doi\":\"10.1016/j.future.2025.108171\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>Decentralized learning is emerging as a scalable and privacy-preserving alternative to centralized machine learning, particularly in distributed systems where data cannot be centrally shared among multiple nodes or clients. While Federated Learning is widely adopted in this context, Knowledge Distillation (KD) is emerging as a flexible and scalable alternative where model output is used to share knowledge among distributed clients. However, existing studies often overlook the efficiency and effectiveness of various knowledge transfer strategies in KD, especially in decentralized environments where data is non-IID. This study provides key insights by examining the impact of network topology and distillation strategies in KD-based decentralized learning approaches. 
Our evaluation spans several dissimilarity measures, including Cross-Entropy, Kullback-Leibler divergence, Triangular Divergence, Jensen-Shannon divergence, Structural Entropic Distance, and Multi-way SED, assessed under both pairwise and holistic distillation schemes. In the pairwise approach, distillation is performed by summing the client-wise dissimilarities between a client’s output and each neighbor’s prediction individually, while the holistic approach computes dissimilarity with respect to the average of the output predictions received from neighboring clients.</div><div>We also analyze performance across client connectivity levels to explore the trade-off between convergence speed and model accuracy. The results indicate that the holistic distillation approach, which averages client predictions, outperforms the sum of pairwise distillation, especially when employing alternative measures like TD, SED, and JS. These measures offer improved performance over conventional metrics such as CE and KL divergence.</div></div>\",\"PeriodicalId\":55132,\"journal\":{\"name\":\"Future Generation Computer Systems-The International Journal of Escience\",\"volume\":\"176 \",\"pages\":\"Article 108171\"},\"PeriodicalIF\":6.2000,\"publicationDate\":\"2025-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Future Generation Computer Systems-The International Journal of Escience\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0167739X25004650\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, THEORY & METHODS\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Future Generation Computer Systems-The International Journal of Escience","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0167739X25004650","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, THEORY & METHODS","Score":null,"Total":0}
Citations: 0

Abstract

Decentralized learning is emerging as a scalable and privacy-preserving alternative to centralized machine learning, particularly in distributed systems where data cannot be centrally shared among multiple nodes or clients. While Federated Learning is widely adopted in this context, Knowledge Distillation (KD) is emerging as a flexible and scalable alternative where model output is used to share knowledge among distributed clients. However, existing studies often overlook the efficiency and effectiveness of various knowledge transfer strategies in KD, especially in decentralized environments where data is non-IID. This study provides key insights by examining the impact of network topology and distillation strategies in KD-based decentralized learning approaches. Our evaluation spans several dissimilarity measures, including Cross-Entropy, Kullback-Leibler divergence, Triangular Divergence, Jensen-Shannon divergence, Structural Entropic Distance, and Multi-way SED, assessed under both pairwise and holistic distillation schemes. In the pairwise approach, distillation is performed by summing the client-wise dissimilarities between a client’s output and each neighbor’s prediction individually, while the holistic approach computes dissimilarity with respect to the average of the output predictions received from neighboring clients.
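
To make the two schemes concrete, the following is a minimal sketch (not the authors' implementation) of the two distillation losses for a single client, assuming neighbors exchange soft predictions as probability vectors and using KL divergence as the dissimilarity measure; the function names and toy data are illustrative only.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """Mean KL(p || q) over the batch, for rows that are probability vectors."""
    p = np.clip(p, eps, 1.0)
    q = np.clip(q, eps, 1.0)
    return float(np.sum(p * np.log(p / q), axis=-1).mean())

def pairwise_distillation_loss(client_probs, neighbor_probs_list, dissimilarity=kl_divergence):
    # Pairwise scheme: sum the dissimilarity between the client's output and
    # each neighbor's prediction, taken individually.
    return sum(dissimilarity(client_probs, n) for n in neighbor_probs_list)

def holistic_distillation_loss(client_probs, neighbor_probs_list, dissimilarity=kl_divergence):
    # Holistic scheme: compare the client's output against the average of the
    # predictions received from its neighbors.
    avg_neighbors = np.mean(np.stack(neighbor_probs_list), axis=0)
    return dissimilarity(client_probs, avg_neighbors)

# Toy example: one client with two neighbors, 3-class soft predictions for 2 samples.
client = np.array([[0.7, 0.2, 0.1], [0.1, 0.8, 0.1]])
neighbors = [
    np.array([[0.6, 0.3, 0.1], [0.2, 0.7, 0.1]]),
    np.array([[0.5, 0.4, 0.1], [0.1, 0.6, 0.3]]),
]
print("pairwise:", pairwise_distillation_loss(client, neighbors))
print("holistic:", holistic_distillation_loss(client, neighbors))
```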
We also analyze performance across client connectivity levels to explore the trade-off between convergence speed and model accuracy. The results indicate that the holistic distillation approach, which averages client predictions, outperforms the sum of pairwise distillation, especially when employing alternative measures like TD, SED, and JS. These measures offer improved performance over conventional metrics such as CE and KL divergence.
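
For reference, the Jensen-Shannon divergence (JS) and Triangular Divergence (TD) mentioned above are commonly defined as follows for discrete probability vectors p and q. These are the standard formulations; the paper's exact normalizations, as well as the definitions of SED and Multi-way SED, may differ.

```latex
% Standard definitions (assumed; the paper may use different constants or log bases).
\[
\mathrm{JS}(p,q) \;=\; \tfrac{1}{2}\,\mathrm{KL}\!\left(p \,\Big\|\, \tfrac{p+q}{2}\right)
                 \;+\; \tfrac{1}{2}\,\mathrm{KL}\!\left(q \,\Big\|\, \tfrac{p+q}{2}\right),
\qquad
\mathrm{TD}(p,q) \;=\; \sum_{i} \frac{(p_i - q_i)^2}{p_i + q_i}.
\]
```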
Source journal: Future Generation Computer Systems: The International Journal of eScience
CiteScore: 19.90
Self-citation rate: 2.70%
Articles published: 376
Review time: 10.6 months
About the journal: Computing infrastructures and systems are constantly evolving, resulting in increasingly complex and collaborative scientific applications. To cope with these advancements, there is a growing need for collaborative tools that can effectively map, control, and execute these applications. Furthermore, with the explosion of Big Data, there is a requirement for innovative methods and infrastructures to collect, analyze, and derive meaningful insights from the vast amount of data generated. This necessitates the integration of computational and storage capabilities, databases, sensors, and human collaboration. Future Generation Computer Systems aims to pioneer advancements in distributed systems, collaborative environments, high-performance computing, and Big Data analytics. It strives to stay at the forefront of developments in grids, clouds, and the Internet of Things (IoT) to effectively address the challenges posed by these wide-area, fully distributed sensing and computing systems.