Distributed Training of Large Graph Neural Networks With Variable Communication Rates

IF 3; Zone 3, Computer Science; Q2 ENGINEERING, ELECTRICAL & ELECTRONIC

IEEE Transactions on Signal and Information Processing over Networks. Pub Date: 2025-04-03. DOI: 10.1109/TSIPN.2025.3557584

Juan Cerviño;Md Asadullah Turja;Hesham Mostafa;Nageen Himayat;Alejandro Ribeiro

Citations: 0

Abstract

Training Graph Neural Networks (GNNs) on large graphs presents unique challenges due to the large memory and computing requirements. Distributed GNN training, where the graph is partitioned across multiple machines, is a common approach to training GNNs on large graphs. However, as the graph cannot generally be decomposed into small non-interacting components, data communication between the training machines quickly limits training speeds. Compressing the communicated node activations by a fixed amount improves the training speeds, but lowers the accuracy of the trained GNN. In this paper, we introduce a variable compression scheme for reducing the communication volume in distributed GNN training without compromising the accuracy of the learned model.
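
The idea of a variable compression rate can be illustrated with a minimal sketch. The abstract does not specify the schedule or the compression operator, so everything below is an assumption made for illustration: a hypothetical linear schedule that compresses heavily early in training and lightly at the end, and a hypothetical random-subsampling compressor for the activation vector a machine would send to its peers. The names `compression_ratio`, `compress`, and `decompress` are invented here and are not taken from the paper.

```python
import random


def compression_ratio(step, total_steps, c_max=8.0, c_min=1.0):
    """Hypothetical linear schedule (an assumption, not the paper's rule).

    Returns c_max at the first step and c_min at the last step.
    A ratio of 1.0 means no compression.
    """
    frac = step / max(total_steps - 1, 1)
    frac = min(max(frac, 0.0), 1.0)
    return c_max + (c_min - c_max) * frac


def compress(activation, ratio, rng):
    """Keep roughly len(activation) / ratio randomly chosen entries.

    Returns the kept indices and their values, which is what a machine
    would transmit in this sketch.
    """
    n = len(activation)
    k = max(1, round(n / ratio))
    kept = sorted(rng.sample(range(n), k))
    return kept, [activation[i] for i in kept]


def decompress(kept, values, n):
    """Rebuild a length-n vector on the receiver, zero-filling dropped entries."""
    out = [0.0] * n
    for i, v in zip(kept, values):
        out[i] = v
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    activation = [0.5, -1.0, 2.0, 0.0, 3.5, -0.25, 1.5, 4.0]
    total_steps = 5
    for step in range(total_steps):
        ratio = compression_ratio(step, total_steps)
        kept, values = compress(activation, ratio, rng)
        received = decompress(kept, values, len(activation))
        print(f"step {step}: ratio {ratio:.2f}, sent {len(values)} values")
```

In this sketch the communication volume grows as training proceeds, so early steps are cheap while the final steps exchange the uncompressed activations. A fixed-rate scheme corresponds to setting `c_max` equal to `c_min`.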

Source journal

IEEE Transactions on Signal and Information Processing over Networks (Computer Science: Computer Networks and Communications)

CiteScore

5.80

Self-citation rate

12.50%

Publication count

Journal description: The IEEE Transactions on Signal and Information Processing over Networks publishes high-quality papers that extend the classical notions of processing of signals defined over vector spaces (e.g. time and space) to processing of signals and information (data) defined over networks, potentially dynamically varying. In signal processing over networks, the topology of the network may define structural relationships in the data, or may constrain processing of the data. Topics include distributed algorithms for filtering, detection, estimation, adaptation and learning, model selection, data fusion, and diffusion or evolution of information over such networks, and applications of distributed signal processing.