A Comprehensive Survey on Distributed Training of Graph Neural Networks

Impact Factor: 23.2 · CAS Tier 1 (Computer Science) · JCR Q1 (Engineering, Electrical & Electronic)
Haiyang Lin;Mingyu Yan;Xiaochun Ye;Dongrui Fan;Shirui Pan;Wenguang Chen;Yuan Xie
{"title":"A Comprehensive Survey on Distributed Training of Graph Neural Networks","authors":"Haiyang Lin;Mingyu Yan;Xiaochun Ye;Dongrui Fan;Shirui Pan;Wenguang Chen;Yuan Xie","doi":"10.1109/JPROC.2023.3337442","DOIUrl":null,"url":null,"abstract":"Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model in broad application fields for their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training that distributes the workload of training across multiple computing nodes. At present, the volume of related research on distributed GNN training is exceptionally vast, accompanied by an extraordinarily rapid pace of publication. Moreover, the approaches reported in these studies exhibit significant divergence. This situation poses a considerable challenge for newcomers, hindering their ability to grasp a comprehensive understanding of the workflows, computational patterns, communication strategies, and optimization techniques employed in distributed GNN training. As a result, there is a pressing need for a survey to provide correct recognition, analysis, and comparisons in this field. In this article, we provide a comprehensive survey of distributed GNN training by investigating various optimization techniques used in distributed GNN training. First, distributed GNN training is classified into several categories according to their workflows. In addition, their computational patterns and communication patterns, as well as the optimization techniques proposed by recent work, are introduced. Second, the software frameworks and hardware platforms of distributed GNN training are also introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks (DNNs), emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.","PeriodicalId":20556,"journal":{"name":"Proceedings of the IEEE","volume":null,"pages":null},"PeriodicalIF":23.2000,"publicationDate":"2023-12-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the IEEE","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10348966/","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"ENGINEERING, ELECTRICAL & ELECTRONIC","Score":null,"Total":0}
Citations: 0

Abstract

Graph neural networks (GNNs) have been demonstrated to be a powerful algorithmic model across a broad range of application fields owing to their effectiveness in learning over graphs. To scale GNN training up for large-scale and ever-growing graphs, the most promising solution is distributed training, which spreads the training workload across multiple computing nodes. At present, the volume of research on distributed GNN training is exceptionally vast, the pace of publication extraordinarily rapid, and the approaches reported in these studies diverge significantly. This situation poses a considerable challenge for newcomers, hindering them from gaining a comprehensive understanding of the workflows, computational patterns, communication strategies, and optimization techniques employed in distributed GNN training. As a result, there is a pressing need for a survey that provides correct recognition, analysis, and comparison in this field. In this article, we provide a comprehensive survey of distributed GNN training by investigating the various optimization techniques it employs. First, distributed GNN training approaches are classified into several categories according to their workflows; their computational and communication patterns, as well as the optimization techniques proposed by recent work, are then introduced. Second, the software frameworks and hardware platforms for distributed GNN training are introduced for a deeper understanding. Third, distributed GNN training is compared with distributed training of deep neural networks (DNNs), emphasizing the uniqueness of distributed GNN training. Finally, interesting issues and opportunities in this field are discussed.
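
To make the idea of distributing the training workload across multiple computing nodes concrete, below is a minimal data-parallel sketch using PyTorch's torch.distributed and DistributedDataParallel. The two-layer GCN, the random toy graph, and the round-robin split of training nodes are illustrative assumptions for this sketch only; they are not the workflows, partitioning schemes, or optimizations surveyed in the paper.

# Minimal data-parallel sketch of distributed GNN training (illustrative only).
# Launch with, e.g.:  torchrun --nproc_per_node=2 train_gnn_ddp.py
import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP


class GCNLayer(nn.Module):
    """One graph-convolution step: aggregate neighbor features, then transform."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)

    def forward(self, adj, x):
        # adj: sparse (N x N) adjacency, x: dense (N x in_dim) node features.
        return self.linear(torch.sparse.mm(adj, x))


class TwoLayerGCN(nn.Module):
    def __init__(self, in_dim, hid_dim, out_dim):
        super().__init__()
        self.layer1 = GCNLayer(in_dim, hid_dim)
        self.layer2 = GCNLayer(hid_dim, out_dim)

    def forward(self, adj, x):
        return self.layer2(adj, torch.relu(self.layer1(adj, x)))


def main():
    dist.init_process_group("gloo")   # use "nccl" on multi-GPU setups
    rank, world = dist.get_rank(), dist.get_world_size()
    torch.manual_seed(0)              # every worker builds the same toy graph

    num_nodes, feat_dim, num_classes = 1000, 32, 4
    edges = torch.randint(0, num_nodes, (2, 5000))
    adj = torch.sparse_coo_tensor(edges, torch.ones(edges.size(1)),
                                  (num_nodes, num_nodes)).coalesce()
    x = torch.randn(num_nodes, feat_dim)
    y = torch.randint(0, num_classes, (num_nodes,))

    # Each worker computes the loss on its own disjoint slice of training nodes;
    # DDP all-reduces gradients so every model replica applies the same update.
    my_nodes = torch.arange(rank, num_nodes, world)

    model = DDP(TwoLayerGCN(feat_dim, 64, num_classes))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    for epoch in range(5):
        opt.zero_grad()
        logits = model(adj, x)                        # full-graph forward pass
        loss = nn.functional.cross_entropy(logits[my_nodes], y[my_nodes])
        loss.backward()                               # gradients synchronized here
        opt.step()
        if rank == 0:
            print(f"epoch {epoch}: loss {loss.item():.4f}")

    dist.destroy_process_group()


if __name__ == "__main__":
    main()

Real distributed GNN systems go far beyond this sketch, for example by partitioning the graph itself, sampling mini-batches of neighbors, and overlapping feature communication with computation; these are among the workflows, computational patterns, communication patterns, and optimizations that the survey categorizes.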
Source Journal
Proceedings of the IEEE (Engineering: Electrical & Electronic)
CiteScore: 46.40 · Self-citation rate: 1.00% · Articles per year: 160 · Review time: 3-8 weeks
Journal description: Proceedings of the IEEE is the leading journal to provide in-depth review, survey, and tutorial coverage of the technical developments in electronics, electrical and computer engineering, and computer science. Consistently ranked as one of the top journals by Impact Factor, Article Influence Score, and more, the journal serves as a trusted resource for engineers around the world.