Training Large-Scale Graph Neural Networks via Graph Partial Pooling

IF 7.5 | Zone 3 (Computer Science) | Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

IEEE Transactions on Big Data Pub Date : 2024-03-20 DOI:10.1109/TBDATA.2024.3403380

Qi Zhang;Yanfeng Sun;Shaofan Wang;Junbin Gao;Yongli Hu;Baocai Yin

Volume 11, Issue 1, pp. 221-233. URL: https://ieeexplore.ieee.org/document/10535224/

Citations: 0

Abstract

Graph Neural Networks (GNNs) are powerful tools for graph representation learning, but they face challenges when applied to large-scale graphs due to substantial computational costs and memory requirements. To address scalability limitations, various methods have been proposed, including sampling-based and decoupling-based methods. However, these methods have their limitations: sampling-based methods inevitably discard some link information during the sampling process, while decoupling-based methods require alterations to the model's structure, reducing their adaptability to various GNNs. This paper proposes a novel graph pooling method, Graph Partial Pooling (GPPool), for scaling GNNs to large-scale graphs. GPPool is a versatile and straightforward technique that enhances training efficiency while simultaneously reducing memory requirements. GPPool constructs small-scale pooled graphs by pooling partial nodes into supernodes. Each pooled graph consists of supernodes and unpooled nodes, preserving valuable local and global information. Training GNNs on these graphs reduces memory demands and enhances their performance. Additionally, this paper provides a theoretical analysis of training GNNs using GPPool-constructed graphs from a graph diffusion perspective. It shows that a GNN can be transformed from a large-scale graph into pooled graphs with minimal approximation error. A series of experiments on datasets of varying scales demonstrates the effectiveness of GPPool.
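The idea of pooling only part of a graph into supernodes can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not say how nodes are chosen for pooling or how supernode features and edges are computed. The sketch assumes the node groups are given (the hypothetical `clusters` argument), averages features within each group, and sums edge weights through a one-hot assignment matrix. The function name `partial_pool` and all parameter names are illustrative.

```python
import numpy as np

def partial_pool(adj, feats, clusters):
    """Merge the given node groups into supernodes; keep all other nodes.

    Assumptions (not from the paper): groups are disjoint and supplied by
    the caller, supernode features are group means, and pooled edge weights
    are sums of the original edge weights.

    adj:      (n, n) dense adjacency matrix
    feats:    (n, d) node feature matrix
    clusters: list of lists of node indices to merge
    Returns (pooled_adj, pooled_feats, assign), where assign is an (n, m)
    one-hot matrix mapping each original node to a pooled-graph node.
    """
    n = adj.shape[0]
    pooled_nodes = {i for group in clusters for i in group}
    kept = [i for i in range(n) if i not in pooled_nodes]
    m = len(clusters) + len(kept)

    # Columns 0..len(clusters)-1 are supernodes; the rest are unpooled nodes.
    assign = np.zeros((n, m))
    for k, group in enumerate(clusters):
        assign[group, k] = 1.0
    for j, i in enumerate(kept):
        assign[i, len(clusters) + j] = 1.0

    # Sum edge weights between pooled-graph nodes. Edges inside a group
    # end up on the diagonal as a self-loop weight.
    pooled_adj = assign.T @ adj @ assign

    # Average the features of the nodes mapped to each pooled-graph node.
    sizes = assign.sum(axis=0)
    pooled_feats = (assign.T @ feats) / sizes[:, None]
    return pooled_adj, pooled_feats, assign

# Usage: a 6-node path graph where nodes 0, 1, 2 are pooled into one supernode.
path_adj = np.zeros((6, 6))
for i in range(5):
    path_adj[i, i + 1] = path_adj[i + 1, i] = 1.0
path_feats = np.arange(6, dtype=float).reshape(6, 1)
small_adj, small_feats, assign = partial_pool(path_adj, path_feats, [[0, 1, 2]])
```

In this toy case the pooled graph has four nodes (one supernode plus the three unpooled nodes), so a GNN trained on it would operate on a 4 x 4 adjacency matrix instead of a 6 x 6 one. Drawing different groups on each call would yield different pooled graphs of the same original graph.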

Source journal

IEEE Transactions on Big Data Multiple-

CiteScore

11.80

Self-citation rate

2.80%

Articles published

114

About the journal: The IEEE Transactions on Big Data publishes peer-reviewed articles focusing on big data. These articles present innovative research ideas and application results across disciplines, including novel theories, algorithms, and applications. Research areas cover a wide range, such as big data analytics, visualization, curation, management, semantics, infrastructure, standards, performance analysis, intelligence extraction, scientific discovery, security, privacy, and legal issues specific to big data. The journal also prioritizes applications of big data in fields generating massive datasets.