SEIGN: A Simple and Efficient Graph Neural Network for Large Dynamic Graphs

Xiao Qin, Nasrullah Sheikh, Chuan Lei, B. Reinwald, Giacomo Domeniconi
{"title":"SEIGN: A Simple and Efficient Graph Neural Network for Large Dynamic Graphs","authors":"Xiao Qin, Nasrullah Sheikh, Chuan Lei, B. Reinwald, Giacomo Domeniconi","doi":"10.1109/ICDE55515.2023.00218","DOIUrl":null,"url":null,"abstract":"Graph neural networks (GNNs) have accomplished great success in learning complex systems of relations arising in broad problem settings ranging from e-commerce, social networks to data management. Training GNNs over large-scale graphs poses challenges for constrained compute resources due to the heavy data dependencies between the nodes. Moreover, modern relational data is constantly evolving, which creates an additional layer of learning challenges with respect to the model scalability and expressivity. This paper introduces a simple and efficient learning algorithm for large discrete-time dynamic graphs (DTDGs) – a widely adopted data model for many applications. We particularly tackle two critical challenges: (1) how the model can be efficiently trained on large-scale DTDGs to exploit hardware accelerators with small memory footprint, and (2) how the model can effectively capture the changing dynamics of the graphs. To the best of our knowledge, existing GNNs fail to address both challenges in their models. Hence, we propose a scalable evolving inception GNN, called SEIGN. Specifically, SEIGN features two connected evolving components that adapt the graph model to the arriving snapshot and capture the changing dynamics of the node embeddings, respectively. To scale up the model training, SEIGN introduces a parameter-free message passing step for DTDGs to substantially remove the data dependencies in training. Furthermore, it significantly reduces the training memory footprint and allows us to construct a succinct graph mini-batch without performing neighborhood sampling. 
We further optimize the proposed evolving strategies by extracting features from neighbors at varying scales to increase the expressive power of the node representations. Our experimental evaluation, on both public benchmark and real industrial datasets, demonstrates that SEIGN achieves 2%–20% improvement in Area Under Curve (AUC) and Average Precision (AP) on the prediction task over the state-of-the-art baselines. SEIGN also supports efficient graph mini-batch training and gains 2–16 times speedup in epoch computation time over the entire DTDGs.","PeriodicalId":434744,"journal":{"name":"2023 IEEE 39th International Conference on Data Engineering (ICDE)","volume":"2014 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"2023 IEEE 39th International Conference on Data Engineering (ICDE)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/ICDE55515.2023.00218","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

Graph neural networks (GNNs) have achieved great success in learning complex systems of relations arising in broad problem settings, ranging from e-commerce and social networks to data management. Training GNNs over large-scale graphs strains constrained compute resources due to the heavy data dependencies between nodes. Moreover, modern relational data is constantly evolving, which creates an additional layer of learning challenges with respect to model scalability and expressivity. This paper introduces a simple and efficient learning algorithm for large discrete-time dynamic graphs (DTDGs) – a widely adopted data model for many applications. We particularly tackle two critical challenges: (1) how the model can be efficiently trained on large-scale DTDGs to exploit hardware accelerators with a small memory footprint, and (2) how the model can effectively capture the changing dynamics of the graphs. To the best of our knowledge, existing GNNs fail to address both challenges in their models. Hence, we propose a scalable evolving inception GNN, called SEIGN. Specifically, SEIGN features two connected evolving components that adapt the graph model to the arriving snapshot and capture the changing dynamics of the node embeddings, respectively. To scale up model training, SEIGN introduces a parameter-free message passing step for DTDGs that substantially removes the data dependencies in training. Furthermore, it significantly reduces the training memory footprint and allows us to construct a succinct graph mini-batch without performing neighborhood sampling. We further optimize the proposed evolving strategies by extracting features from neighbors at varying scales to increase the expressive power of the node representations.
Our experimental evaluation, on both public benchmark and real industrial datasets, demonstrates that SEIGN achieves a 2%–20% improvement in Area Under the Curve (AUC) and Average Precision (AP) on the prediction task over state-of-the-art baselines. SEIGN also supports efficient graph mini-batch training and gains a 2–16 times speedup in epoch computation time on entire DTDGs.
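The abstract does not spell out the parameter-free message passing step, but the idea of precomputing multi-scale neighborhood features so that mini-batch rows carry no inter-node dependencies can be sketched in the spirit of SIGN-style inception precomputation. The minimal sketch below is an illustrative assumption, not the paper's exact operator: it uses a symmetrically normalized adjacency with self-loops (a common choice) and stacks K propagation scales for one snapshot.

```python
# Illustrative sketch only: precompute [X, AX, A^2 X, ..., A^K X] for one
# graph snapshot so that each node's input row is fixed ahead of training and
# mini-batches need no neighborhood sampling. The normalization and operator
# choice here are assumptions, not SEIGN's published formulation.
import numpy as np

def multi_scale_features(adj: np.ndarray, x: np.ndarray, num_scales: int) -> np.ndarray:
    """Return an (n, (K+1)*d) matrix of stacked propagation scales.

    adj: dense (n, n) adjacency matrix of the snapshot
    x:   (n, d) node feature matrix
    """
    n = adj.shape[0]
    a_hat = adj + np.eye(n)                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    a_norm = a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]  # D^-1/2 (A+I) D^-1/2

    feats = [x]
    for _ in range(num_scales):
        feats.append(a_norm @ feats[-1])          # one more parameter-free hop
    return np.concatenate(feats, axis=1)

# Tiny example: 3-node path graph, 2-dim features, 2 scales.
adj = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
x = np.eye(3, 2)
z = multi_scale_features(adj, x, num_scales=2)
print(z.shape)  # (3, 6): each row is a self-contained per-node input
```

Because the propagation has no learnable parameters, it runs once per snapshot as preprocessing; the trainable model then consumes individual rows of `z`, which is what makes dependency-free mini-batching possible.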