SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning

Haoteng Yin, Muhan Zhang, Jianguo Wang, Pan Li
{"title":"SUREL+: Moving from Walks to Sets for Scalable Subgraph-based Graph Representation Learning","authors":"Haoteng Yin, Muhan Zhang, Jianguo Wang, Pan Li","doi":"10.48550/arXiv.2303.03379","DOIUrl":null,"url":null,"abstract":"Subgraph-based graph representation learning (SGRL) has recently emerged as a powerful tool in many prediction tasks on graphs due to its advantages in model expressiveness and generalization ability. Most previous SGRL models face computational issues related to the high cost of extracting subgraphs for each training or testing query. Recently, SUREL was proposed to accelerate SGRL, which samples random walks offline and joins these walks online as a proxy of subgraphs for prediction. Thanks to the reusability of sampled walks across different queries, SUREL achieves state-of-the-art performance in terms of scalability and prediction accuracy. However, SUREL still suffers from high computational overhead caused by node redundancy in sampled walks. In this work, we propose a novel framework SUREL+ that upgrades SUREL by using node sets instead of walks to represent subgraphs. By definition, such set-based representations avoid repeated nodes, but node sets can be irregular in size. To solve this issue, we design a dedicated sparse data structure to efficiently store and access node sets, and provide a specialized operator to join them in parallel batches. SUREL+ is modularized to support multiple types of set samplers, structural features, and neural encoders to complement the loss of structural information after the reduction from walks to sets. Extensive experiments have been performed to verify the effectiveness of SUREL+ in the prediction tasks of links, relation types, and higher-order patterns. SUREL+ achieves 3--11× speedups of SUREL while maintaining comparable or even better prediction performance; compared to other SGRL baselines, SUREL+ achieves ~20× speedups and significantly improves the prediction accuracy.","PeriodicalId":20467,"journal":{"name":"Proc. VLDB Endow.","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2023-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"2","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proc. VLDB Endow.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.48550/arXiv.2303.03379","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 2

Abstract

Subgraph-based graph representation learning (SGRL) has recently emerged as a powerful tool for many prediction tasks on graphs due to its advantages in model expressiveness and generalization ability. Most previous SGRL models face computational issues related to the high cost of extracting a subgraph for each training or testing query. Recently, SUREL was proposed to accelerate SGRL: it samples random walks offline and joins these walks online as a proxy of subgraphs for prediction. Thanks to the reusability of sampled walks across different queries, SUREL achieves state-of-the-art scalability and prediction accuracy. However, SUREL still suffers from high computational overhead caused by node redundancy in sampled walks. In this work, we propose SUREL+, a novel framework that upgrades SUREL by using node sets instead of walks to represent subgraphs. By definition, such set-based representations avoid repeated nodes, but node sets can be irregular in size. To solve this issue, we design a dedicated sparse data structure to efficiently store and access node sets, and provide a specialized operator to join them in parallel batches. SUREL+ is modularized to support multiple types of set samplers, structural features, and neural encoders that compensate for the structural information lost in the reduction from walks to sets. Extensive experiments verify the effectiveness of SUREL+ on prediction tasks over links, relation types, and higher-order patterns. SUREL+ achieves 3-11× speedups over SUREL while maintaining comparable or better prediction performance; compared to other SGRL baselines, SUREL+ achieves ~20× speedups and significantly improves prediction accuracy.
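
The two engineering ideas highlighted in the abstract, storing irregularly sized node sets in a sparse layout and joining them per query in parallel batches, can be illustrated with a short sketch. The following is a minimal, hypothetical Python example assuming a CSR-style (offsets + values) layout; the names NodeSetStore and join_sets are illustrative only and are not the actual SUREL+ API, and the sketch omits the set samplers, structural features, and neural encoders that the full framework provides.

```python
import numpy as np

class NodeSetStore:
    """Hypothetical CSR-style container for irregularly sized node sets.

    Each seed node owns a sorted array of sampled node IDs; all arrays are
    packed into one flat buffer indexed by an offsets array, so sets of
    different sizes can be stored and sliced without padding.
    """

    def __init__(self, node_sets):
        # node_sets: dict {seed_id: iterable of sampled node IDs}
        self.seeds = sorted(node_sets)
        self.offsets = np.zeros(len(self.seeds) + 1, dtype=np.int64)
        buf = []
        for i, u in enumerate(self.seeds):
            s = np.unique(np.asarray(list(node_sets[u]), dtype=np.int64))
            buf.append(s)
            self.offsets[i + 1] = self.offsets[i] + s.size
        self.values = np.concatenate(buf) if buf else np.empty(0, dtype=np.int64)
        self._index = {u: i for i, u in enumerate(self.seeds)}

    def get(self, u):
        """Return the sorted node set sampled around seed u (a view, no copy)."""
        i = self._index[u]
        return self.values[self.offsets[i]:self.offsets[i + 1]]


def join_sets(store, queries):
    """Join node sets for a batch of (u, v) link queries.

    For each query the joined set is the union of the two seeds' sets;
    the real framework would additionally attach structural features here,
    which this sketch omits.
    """
    return [np.union1d(store.get(u), store.get(v)) for u, v in queries]


if __name__ == "__main__":
    sets = {0: [1, 2, 3], 1: [2, 4], 2: [0, 3, 5, 6]}
    store = NodeSetStore(sets)
    print(join_sets(store, [(0, 1), (1, 2)]))
```

The offsets array is what lets sets of different sizes share one contiguous buffer without padding, which is one way to handle the size irregularity the abstract mentions; because each seed's set is stored once and sliced on demand, it can be reused across all queries that touch that seed.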