Label Deconvolution for Node Representation Learning on Large-Scale Attributed Graphs Against Learning Bias

Zhihao Shi;Jie Wang;Fanghua Lu;Hanzhu Chen;Defu Lian;Zheng Wang;Jieping Ye;Feng Wu
{"title":"Label Deconvolution for Node Representation Learning on Large-Scale Attributed Graphs Against Learning Bias","authors":"Zhihao Shi;Jie Wang;Fanghua Lu;Hanzhu Chen;Defu Lian;Zheng Wang;Jieping Ye;Feng Wu","doi":"10.1109/TPAMI.2024.3459408","DOIUrl":null,"url":null,"abstract":"Node representation learning on attributed graphs—whose nodes are associated with rich attributes (e.g., texts and protein sequences)—plays a crucial role in many important downstream tasks. To encode the attributes and graph structures simultaneously, recent studies integrate pre-trained models with graph neural networks (GNNs), where pre-trained models serve as node encoders (NEs) to encode the attributes. As jointly training large NEs and GNNs on large-scale graphs suffers from severe scalability issues, many methods propose to train NEs and GNNs separately. Consequently, they do not take feature convolutions in GNNs into consideration in the training phase of NEs, leading to a significant learning bias relative to the joint training. To address this challenge, we propose an efficient label regularization technique, namely \n<bold>L</b>\nabel \n<bold>D</b>\neconvolution (LD), to alleviate the learning bias by a novel and highly scalable approximation to the inverse mapping of GNNs. The inverse mapping leads to an objective function that is equivalent to that by the joint training, while it can effectively incorporate GNNs in the training phase of NEs against the learning bias. More importantly, we show that LD converges to the optimal objective function values by the joint training under mild assumptions. Experiments demonstrate LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.","PeriodicalId":94034,"journal":{"name":"IEEE transactions on pattern analysis and machine intelligence","volume":"46 12","pages":"11273-11286"},"PeriodicalIF":0.0000,"publicationDate":"2024-09-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE transactions on pattern analysis and machine intelligence","FirstCategoryId":"1085","ListUrlMain":"https://ieeexplore.ieee.org/document/10678812/","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Node representation learning on attributed graphs—whose nodes are associated with rich attributes (e.g., texts and protein sequences)—plays a crucial role in many important downstream tasks. To encode the attributes and graph structures simultaneously, recent studies integrate pre-trained models with graph neural networks (GNNs), where the pre-trained models serve as node encoders (NEs) that encode the attributes. As jointly training large NEs and GNNs on large-scale graphs suffers from severe scalability issues, many methods propose to train NEs and GNNs separately. Consequently, they do not take the feature convolutions in GNNs into account during the training phase of NEs, leading to a significant learning bias relative to joint training. To address this challenge, we propose an efficient label regularization technique, namely Label Deconvolution (LD), which alleviates the learning bias via a novel and highly scalable approximation to the inverse mapping of GNNs. The inverse mapping yields an objective function that is equivalent to that of joint training, while effectively incorporating GNNs into the training phase of NEs to counter the learning bias. More importantly, we show that LD converges to the optimal objective function values obtained by joint training under mild assumptions. Experiments demonstrate that LD significantly outperforms state-of-the-art methods on Open Graph Benchmark datasets.
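For readers who want a concrete picture of the mechanism summarized in the abstract, the snippet below is a minimal PyTorch sketch of the idea: instead of fitting the node encoder to the raw node labels (which ignores the GNN's feature convolution), fit it to labels passed through a learnable approximation of the GNN's inverse mapping. The polynomial parameterization over powers of the normalized adjacency, the small MLP standing in for a large pre-trained node encoder, the MSE loss, and all variable names are illustrative assumptions, not the paper's exact formulation.

```python
# Minimal sketch of label deconvolution for NE training (illustrative assumptions throughout).
import torch
import torch.nn.functional as F

def normalized_adjacency(edge_index, num_nodes):
    """Symmetrically normalized adjacency D^{-1/2} (A + I) D^{-1/2} (dense, toy-scale)."""
    A = torch.zeros(num_nodes, num_nodes)
    A[edge_index[0], edge_index[1]] = 1.0
    A = torch.maximum(A, A.t()) + torch.eye(num_nodes)   # symmetrize and add self-loops
    d_inv_sqrt = A.sum(dim=1).pow(-0.5)
    return d_inv_sqrt.unsqueeze(1) * A * d_inv_sqrt.unsqueeze(0)

def deconvolved_labels(Y, A_hat, alpha):
    """Stand-in for the approximate inverse GNN mapping applied to the labels:
    a learnable convex combination sum_k softmax(alpha)_k * A_hat^k @ Y (an assumption)."""
    coeffs = torch.softmax(alpha, dim=0)                  # keep the combination on the simplex
    out, Z = torch.zeros_like(Y), Y
    for k in range(alpha.numel()):
        out = out + coeffs[k] * Z
        Z = A_hat @ Z                                     # next power of A_hat applied to Y
    return out

# Toy setup: a small MLP stands in for a large pre-trained node encoder (NE).
num_nodes, feat_dim, num_classes, K = 6, 8, 3, 3
X = torch.randn(num_nodes, feat_dim)                      # node attributes (e.g., text embeddings)
Y = F.one_hot(torch.randint(num_classes, (num_nodes,)), num_classes).float()
edge_index = torch.tensor([[0, 1, 2, 3, 4], [1, 2, 3, 4, 5]])
A_hat = normalized_adjacency(edge_index, num_nodes)

node_encoder = torch.nn.Sequential(
    torch.nn.Linear(feat_dim, 16), torch.nn.ReLU(), torch.nn.Linear(16, num_classes)
)
alpha = torch.nn.Parameter(torch.zeros(K + 1))            # learnable inverse-mapping coefficients
optimizer = torch.optim.Adam(list(node_encoder.parameters()) + [alpha], lr=1e-2)

# Train the NE against graph-aware ("deconvolved") labels rather than the raw labels,
# so the GNN's feature convolution is reflected in the NE's objective.
for _ in range(200):
    optimizer.zero_grad()
    target = deconvolved_labels(Y, A_hat, alpha)
    loss = F.mse_loss(node_encoder(X), target)
    loss.backward()
    optimizer.step()
```

In a realistic setting the node encoder would be a large pre-trained model (e.g., a language model over node texts) trained mini-batch by mini-batch, and the GNN would then be trained separately on the resulting node features; the sketch only shows how deconvolved targets could enter the NE's objective.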