GRAIL: Graph contrastive learning with balanced negative sampling

IF 7.4 | Zone 1 (Management) | JCR Q1: COMPUTER SCIENCE, INFORMATION SYSTEMS

Information Processing & Management | Pub Date: 2025-05-27 | DOI: 10.1016/j.ipm.2025.104211

Chengcheng Xu, Tianfeng Wang, Man Chen, Jun Chen, Wei Li, Zhisong Pan

Volume 62, Issue 5, Article 104211. Publisher page: https://www.sciencedirect.com/science/article/pii/S0306457325001529

Citations: 0

Abstract


GRAIL: Graph contrastive learning with balanced negative sampling

Currently, some graph contrastive learning methods mitigate the class imbalance by balancing the number of anchors, overlooking the crucial role of negative samples in forming a regular simplex. Moreover, existing strategies select a limited number of positive samples with poor quality, causing the model to erroneously push away nodes with similar semantics. To address these issues, we propose a graph contrastive learning method with balanced negative sampling, named GRAIL. Specifically, GRAIL introduces a multi-head similarity metric that leverages mixed probability distributions related to dimensional elements to adaptively select an equal number of hard negative samples within each non-anchor cluster. As a result, GRAIL not only promotes the formation of a regular simplex by balancing the gradient contributions of different negative classes but also selects the most informative hard negative samples to improve the distinguishing ability of minority classes while minimizing the impact on majority classes. Furthermore, GRAIL selects multiple positive samples with a high correct ratio using structural similarity and feature similarity, thereby enabling the model to learn trustworthy node representations. Since traditional contrastive loss focuses on the majority class while neglecting the minority class, a balanced contrastive loss is introduced to optimize node representations. Experiments on node classification, node clustering, and link prediction tasks across six imbalanced graph datasets demonstrate that GRAIL outperforms existing state-of-the-art methods. The source code is available at https://github.com/xushucheng-coder/GRAIL/tree/master.
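The balanced negative sampling idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is in the linked repository): it assumes cluster assignments are already available, and it swaps the paper's multi-head similarity metric for plain cosine similarity. The function name `balanced_hard_negatives` and all variable names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def balanced_hard_negatives(embeddings, clusters, anchor, k):
    """Pick up to k hardest negatives from every cluster except the anchor's.

    embeddings: list of vectors, one per node (hypothetical input)
    clusters:   list of cluster ids, one per node (assumed precomputed)
    anchor:     index of the anchor node
    k:          negatives to draw from each non-anchor cluster
    """
    by_cluster = defaultdict(list)
    for node, cid in enumerate(clusters):
        if cid != clusters[anchor]:
            by_cluster[cid].append(node)

    selected = {}
    for cid, nodes in by_cluster.items():
        # "Hard" here means most similar to the anchor, hence easiest to confuse.
        # Assumption: plain cosine similarity stands in for the paper's metric.
        ranked = sorted(
            nodes,
            key=lambda n: cosine(embeddings[anchor], embeddings[n]),
            reverse=True,
        )
        selected[cid] = ranked[:k]
    return selected


# Toy data: cluster 1 is the majority (4 nodes), cluster 2 the minority (2 nodes).
embeddings = [
    [1.0, 0.0],   # node 0: anchor, cluster 0
    [0.9, 0.1],   # node 1: cluster 0 (same cluster, never a negative)
    [0.8, 0.6],   # node 2: cluster 1
    [0.0, 1.0],   # node 3: cluster 1
    [0.6, 0.8],   # node 4: cluster 1
    [-1.0, 0.0],  # node 5: cluster 1
    [0.7, -0.7],  # node 6: cluster 2
    [0.0, -1.0],  # node 7: cluster 2
]
clusters = [0, 0, 1, 1, 1, 1, 2, 2]

negatives = balanced_hard_negatives(embeddings, clusters, anchor=0, k=2)
print(negatives)  # {1: [2, 4], 2: [6, 7]}
```

In this toy case the majority cluster contributes the same number of negatives as the minority cluster, so neither dominates the anchor's negative set; how GRAIL adapts the selection and weights the resulting loss is described in the paper itself.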


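Source journal

Information Processing & Management (Engineering & Technology, Computer Science: Information Systems)

CiteScore: 17.00

Self-citation rate: 11.60%

Articles published: 276

Review time: 39 days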

About the journal: Information Processing and Management is dedicated to publishing cutting-edge original research at the convergence of computing and information science. Our scope encompasses theory, methods, and applications across various domains, including advertising, business, health, information science, information technology marketing, and social computing. We aim to cater to the interests of both primary researchers and practitioners by offering an effective platform for the timely dissemination of advanced and topical issues in this interdisciplinary field. The journal places particular emphasis on original research articles, research survey articles, research method articles, and articles addressing critical applications of research. Join us in advancing knowledge and innovation at the intersection of computing and information science.