Topology reorganized graph contrastive learning with mitigating semantic drift

IF 7.5 1区计算机科学 Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Pattern Recognition Pub Date : 2024-11-10 DOI:10.1016/j.patcog.2024.111160

Jiaqiang Zhang, Songcan Chen

{"title":"Topology reorganized graph contrastive learning with mitigating semantic drift","authors":"Jiaqiang Zhang, Songcan Chen","doi":"10.1016/j.patcog.2024.111160","DOIUrl":null,"url":null,"abstract":"<div><div>Graph contrastive learning (GCL) is an effective paradigm for node representation learning in graphs. The key components hidden behind GCL are data augmentation and positive–negative pair selection. Typical data augmentations in GCL, such as uniform deletion of edges, are generally blind and resort to local perturbation, which is prone to producing under-diversity views. Additionally, there is a risk of making the augmented data traverse to other classes. Moreover, most methods always treat all other samples as negatives. Such a negative pairing naturally results in sampling bias and likewise may make the learned representation suffer from semantic drift. Therefore, to increase the diversity of the contrastive view, we propose two simple and effective global topological augmentations to compensate current GCL. One is to mine the semantic correlation between nodes in the feature space. The other is to utilize the algebraic properties of the adjacency matrix to characterize the topology by eigen-decomposition. With the help of both, we can retain important edges to build a better view. To reduce the risk of semantic drift, a prototype-based negative pair selection is further designed which can filter false negative samples. Extensive experiments on various tasks demonstrate the advantages of the model compared to the state-of-the-art methods.</div></div>","PeriodicalId":49713,"journal":{"name":"Pattern Recognition","volume":"159 ","pages":"Article 111160"},"PeriodicalIF":7.5000,"publicationDate":"2024-11-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Pattern Recognition","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0031320324009117","RegionNum":1,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}

引用次数: 0

Abstract

Graph contrastive learning (GCL) is an effective paradigm for node representation learning in graphs. The key components hidden behind GCL are data augmentation and positive–negative pair selection. Typical data augmentations in GCL, such as uniform deletion of edges, are generally blind and resort to local perturbation, which is prone to producing under-diversity views. Additionally, there is a risk of making the augmented data traverse to other classes. Moreover, most methods always treat all other samples as negatives. Such a negative pairing naturally results in sampling bias and likewise may make the learned representation suffer from semantic drift. Therefore, to increase the diversity of the contrastive view, we propose two simple and effective global topological augmentations to compensate current GCL. One is to mine the semantic correlation between nodes in the feature space. The other is to utilize the algebraic properties of the adjacency matrix to characterize the topology by eigen-decomposition. With the help of both, we can retain important edges to build a better view. To reduce the risk of semantic drift, a prototype-based negative pair selection is further designed which can filter false negative samples. Extensive experiments on various tasks demonstrate the advantages of the model compared to the state-of-the-art methods.

查看原文本刊更多论文

拓扑重组图对比学习，缓解语义漂移

图形对比学习（GCL）是一种有效的图形节点表示学习范式。隐藏在 GCL 背后的关键要素是数据增强和正负对选择。GCL 中的典型数据增强方法（如均匀删除边）一般都是盲目的，而且采用局部扰动，容易产生多样性不足的视图。此外，还存在使增强数据遍历其他类别的风险。此外，大多数方法总是将所有其他样本视为负样本。这样的否定配对自然会导致采样偏差，同样也可能使学习到的表征出现语义漂移。因此，为了增加对比视图的多样性，我们提出了两种简单有效的全局拓扑增强方法来弥补当前的 GCL。一种是挖掘特征空间中节点之间的语义相关性。另一种是利用邻接矩阵的代数特性，通过特征分解来描述拓扑结构。在这两种方法的帮助下，我们可以保留重要的边，从而建立更好的视图。为了降低语义漂移的风险，我们进一步设计了一种基于原型的负对选择，它可以过滤假负样本。在各种任务上的广泛实验证明了该模型与最先进方法相比的优势。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Pattern Recognition 工程技术-工程：电子与电气

CiteScore

14.40

自引率

16.20%

发文量

683

审稿时长

5.6 months

期刊介绍： The field of Pattern Recognition is both mature and rapidly evolving, playing a crucial role in various related fields such as computer vision, image processing, text analysis, and neural networks. It closely intersects with machine learning and is being applied in emerging areas like biometrics, bioinformatics, multimedia data analysis, and data science. The journal Pattern Recognition, established half a century ago during the early days of computer science, has since grown significantly in scope and influence.