Generation based Multi-view Contrast for Self-Supervised Graph Representation Learning

IF 4.8, Zone 3 (Computer Science), JCR Q1 (COMPUTER SCIENCE, INFORMATION SYSTEMS)

ACM Transactions on Knowledge Discovery from Data, Pub Date: 2024-02-09, DOI: 10.1145/3645095

Yuehui Han


Citations: 0

Abstract


Generation based Multi-view Contrast for Self-Supervised Graph Representation Learning

Graph contrastive learning has made remarkable achievements in self-supervised representation learning on graph-structured data. Most graph contrastive learning methods construct contrastive samples from the original graph by applying a perturbation function (i.e., perturbing the nodes or edges of the graph). However, perturbation-based data augmentation randomly changes the inherent information of the graph (e.g., its attributes or structure). Therefore, after embedding the nodes of the perturbed graph, we can guarantee neither the validity of the contrastive samples nor the resulting performance of graph contrastive learning. To this end, we propose a novel generation-based multi-view contrastive learning framework (GMVC) for self-supervised graph representation learning, which produces contrastive samples with a generator rather than a perturbation function. Specifically, after embedding the nodes of the original graph, we first run random walks in the neighborhood of each anchor node to obtain multiple relevant node sequences. We then use a transformer to generate the representations of the anchor node's contrastive samples from the features and structure of the sampled node sequences. Finally, by maximizing the consistency between the anchor view and the generated views, we force the model to encode graph information effectively into the node embeddings. Extensive node classification and link prediction experiments on eight benchmark datasets verify the effectiveness of our generation-based multi-view graph contrastive learning method.
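The pipeline described in the abstract (sample node sequences by random walk, turn each sequence into a generated view, then maximize agreement with the anchor) can be illustrated with a small sketch. This is not the authors' implementation. The function names (`random_walks`, `mean_pool`, `info_nce`), the toy graph, the fixed embeddings, and the temperature `tau` are all assumptions made for illustration. In particular, mean pooling stands in for the transformer generator, and an InfoNCE-style loss is assumed as one common way to express "maximizing the consistency"; the abstract does not state the exact objective.

```python
import math
import random


def random_walks(adj, anchor, num_walks, walk_len, rng):
    """Sample `num_walks` walks of up to `walk_len` steps starting at `anchor`."""
    walks = []
    for _ in range(num_walks):
        walk, node = [anchor], anchor
        for _ in range(walk_len):
            neighbors = adj[node]
            if not neighbors:  # dead end: stop this walk early
                break
            node = rng.choice(neighbors)
            walk.append(node)
        walks.append(walk)
    return walks


def mean_pool(walk, emb):
    """Stand-in generator (assumption): average the embeddings along a walk.
    The paper uses a transformer here instead."""
    dim = len(emb[walk[0]])
    return [sum(emb[n][d] for n in walk) / len(walk) for d in range(dim)]


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def info_nce(anchor_vec, positives, negatives, tau=0.5):
    """InfoNCE-style loss (assumption): generated views are positives,
    embeddings of other nodes are negatives. Lower means more consistent."""
    neg = sum(math.exp(cosine(anchor_vec, n) / tau) for n in negatives)
    loss = 0.0
    for p in positives:
        pos = math.exp(cosine(anchor_vec, p) / tau)
        loss -= math.log(pos / (pos + neg))
    return loss / len(positives)


# Toy graph (hypothetical): two triangles joined by the edge 2-3.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4, 5], 4: [3, 5], 5: [3, 4]}
# Fixed 2-d node embeddings (hypothetical); a real model would learn these.
emb = {0: [1.0, 0.1], 1: [0.9, 0.2], 2: [0.8, 0.3],
       3: [0.3, 0.8], 4: [0.2, 0.9], 5: [0.1, 1.0]}

rng = random.Random(0)
anchor = 0
walks = random_walks(adj, anchor, num_walks=3, walk_len=2, rng=rng)
views = [mean_pool(w, emb) for w in walks]      # one generated view per walk
negatives = [emb[n] for n in (4, 5)]            # nodes far from the anchor
loss = info_nce(emb[anchor], views, negatives)
print(walks, round(loss, 4))
```

In a trainable version, the embeddings and the generator would be parameters updated by gradient descent on this loss; the sketch only evaluates the objective once on fixed vectors.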


Source journal: ACM Transactions on Knowledge Discovery from Data (COMPUTER SCIENCE, INFORMATION SYSTEMS; COMPUTER SCIENCE, SOFTWARE ENGINEERING)

CiteScore: 6.70

Self-citation rate: 5.60%

Articles published: 172

Review time: 3 months

About the journal: TKDD welcomes papers on a full range of research in the knowledge discovery and analysis of diverse forms of data. Such subjects include, but are not limited to: scalable and effective algorithms for data mining and big data analysis, mining brain networks, mining data streams, mining multi-media data, mining high-dimensional data, mining text, Web, and semi-structured data, mining spatial and temporal data, data mining for community generation, social network analysis, and graph structured data, security and privacy issues in data mining, visual, interactive and online data mining, pre-processing and post-processing for data mining, robust and scalable statistical methods, data mining languages, foundations of data mining, KDD framework and process, and novel applications and infrastructures exploiting data mining technology including massively parallel processing and cloud computing platforms. TKDD encourages papers that explore the above subjects in the context of large distributed networks of computers, parallel or multiprocessing computers, or new data devices. TKDD also encourages papers that describe emerging data mining applications that cannot be satisfied by the current data mining technology.