Node Clustering on Attributed Graph Using Anchor Sampling Strategy and Debiasing Strategy

IF 5.3 | Zone 3 (Computer Science) | JCR Q1: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Emerging Topics in Computational Intelligence | Pub Date: 2024-03-12 | DOI: 10.1109/TETCI.2024.3369849

Qian Tang; Yiji Zhao; Hao Wu; Lei Zhang

Volume 8, Issue 4, pp. 3017-3028. Full text: https://ieeexplore.ieee.org/document/10463188/

Citations: 0

Abstract


Node Clustering on Attributed Graph Using Anchor Sampling Strategy and Debiasing Strategy

Contrastive representation learning has been widely employed in attributed graph clustering and has demonstrated significant success. However, existing methods have two problems: 1) Under the assumption that clusters form around a small number of central anchor nodes, the contrastive relationships between these anchors have not been explored in previous work. 2) They fail to handle biased sample pairs, which may degrade representation quality and lead to poor clustering performance. To solve these problems, we propose a framework termed GE-S-D for both node representation learning and clustering, which consists of an anchor sampling strategy, a low-pass graph encoder, and a debiasing strategy. Specifically, to reveal the contrastive relationships between anchors, we design a sampling strategy that selects a small number of anchors and then constructs a training set of positive and negative sample pairs for contrastive learning. Then, we introduce a low-pass graph encoder to propagate contrastive messages to all nodes and learn cluster-friendly node representations. Furthermore, to alleviate the interference of biased sample pairs, we design a debiasing strategy that applies K-Means to the node representations to obtain clustering information and removes the false positive and false negative sample pairs from the training set, improving contrastive learning. The clustering performance is verified on five benchmark datasets, and our method is superior to many state-of-the-art methods according to quantitative and qualitative analysis.
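The debiasing step described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact filtering rule, so the rule used here is an assumption: a positive pair is treated as a false positive when K-Means places its two nodes in different clusters, and a negative pair is treated as a false negative when its two nodes share a cluster. The function name debias_pairs, the pair format, and the toy labels are all hypothetical and are not taken from the paper.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def debias_pairs(
    labels: Sequence[int],
    positive_pairs: Sequence[Pair],
    negative_pairs: Sequence[Pair],
) -> Tuple[List[Pair], List[Pair]]:
    """Drop sample pairs that contradict the current cluster assignment.

    Assumed rule (not taken from the paper): a positive pair is a false
    positive when its nodes lie in different clusters, and a negative pair
    is a false negative when its nodes lie in the same cluster.
    """
    kept_pos = [(i, j) for i, j in positive_pairs if labels[i] == labels[j]]
    kept_neg = [(i, j) for i, j in negative_pairs if labels[i] != labels[j]]
    return kept_pos, kept_neg


# Hypothetical cluster labels for six nodes, standing in for K-Means output.
labels = [0, 0, 0, 1, 1, 1]
positive_pairs = [(0, 1), (2, 3), (4, 5)]  # (2, 3) crosses clusters
negative_pairs = [(0, 4), (1, 2), (3, 0)]  # (1, 2) shares a cluster

clean_pos, clean_neg = debias_pairs(labels, positive_pairs, negative_pairs)
print(clean_pos)  # [(0, 1), (4, 5)]
print(clean_neg)  # [(0, 4), (3, 0)]
```

In a full pipeline the labels would presumably be recomputed from the current node representations as training proceeds, so the filtered training set would change over time; the abstract does not say how often this happens.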


Source journal

IEEE Transactions on Emerging Topics in Computational Intelligence (Mathematics: Control and Optimization)

CiteScore: 10.30

Self-citation rate: 7.50%

Articles published: 147

About the journal: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. A few illustrative examples are glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.