Node Clustering on Attributed Graph Using Anchor Sampling Strategy and Debiasing Strategy

IF 5.3 · Zone 3 (Computer Science) · JCR Q1 (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE)
Qian Tang;Yiji Zhao;Hao Wu;Lei Zhang
DOI: 10.1109/TETCI.2024.3369849
Journal: IEEE Transactions on Emerging Topics in Computational Intelligence, vol. 8, no. 4, pp. 3017-3028
Published: 2024-03-12 (Journal Article)
URL: https://ieeexplore.ieee.org/document/10463188/
Citations: 0

Abstract

Contrastive representation learning has been widely employed in attributed graph clustering and has demonstrated significant success. However, existing methods have two problems: 1) Under the assumption that clusters form around a minority of central anchor nodes, the contrastive relationships between these anchors have not been explored in previous work. 2) They fail to deal with biased sample pairs, which may degrade representation quality and lead to poor clustering performance. To address these problems, we propose a framework termed GE-S-D for both node representation learning and clustering, which consists of an anchor sampling strategy, a low-pass graph encoder, and a debiasing strategy. Specifically, to reveal the contrastive relationships between anchors, we design a sampling strategy that selects a small number of anchors and then constructs a training set of positive and negative sample pairs for contrastive learning. We then introduce a low-pass graph encoder to propagate contrastive messages to all nodes and learn cluster-friendly node representations. Furthermore, to alleviate the interference of biased sample pairs, we design a debiasing strategy that applies K-Means to the node representations to obtain clustering information and removes the false-positive and false-negative sample pairs from the training set, improving contrastive learning. Clustering performance is verified on five benchmark datasets, and our method is superior to many state-of-the-art methods according to quantitative and qualitative analysis.
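The debiasing step described in the abstract can be illustrated with a minimal sketch. The abstract states only that K-Means clustering information is used to remove false-positive and false-negative pairs; the concrete rule below (a positive pair is treated as false if its two nodes fall in different clusters, a negative pair as false if they fall in the same cluster) is an assumption made for illustration, and the names `debias_pairs`, `positive_pairs`, `negative_pairs`, and `cluster_labels` are hypothetical rather than taken from the paper. The cluster labels are assumed to come from any K-Means run over the current node representations.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]  # a pair of node indices


def debias_pairs(
    positive_pairs: Sequence[Pair],
    negative_pairs: Sequence[Pair],
    cluster_labels: Sequence[int],
) -> Tuple[List[Pair], List[Pair]]:
    """Filter contrastive training pairs using cluster assignments.

    Assumed rule (not confirmed by the source):
      - a positive pair whose nodes lie in different clusters is a
        false positive and is dropped;
      - a negative pair whose nodes lie in the same cluster is a
        false negative and is dropped.
    """
    kept_positive = [
        (i, j) for i, j in positive_pairs
        if cluster_labels[i] == cluster_labels[j]
    ]
    kept_negative = [
        (i, j) for i, j in negative_pairs
        if cluster_labels[i] != cluster_labels[j]
    ]
    return kept_positive, kept_negative


# Toy example: five nodes, cluster labels as K-Means might return them.
labels = [0, 0, 1, 1, 0]
positives = [(0, 1), (0, 2)]  # (0, 2) spans two clusters
negatives = [(1, 3), (0, 4)]  # (0, 4) lies within one cluster
clean_pos, clean_neg = debias_pairs(positives, negatives, labels)
```

In a training loop, a filter of this kind would typically be re-run periodically as the representations (and therefore the cluster assignments) change; how often the paper does so is not stated in the abstract.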
Source journal
CiteScore: 10.30
Self-citation rate: 7.50%
Articles published: 147
About the journal: The IEEE Transactions on Emerging Topics in Computational Intelligence (TETCI) publishes original articles on emerging aspects of computational intelligence, including theory, applications, and surveys. TETCI is an electronic-only publication and publishes six issues per year. Authors are encouraged to submit manuscripts on any emerging topic in computational intelligence, especially nature-inspired computing topics not covered by other IEEE Computational Intelligence Society journals. Illustrative examples include glial cell networks, computational neuroscience, brain-computer interfaces, ambient intelligence, non-fuzzy computing with words, artificial life, cultural learning, artificial endocrine networks, social reasoning, artificial hormone networks, and computational intelligence for the IoT and Smart-X technologies.