Binxiong Li, Yuefei Wang, Binyu Zhao, Heyang Gao, Benhan Yang, Quanzhou Luo, Xue Li, Xu Xiang, Yujie Liu, Huijie Tang
{"title":"基于多尺度权值的两两粗化和对比学习的属性图聚类","authors":"Binxiong Li , Yuefei Wang , Binyu Zhao , Heyang Gao , Benhan Yang , Quanzhou Luo , Xue Li , Xu Xiang , Yujie Liu , Huijie Tang","doi":"10.1016/j.neucom.2025.130796","DOIUrl":null,"url":null,"abstract":"<div><div>This study introduces the Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning (MPCCL) model, a novel approach for attributed graph clustering that effectively bridges critical gaps in existing methods, including long-range dependency, feature collapse, and information loss. Traditional methods often struggle to capture high-order graph features due to their reliance on low-order attribute information, while contrastive learning techniques face limitations in feature diversity by overemphasizing local neighborhood structures. Similarly, conventional graph coarsening methods, though reducing graph scale, frequently lose fine-grained structural details. MPCCL addresses these challenges through an innovative multi-scale coarsening strategy, which progressively condenses the graph while prioritizing the merging of key edges based on global node similarity to preserve essential structural information. It further introduces a one-to-many contrastive learning paradigm, integrating node embeddings with augmented graph views and cluster centroids to enhance feature diversity, while mitigating feature masking issues caused by the accumulation of high-frequency node weights during multi-scale coarsening. By incorporating a graph reconstruction loss and KL divergence into its self-supervised learning framework, MPCCL ensures cross-scale consistency of node representations. Experimental evaluations reveal that MPCCL achieves a significant improvement in clustering performance, including a remarkable 15.24 % increase in NMI on the ACM dataset and notable robust gains on smaller-scale datasets such as Citeseer, Cora and DBLP. In the large-scale Reuters dataset, it significantly improved by 17.84 %, further validating its advantage in enhancing clustering performance and robustness. These results highlight MPCCL’s potential for application in diverse graph clustering tasks, ranging from social network analysis to bioinformatics and knowledge graph-based data mining. The source code for this study is available at <span><span>https://github.com/YF-W/MPCCL</span><svg><path></path></svg></span></div></div>","PeriodicalId":19268,"journal":{"name":"Neurocomputing","volume":"648 ","pages":"Article 130796"},"PeriodicalIF":5.5000,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Attributed graph clustering with multi-scale weight-based pairwise coarsening and contrastive learning\",\"authors\":\"Binxiong Li , Yuefei Wang , Binyu Zhao , Heyang Gao , Benhan Yang , Quanzhou Luo , Xue Li , Xu Xiang , Yujie Liu , Huijie Tang\",\"doi\":\"10.1016/j.neucom.2025.130796\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<div><div>This study introduces the Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning (MPCCL) model, a novel approach for attributed graph clustering that effectively bridges critical gaps in existing methods, including long-range dependency, feature collapse, and information loss. 
Traditional methods often struggle to capture high-order graph features due to their reliance on low-order attribute information, while contrastive learning techniques face limitations in feature diversity by overemphasizing local neighborhood structures. Similarly, conventional graph coarsening methods, though reducing graph scale, frequently lose fine-grained structural details. MPCCL addresses these challenges through an innovative multi-scale coarsening strategy, which progressively condenses the graph while prioritizing the merging of key edges based on global node similarity to preserve essential structural information. It further introduces a one-to-many contrastive learning paradigm, integrating node embeddings with augmented graph views and cluster centroids to enhance feature diversity, while mitigating feature masking issues caused by the accumulation of high-frequency node weights during multi-scale coarsening. By incorporating a graph reconstruction loss and KL divergence into its self-supervised learning framework, MPCCL ensures cross-scale consistency of node representations. Experimental evaluations reveal that MPCCL achieves a significant improvement in clustering performance, including a remarkable 15.24 % increase in NMI on the ACM dataset and notable robust gains on smaller-scale datasets such as Citeseer, Cora and DBLP. In the large-scale Reuters dataset, it significantly improved by 17.84 %, further validating its advantage in enhancing clustering performance and robustness. These results highlight MPCCL’s potential for application in diverse graph clustering tasks, ranging from social network analysis to bioinformatics and knowledge graph-based data mining. The source code for this study is available at <span><span>https://github.com/YF-W/MPCCL</span><svg><path></path></svg></span></div></div>\",\"PeriodicalId\":19268,\"journal\":{\"name\":\"Neurocomputing\",\"volume\":\"648 \",\"pages\":\"Article 130796\"},\"PeriodicalIF\":5.5000,\"publicationDate\":\"2025-06-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Neurocomputing\",\"FirstCategoryId\":\"94\",\"ListUrlMain\":\"https://www.sciencedirect.com/science/article/pii/S0925231225014687\",\"RegionNum\":2,\"RegionCategory\":\"计算机科学\",\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"Q1\",\"JCRName\":\"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Neurocomputing","FirstCategoryId":"94","ListUrlMain":"https://www.sciencedirect.com/science/article/pii/S0925231225014687","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE","Score":null,"Total":0}
Attributed graph clustering with multi-scale weight-based pairwise coarsening and contrastive learning
This study introduces the Multi-Scale Weight-Based Pairwise Coarsening and Contrastive Learning (MPCCL) model, a novel approach for attributed graph clustering that effectively bridges critical gaps in existing methods, including long-range dependency, feature collapse, and information loss. Traditional methods often struggle to capture high-order graph features because they rely on low-order attribute information, while contrastive learning techniques limit feature diversity by overemphasizing local neighborhood structures. Similarly, conventional graph coarsening methods, though reducing graph scale, frequently lose fine-grained structural details. MPCCL addresses these challenges through an innovative multi-scale coarsening strategy that progressively condenses the graph while prioritizing the merging of key edges based on global node similarity, preserving essential structural information. It further introduces a one-to-many contrastive learning paradigm, integrating node embeddings with augmented graph views and cluster centroids to enhance feature diversity, while mitigating the feature masking caused by the accumulation of high-frequency node weights during multi-scale coarsening. By incorporating a graph reconstruction loss and KL divergence into its self-supervised learning framework, MPCCL ensures cross-scale consistency of node representations. Experimental evaluations show that MPCCL achieves a significant improvement in clustering performance, including a remarkable 15.24% increase in NMI on the ACM dataset and notably robust gains on smaller-scale datasets such as Citeseer, Cora, and DBLP. On the large-scale Reuters dataset, clustering performance improved by 17.84%, further validating the model's advantages in clustering quality and robustness. These results highlight MPCCL's potential for application in diverse graph clustering tasks, ranging from social network analysis to bioinformatics and knowledge graph-based data mining. The source code for this study is available at https://github.com/YF-W/MPCCL.
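As a rough illustration of the pairwise coarsening step described in the abstract, the sketch below greedily merges the endpoints of the most similar edges (by global cosine similarity of node attributes) into supernodes, so that one pass roughly halves the graph while dissimilar nodes stay separate. This is only a minimal reading of the abstract, not the authors' implementation: `pairwise_coarsen` and its signature are hypothetical, and MPCCL's weighting scheme and contrastive components are omitted.

```python
# Minimal sketch of similarity-guided pairwise coarsening (hypothetical helper,
# not the MPCCL reference code). One pass merges the endpoints of the most
# similar edges into supernodes and pools features and edges accordingly.
import numpy as np
import scipy.sparse as sp


def pairwise_coarsen(adj: sp.csr_matrix, features: np.ndarray):
    """One coarsening pass: greedily merge edge endpoints by global cosine similarity."""
    n = adj.shape[0]

    # Global cosine similarity of node attributes.
    normed = features / (np.linalg.norm(features, axis=1, keepdims=True) + 1e-12)
    sim = normed @ normed.T

    # Score every existing edge by endpoint similarity; visit the most similar first.
    rows, cols = sp.triu(adj, k=1).nonzero()
    order = np.argsort(-sim[rows, cols])

    assignment = -np.ones(n, dtype=int)   # node index -> supernode id
    next_id = 0
    for e in order:
        u, v = rows[e], cols[e]
        if assignment[u] == -1 and assignment[v] == -1:
            assignment[u] = assignment[v] = next_id  # merge this "key" edge
            next_id += 1
    for u in range(n):                     # unmatched nodes become singleton supernodes
        if assignment[u] == -1:
            assignment[u] = next_id
            next_id += 1

    # Pooling matrix P (n x next_id): column j collects the nodes mapped to supernode j.
    P = sp.csr_matrix((np.ones(n), (np.arange(n), assignment)), shape=(n, next_id))
    sizes = np.asarray(P.sum(axis=0)).ravel()
    coarse_feats = (P.T @ features) / sizes[:, None]   # mean-pool attributes
    coarse_adj = (P.T @ adj @ P).tocsr()               # aggregate edge weights
    coarse_adj.setdiag(0)
    coarse_adj.eliminate_zeros()
    return coarse_adj, coarse_feats, assignment


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = sp.random(20, 20, density=0.2, random_state=0, format="csr")
    A = ((A + A.T) > 0).astype(float)      # symmetrize into an unweighted graph
    X = rng.normal(size=(20, 8))
    cA, cX, mapping = pairwise_coarsen(A, X)
    print(A.shape, "->", cA.shape)
```

Applying such a pass repeatedly would yield the progressively condensed multi-scale views that, per the abstract, the one-to-many contrastive objective and the reconstruction/KL losses then keep consistent across scales.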
Journal introduction:
Neurocomputing publishes articles describing recent fundamental contributions in the field of neurocomputing. Neurocomputing theory, practice, and applications are the essential topics covered.