Clustering Diffusion Model with Frequency-Signal Modulation for Variational Graph Autoencoders.

IF 18.6 | Zone 1, Computer Science | Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

IEEE Transactions on Pattern Analysis and Machine Intelligence | Pub Date: 2025-09-25 | DOI: 10.1109/tpami.2025.3614385

Junwei Cheng, Ke Liang, Pengxing Feng, Weixiong Liu, Yong Tang, Chaobo He

Citation count: 0

Abstract

Variational autoencoders (VAEs) have been widely used for node clustering, with existing methods mainly focusing on enhancing the expressiveness of their latent space. Recently, the integration of diffusion models with VAEs has provided new opportunities to achieve this objective. However, the mechanism by which the diffusion model improves performance remains unclear. To bridge this gap, we conduct an empirical analysis from the perspective of graph spectral theory, revealing that the signal modulation induced by diffusion models closely aligns with the low-frequency spectral characteristics of VAEs, which in turn explains their effectiveness. Nevertheless, further experiments highlight that diffusion models exhibit limitations in modulating high-frequency signals, which diverge from the spectral characteristics of VAEs. Moreover, existing diffusion methods fail to enable the latent space to adequately capture and reflect cluster-specific characteristics. To address these challenges, we propose a novel plug-and-play method, FVD, to improve the performance of VAE-based methods in node clustering tasks. Specifically, we incorporate the graph wavelet transform as a secondary signal modulator, enabling independent adjustments of specific frequency bands to better align with the spectral characteristics of VAEs. Additionally, we introduce the Student's t-distribution as a conditional constraint in the reverse process of FVD, deriving a more compact variational lower bound. This enhancement preserves fine-grained node information while focusing on clustering details, effectively mitigating the cluster collapse phenomenon. Comprehensive experimental results demonstrate that integrating FVD with existing methods achieves competitive performance improvements in most cases.
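
The abstract names two generic ingredients without giving their equations: frequency-selective modulation of graph signals (via a graph wavelet transform) and a Student's t-distribution used as a clustering-oriented constraint. The following is only an illustrative sketch of those two textbook building blocks, not the authors' FVD implementation. The heat-kernel filters, the function names, and the degrees-of-freedom parameter `nu` are all assumptions chosen for illustration.

```python
import numpy as np

def normalized_laplacian(adj):
    """Symmetric normalized Laplacian L = I - D^{-1/2} A D^{-1/2}."""
    deg = adj.sum(axis=1)
    d_inv_sqrt = np.where(deg > 0, 1.0 / np.sqrt(np.maximum(deg, 1e-12)), 0.0)
    return np.eye(adj.shape[0]) - d_inv_sqrt[:, None] * adj * d_inv_sqrt[None, :]

def spectral_filter(adj, signal, kernel):
    """Apply g(L) to a signal of shape (n_nodes, n_features): U g(Lambda) U^T x."""
    lam, U = np.linalg.eigh(normalized_laplacian(adj))
    return U @ (kernel(lam)[:, None] * (U.T @ signal))

def heat_lowpass(s):
    """Assumed low-pass kernel exp(-s * lambda): keeps smooth components."""
    return lambda lam: np.exp(-s * lam)

def heat_bandpass(s):
    """Assumed band-pass (wavelet-like) kernel: zero at lambda = 0, peak at 1/s."""
    return lambda lam: s * lam * np.exp(-s * lam)

def student_t_assignment(z, centers, nu=1.0):
    """Soft cluster assignment with a Student's t kernel (hypothetical helper)."""
    d2 = ((z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
    q = (1.0 + d2 / nu) ** (-(nu + 1.0) / 2.0)
    return q / q.sum(axis=1, keepdims=True)

# Toy graph: two triangles joined by a single edge (nodes 2 and 3).
adj = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    adj[i, j] = adj[j, i] = 1.0

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 2))                      # random node features
x_low = spectral_filter(adj, x, heat_lowpass(2.0))    # smooth part
x_band = spectral_filter(adj, x, heat_bandpass(1.0))  # one mid/high band
q = student_t_assignment(x_low, centers=x_low[[0, 5]])  # two assumed centers
```

Changing the scale `s` of the band-pass kernel moves the peak of the filter along the spectrum, which is the sense in which a wavelet-style filter bank allows individual frequency bands to be adjusted independently. How FVD actually combines such filters with the diffusion process is not specified in the abstract.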

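Source journal

IEEE Transactions on Pattern Analysis and Machine Intelligence (Engineering and Technology: Electrical and Electronic Engineering)

CiteScore: 28.40
Self-citation rate: 3.00%
Articles published: 885
Review time: 8.5 months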

Journal description: The IEEE Transactions on Pattern Analysis and Machine Intelligence publishes articles on all traditional areas of computer vision and image understanding, all traditional areas of pattern analysis and recognition, and selected areas of machine intelligence, with a particular emphasis on machine learning for pattern analysis. Areas such as techniques for visual search, document and handwriting analysis, medical image analysis, video and image sequence analysis, content-based retrieval of image and video, face and gesture recognition, and relevant specialized hardware and/or software architectures are also covered.