Self-supervised learning of invariant causal representation in heterogeneous information network
Pei Zhang, Lihua Zhou, Yong Li, Hongmei Chen, Lizhen Wang
Information Fusion, vol. 123, Article 103246, published 2025-05-09. DOI: 10.1016/j.inffus.2025.103246. https://www.sciencedirect.com/science/article/pii/S1566253525003197
Citations: 0
Abstract
Invariant learning on graphs is essential for uncovering causal relationships in complex phenomena. However, most research has focused on homogeneous information networks with a single node type and a single edge type, ignoring the rich heterogeneity of real-world systems. In addition, many invariant learning methods rely on labeled data and on carefully designed graph augmentation or contrastive sampling algorithms. These require domain expertise or substantial human effort, which makes them difficult to apply in practice. To overcome these limitations, we propose a Generative-Contrastive Collaborative Self-Supervised Learning (GCCS) framework. GCCS combines the ability of generative learning to mine supervisory signals from the data itself with the capacity of contrastive learning to learn invariant representations. Together, these enable self-supervised learning of invariant causal representations from heterogeneous information networks (HINs). Specifically, the generative self-supervised learning (SSL) component constructs meta-path-aware adjacency matrices and performs a mask-and-reconstruct operation. The contrastive SSL component then refines the learned representations by enforcing similarity and consensus constraints across different views. Optimizing both objectives jointly captures invariant causal features and improves the model's robustness. Extensive experiments on three real-world HIN datasets demonstrate that GCCS outperforms state-of-the-art baselines, particularly in noisy and complex environments. These results show strong performance and robustness for self-supervised learning on heterogeneous graphs.
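The abstract describes the GCCS objective only at a high level. Below is a minimal NumPy sketch, not the authors' implementation, showing how the three named ingredients could fit together on a toy paper-author-subject graph. Everything concrete in it is an assumption:

* the helper names (`metapath_adjacency`, `masked_reconstruction_loss`, `info_nce`, `consensus_loss`, `demo`);
* a one-layer propagation encoder shared across views;
* masking node features (the paper may mask edges or adjacency entries instead);
* InfoNCE as the cross-view similarity term;
* a pull-toward-the-mean term as the consensus constraint;
* equal loss weights.

```python
import numpy as np


def metapath_adjacency(*relations):
    """Compose relation matrices along a meta-path, e.g. P-A-P = A_pa @ A_pa.T.

    The product is binarized and self-loops are added (assumed conventions).
    """
    m = relations[0]
    for r in relations[1:]:
        m = m @ r
    adj = (m > 0).astype(float)
    np.fill_diagonal(adj, 1.0)
    return adj


def normalize(adj):
    """Symmetric normalization D^{-1/2} A D^{-1/2}; self-loops keep degrees > 0."""
    d_inv_sqrt = 1.0 / np.sqrt(adj.sum(axis=1))
    return adj * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def encode(adj_norm, x, w):
    """Stand-in encoder: one propagation step over the view, then ReLU."""
    return np.maximum(adj_norm @ x @ w, 0.0)


def masked_reconstruction_loss(adj_norm, x, w_enc, w_dec, mask_rate, rng):
    """Hide the features of a random node subset, encode, decode, and score
    the reconstruction with MSE on the masked nodes only."""
    masked = rng.random(x.shape[0]) < mask_rate
    x_in = x.copy()
    x_in[masked] = 0.0
    h = encode(adj_norm, x_in, w_enc)
    x_hat = adj_norm @ h @ w_dec
    if not masked.any():
        return h, 0.0
    return h, float(np.mean((x_hat[masked] - x[masked]) ** 2))


def info_nce(z1, z2, tau=0.5):
    """Cross-view InfoNCE: row i of z1 and row i of z2 form the positive pair."""
    z1 = z1 / (np.linalg.norm(z1, axis=1, keepdims=True) + 1e-12)
    z2 = z2 / (np.linalg.norm(z2, axis=1, keepdims=True) + 1e-12)
    logits = z1 @ z2.T / tau
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def consensus_loss(views):
    """Assumed consensus term: pull each view's embedding toward the view mean."""
    mean = np.mean(views, axis=0)
    return float(np.mean([np.mean((v - mean) ** 2) for v in views]))


def demo(seed=0):
    rng = np.random.default_rng(seed)
    n_p, n_a, n_s, d, k = 6, 4, 3, 5, 4                    # papers, authors, subjects, dims
    a_pa = (rng.random((n_p, n_a)) < 0.4).astype(float)    # paper-author links
    a_ps = np.eye(n_s)[rng.integers(0, n_s, n_p)]          # one subject per paper
    x = rng.normal(size=(n_p, d))                          # paper features
    views = [normalize(metapath_adjacency(a_pa, a_pa.T)),  # P-A-P view
             normalize(metapath_adjacency(a_ps, a_ps.T))]  # P-S-P view
    w_enc = rng.normal(scale=0.5, size=(d, k))             # shared across views (assumption)
    w_dec = rng.normal(scale=0.5, size=(k, d))
    recon, embs = [], []
    for adj in views:
        h, loss = masked_reconstruction_loss(adj, x, w_enc, w_dec, 0.5, rng)
        embs.append(h)
        recon.append(loss)
    contrast = info_nce(embs[0], embs[1])
    consensus = consensus_loss(embs)
    total = sum(recon) + contrast + consensus              # equal weights (assumption)
    return {"reconstruction": recon, "contrastive": contrast,
            "consensus": consensus, "total": total}


if __name__ == "__main__":
    print(demo())
```

Composing relation matrices is what makes each adjacency meta-path aware. For example, `a_pa @ a_pa.T` links two papers exactly when they share an author, which is the P-A-P meta-path. In a real model, the encoder and decoder weights would be trained by minimizing the summed loss. This sketch only evaluates that loss once.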
About the journal:
Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses, as well as those demonstrating their application to real-world problems, are welcome.