Self-supervised learning of invariant causal representation in heterogeneous information network

IF 14.7 · CAS Zone 1 (Computer Science) · JCR Q1 COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Information Fusion · Pub Date: 2025-05-09 · DOI: 10.1016/j.inffus.2025.103246

Pei Zhang, Lihua Zhou, Yong Li, Hongmei Chen, Lizhen Wang

Information Fusion, volume 123, Article 103246 (Journal Article). https://www.sciencedirect.com/science/article/pii/S1566253525003197

Citations: 0

Abstract

Invariant learning on graphs is essential for uncovering causal relationships in complex phenomena. However, most research has focused on homogeneous information networks with a single node type and a single edge type, ignoring the rich heterogeneity of real-world systems. Additionally, many invariant learning methods rely on labeled data and on the design of complex graph augmentation or contrastive sampling algorithms, which require domain-specific expertise or substantial human resources and make the methods difficult to apply in practice. To overcome these limitations, we propose a Generative-Contrastive Collaborative Self-Supervised Learning (GCCS) framework. This framework combines the ability of generative learning to mine supervisory signals from the data itself with the capacity of contrastive learning to learn invariant representations, enabling self-supervised learning of invariant causal representations from heterogeneous information networks (HINs). Specifically, generative self-supervised learning (SSL) constructs meta-path-aware adjacency matrices and performs a mask-reconstruct operation, while contrastive SSL refines the learned representations by enforcing similarity and consensus constraints across different views. This joint optimization captures invariant causal features, enhancing the model’s robustness. Extensive experiments on three real-world HIN datasets demonstrate that GCCS outperforms state-of-the-art baselines, particularly in noisy and complex environments, showcasing its performance and robustness for self-supervised learning on heterogeneous graph structures.
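As a rough illustration of the two generative steps named in the abstract (building a meta-path-aware adjacency matrix, then masking part of it for reconstruction), the sketch below uses a toy Author-Paper network. It is not the authors' GCCS implementation: the Author-Paper-Author meta-path, the function names `metapath_adjacency` and `mask_edges`, the mask rate, and the data are all assumptions made for illustration, and the encoder that would reconstruct the hidden edges is omitted.

```python
import random

def metapath_adjacency(author_paper):
    """Binary author-author adjacency induced by the Author-Paper-Author meta-path.

    author_paper[i][p] is 1 if author i wrote paper p (hypothetical toy schema).
    Two authors are connected when they share at least one paper.
    """
    n = len(author_paper)
    adj = [[0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue  # ignore self-loops
            shared = sum(a * b for a, b in zip(author_paper[i], author_paper[j]))
            adj[i][j] = 1 if shared > 0 else 0
    return adj

def mask_edges(adj, mask_rate, rng):
    """Hide a fraction of undirected edges; return the masked matrix and the hidden edges.

    A reconstruction objective would train a model to recover `hidden` from `masked`.
    """
    n = len(adj)
    edges = [(i, j) for i in range(n) for j in range(i + 1, n) if adj[i][j]]
    n_mask = int(mask_rate * len(edges))
    hidden = rng.sample(edges, n_mask)
    masked = [row[:] for row in adj]
    for i, j in hidden:
        masked[i][j] = 0
        masked[j][i] = 0
    return masked, hidden

# Toy data: 4 authors, 3 papers (assumed for illustration only).
author_paper = [
    [1, 1, 0],
    [0, 1, 1],
    [0, 0, 1],
    [1, 0, 0],
]
apa = metapath_adjacency(author_paper)
masked, hidden = mask_edges(apa, mask_rate=0.5, rng=random.Random(0))
print(apa)
print(hidden)
```

In a real HIN, one such matrix would typically be built per meta-path, giving several views of the same target nodes; how GCCS combines these views is described in the full paper, not here.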

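Source journal

Information Fusion (Engineering & Technology: Computer Science, Theory & Methods)

CiteScore: 33.20

Self-citation rate: 4.30%

Articles published: 161

Review time: 7.9 months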

About the journal: Information Fusion serves as a central platform for showcasing advancements in multi-sensor, multi-source, multi-process information fusion, fostering collaboration among the diverse disciplines driving its progress. It is the leading outlet for sharing research and development in this field, focusing on architectures, algorithms, and applications. Papers dealing with fundamental theoretical analyses as well as those demonstrating their application to real-world problems are welcome.